Abstract: To address the poor detection and positioning capability of fruit-picking robots in scenes where targets are densely distributed and fruits occlude each other, an improved Faster R-CNN method for fruit detection and positioning was proposed by introducing an efficient channel attention (ECA) mechanism and a multi-scale feature pyramid network (FPN) for feature fusion. Firstly, the commonly used VGG16 backbone was replaced with a ResNet50 residual network, which has stronger representation capability and alleviates the network degradation problem, so that more abstract and richer semantic information could be extracted and the model's detection ability for multi-scale and small targets enhanced. Secondly, the ECA module was introduced so that the feature extraction network focuses on local, informative regions of the feature map, reducing the interference of invalid targets and improving detection accuracy. Finally, a branch-and-leaf grafting data augmentation method was used to enrich the apple dataset and alleviate the problem of insufficient image data. Based on the constructed dataset, a genetic algorithm was used to optimize K-means++ clustering and generate adaptive anchor boxes. Experimental results showed that the improved model achieved an average precision of 96.16% for graspable apples and 86.95% for non-graspable apples, with a mean average precision of 92.79%, which was 15.68 percentage points higher than that of the traditional Faster R-CNN. The positioning accuracies for graspable and non-directly-graspable apples were 97.14% and 88.93%, respectively, 12.53 and 40.49 percentage points higher than those of the traditional Faster R-CNN. The model weight was reduced by 38.20% and the computation time by 40.7%. The improved model is therefore more suitable for application in the vision systems of fruit-picking robots.
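For context on the ECA module mentioned above, the following is a minimal PyTorch sketch of an efficient channel attention block in the standard ECA-Net formulation (global average pooling, a 1D convolution across channels with an adaptive kernel size, and a sigmoid gate). The class name, hyperparameter values, and the way such a block would be inserted into the ResNet50 backbone are illustrative assumptions, not details taken from this paper.

```python
import math
import torch
import torch.nn as nn


class ECA(nn.Module):
    """Efficient channel attention: GAP -> 1D conv across channels -> sigmoid gate.

    Sketch of the standard ECA-Net block; gamma/b follow the commonly used
    adaptive kernel-size rule and are assumptions, not values from the paper.
    """

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Adaptive kernel size: k grows slowly with the channel count and is forced odd.
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) feature map from the backbone.
        y = self.avg_pool(x)                   # (N, C, 1, 1) per-channel descriptor
        y = y.squeeze(-1).transpose(-1, -2)    # (N, 1, C) for cross-channel 1D conv
        y = self.conv(y)                       # local cross-channel interaction
        y = y.transpose(-1, -2).unsqueeze(-1)  # back to (N, C, 1, 1)
        return x * self.sigmoid(y)             # channel-wise re-weighting of the input


# Usage sketch: re-weight a hypothetical ResNet50 stage output before it enters the FPN.
feat = torch.randn(2, 256, 56, 56)
attended = ECA(channels=256)(feat)  # same shape, channels re-weighted by attention
```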