Abstract: Deep neural network models deployed on embedded devices (such as tomato cluster picking robots) suffer from slow running speed, low recognition rates for picking targets, and inaccurate positioning. To address these problems, an efficient model for tomato cluster detection was proposed and verified. The model was composed of two modules: object detection and semantic segmentation. The detection module extracted the rectangular region in which a tomato cluster was located, and the semantic segmentation algorithm then obtained the position of the tomato stem within that region. In the detection module, a backbone network based on a deep convolution structure was designed to improve crop recognition accuracy while reducing the number of model parameters. The K-means++ clustering algorithm was used to obtain the prior (anchor) boxes, and the DIoU distance calculation formula was improved, yielding a more compact, lightweight detection model (DC-YOLO v4). In the semantic segmentation module (ICNet), MobileNetv2 was used as the backbone network to reduce the number of parameters and the computational cost and to improve the running speed of the model. The model was deployed on a tomato cluster picking robot for verification and tested on a self-made tomato data set. The results showed that the average detection accuracy on the tomato test set was 99.31%, outperforming YOLO v4 by 2.04 percentage points. The mIoU and mPA on the tomato stem set reached 81.63% and 91.87%, exceeding ICNet by 2.19 and 1.47 percentage points, respectively. The accurate picking rate for tomato clusters was 84.8%, and a single picking operation took 6 s.
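For readers unfamiliar with the two segmentation metrics reported above, the following is a minimal sketch of how mPA (mean pixel accuracy) and mIoU (mean intersection over union) are commonly computed from a per-class confusion matrix. It uses the standard definitions and is not taken from the authors' evaluation code; the function name `segmentation_metrics` and the two-class toy matrix (background versus stem) are illustrative assumptions, and the numbers in it are not the paper's data.

```python
def segmentation_metrics(confusion):
    """Return (mPA, mIoU) for a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j. Hypothetical helper, standard definitions.
    """
    n = len(confusion)
    pixel_accs, ious = [], []
    for i in range(n):
        tp = confusion[i][i]
        truth = sum(confusion[i])                            # TP + FN
        predicted = sum(confusion[r][i] for r in range(n))   # TP + FP
        if truth == 0:
            continue  # class absent from the ground truth: skip it
        pixel_accs.append(tp / truth)                # per-class pixel accuracy
        ious.append(tp / (truth + predicted - tp))   # per-class IoU
    return sum(pixel_accs) / len(pixel_accs), sum(ious) / len(ious)


# Toy example (made-up counts): rows are true classes, columns are predictions.
toy = [[90, 10],   # background pixels
       [5, 45]]    # stem pixels
mpa, miou = segmentation_metrics(toy)
print(f"mPA = {mpa:.4f}, mIoU = {miou:.4f}")
```

Because IoU also penalizes false positives, mIoU is typically lower than mPA for the same predictions, which is consistent with the gap between the two reported values.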