Abstract: Fruit and vegetable picking robots operate in natural scenes that are complex and changeable, and accurate identification and segmentation of the target fruit are crucial for a high harvesting success rate. Instance segmentation is an effective way to address this problem. However, existing instance segmentation algorithms have drawbacks, such as limited edge segmentation accuracy on single-source images and the heavy workload and time required for image labeling. Therefore, a tomato fruit recognition algorithm based on multi-source fused images and an extended Mask R-CNN model was proposed. Firstly, to address the insufficient information that a single image channel provides across different natural scenes, a multi-source information fusion method combining RGB, depth and infrared images was proposed, which enabled the robot to adapt to different lighting conditions and to fruits at different maturity stages. Secondly, to address the inefficiency of labeling training samples in traditional machine learning, a clustering method was proposed to assist the rapid labeling of samples for model training. Thirdly, an extended Mask R-CNN deep learning model was established for online fruit recognition by picking robots. The experimental results showed that the extended Mask R-CNN model achieved 98.3% detection accuracy and a detection IoU of 0.916 on the test set, which meets the requirements of tomato fruit recognition; under different lighting conditions, compared with the Otsu threshold segmentation algorithm, the extended Mask R-CNN model was able to separate adherent fruits, with clear and complete segmentation results and stronger anti-interference ability.
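For readers unfamiliar with the metric, the IoU (intersection over union) figure quoted in the abstract measures how well a predicted region overlaps its ground-truth region. The following is a minimal sketch of the standard definition, assuming binary masks stored as nested lists of 0/1. The function name `mask_iou` and the toy masks are hypothetical, and the abstract does not state whether the authors computed IoU on masks or on bounding boxes, so this should not be read as their evaluation code.

```python
def mask_iou(pred, truth):
    """IoU of two equal-sized binary masks given as nested lists of 0/1."""
    if len(pred) != len(truth) or any(
        len(p) != len(t) for p, t in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            # A pixel is in the intersection if both masks mark it,
            # and in the union if at least one mask marks it.
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


# Toy 3x4 masks (illustrative values only, not data from the paper).
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
ground_truth = [
    [0, 1, 1, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(mask_iou(predicted, ground_truth))  # 4 shared pixels / 5 in the union = 0.8
```

Under this definition, an IoU of 0.916 would mean that the overlap between prediction and ground truth covers about 91.6% of their combined area.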