Abstract: Canopy-under navigation path recognition in tobacco fields is often hindered by leaf and weed occlusion, as well as significant variations in plant morphology, posing challenges for the autonomous operation of human-machine collaborative harvesting robots. To address these issues, a novel canopy-under navigation path recognition method was proposed based on EME-Net, an inter-row image semantic segmentation model. Built upon the DeepLabV3+ architecture, EME-Net replaced the original Xception backbone in the encoder with ECA-MobileNetV2 (dubbed EMNet) for efficient feature extraction, enabling the model to effectively capture key inter-row path features. A pyramid split attention (PSA) multi-scale feature fusion mechanism was introduced to enhance the representation of inter-row boundary features, particularly under occlusion. Additionally, the ECA mechanism, embedded at both the output and the terminal stage of EMNet, filtered out irrelevant features from tobacco field images, improving feature utilization without reducing the number of channels. To mitigate the accuracy degradation caused by foreground-background imbalance, a robust BCE_DiceLoss function was proposed, combining binary cross-entropy (BCE) loss and Dice loss. Based on the autonomous traversal region masks output by EME-Net, the least squares method was used to fit the edge points and extract inter-row navigation lines. Experimental results showed that EME-Net achieved a mean pixel accuracy (mPA) of 91.3% and a mean intersection over union (mIoU) of 88.9%, surpassing the baseline DeepLabV3+ model by 7.9 and 6.1 percentage points, respectively. The average detection frame rate reached 29.5 frames per second, outperforming mainstream segmentation models such as PSPNet, U-Net, HRNet, and Segformer. In practical tobacco field navigation path recognition experiments, the proposed method effectively extracted navigation lines in canopy-under areas with varying levels of occlusion.
The mean heading deviation ranged from 1.45° to 3.80°, and the average lateral pixel distance ranged from 1.46 to 3.68 pixels. The method meets the practical requirements of inter-row navigation tasks and provides a reliable technical solution for the autonomous transport operations of human-machine collaborative harvesting robots.
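The BCE_DiceLoss described above can be sketched as a weighted sum of binary cross-entropy and Dice loss. This is a minimal NumPy illustration, not the paper's implementation: the function name, the equal 0.5/0.5 weighting, and the smoothing constant `eps` are assumptions, as the abstract does not specify them.

```python
import numpy as np

def bce_dice_loss(probs, targets, eps=1e-7, bce_weight=0.5):
    """Combined BCE + Dice loss for binary segmentation (illustrative sketch).

    probs   : predicted foreground probabilities in (0, 1)
    targets : binary ground-truth mask (0 or 1)
    """
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    # Binary cross-entropy, averaged over all pixels
    bce = -np.mean(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))
    # Dice loss = 1 - Dice coefficient; region-based, so it is less
    # sensitive to foreground-background pixel imbalance than BCE alone
    intersection = np.sum(probs * targets)
    dice = 1.0 - (2.0 * intersection + eps) / (np.sum(probs) + np.sum(targets) + eps)
    return bce_weight * bce + (1.0 - bce_weight) * dice
```

Pairing the two terms is a common remedy for imbalance: BCE supplies stable per-pixel gradients, while the Dice term keeps the sparse foreground (the traversable path) from being overwhelmed by background pixels.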