Abstract: Cotton topping is a crucial management practice in cotton farming, and accurately and rapidly detecting top buds in complex field environments is a key step toward achieving precise topping. To address this challenge, a lightweight visual detection model, RE-YOLO-QAT, was developed based on the YOLO 11n framework. The model improved upon the YOLO 11n architecture by replacing its backbone network with EfficientViT, a lightweight vision transformer whose hybrid attention and convolutional structure effectively reduced the model's computational cost while maintaining its feature extraction capability. Furthermore, the model incorporated a reparameterized feature pyramid network (RepGFPN), which significantly enhanced its ability to detect small top buds and improved detection accuracy in the field. Additionally, quantization-aware training and structured pruning were employed to substantially compress the model size and reduce computational cost without significantly sacrificing detection accuracy, ensuring that the model meets the real-time detection requirements of efficient cotton topping operations. The experimental results demonstrated that the RE-YOLO-QAT model achieved a cotton top bud recognition rate of 94.2% in complex field scenarios. The model contained only 1.01×10⁶ parameters and required just 2.3×10⁹ FLOPs, a significant reduction in computational cost. Compared with the baseline model, RE-YOLO-QAT reduced the computational cost by 64.06% while incurring a negligible accuracy loss of only 0.2 percentage points, making it efficient enough for real-time detection in cotton topping operations. Overall, the results indicated that this research provides both the theoretical foundation and the technical framework necessary for developing intelligent, precise, and efficient cotton topping systems in future agricultural operations.