Abstract: To address the low recognition accuracy, the high labor intensity of manual identification, and the subtle behavioral differences between selective feeding and normal feeding in dairy cows under complex environmental conditions, a method for identifying selective feeding behavior was proposed based on an inspection robot and an improved RT-DETR model. An inspection robot was designed according to dairy cow feeding characteristics to collect data on the feeding process. Data were collected in three barns during three time periods (noon, afternoon, and night), yielding a dataset of 10,280 feeding behavior images across these periods. The RT-DETR model was enhanced by integrating a DBRA structure, which combines the DAttention (DAT) module and the Bi-Level Routing Attention (BRA) module in the shallow layers, creating a novel image feature extraction architecture that improves the deep fusion of local and global features. Additionally, the Efficient Multi-Scale Attention (EMA) module was incorporated into the model's encoder to strengthen high-level semantic information extraction and contextual correlation. Experimental results showed that the improved model achieved a mean average precision (mAP@0.5) of 99.1% on the dairy cow feeding dataset, with a model memory footprint of 39.6 MB and 4.67×10¹⁰ floating-point operations (FLOPs). Compared with the original model, mAP@0.5 increased by 7.4 percentage points, memory footprint decreased by 0.9 MB, and FLOPs decreased by 2%. Compared with the Sparse R-CNN, YOLO v7-L, YOLO v8n, DINO, Swin Transformer, and DETR models, the proposed model improved mAP@0.5 by 8.5, 9.8, 7.8, 6.6, 11.4, and 9.5 percentage points, respectively. These findings enable accurate differentiation between normal and selective feeding behaviors, providing technical support for intelligent livestock farming.
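The DAT, BRA, and EMA modules named above are all variants of the attention mechanism. As a minimal illustrative sketch only (not the paper's DBRA implementation, and with all shapes chosen arbitrarily), the scaled dot-product attention that such modules build on can be written as:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (num_tokens, dim). Each output token is a weighted
    # mix of all value tokens, with weights from query-key similarity.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)        # pairwise similarities
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 32))        # 16 feature-map tokens, 32-dim
out = scaled_dot_product_attention(x, x, x)  # self-attention
print(out.shape)  # (16, 32)
```

Deformable attention (DAT) restricts keys/values to learned sampling locations, and bi-level routing attention (BRA) first routes each query to a sparse subset of regions before applying this dense attention within them; both reduce the quadratic cost of attending over the full feature map.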