Abstract: Timely and accurate information on sow nursing behaviour in the intensive pig industry is beneficial to efficient reproductive performance. The purpose of this study was to establish deep-learning networks that recognize sow nursing behaviour automatically. Recognition was performed in two stages: nursing-zone localization in the temporal and spatial domains, followed by nursing behaviour recognition through spatio-temporal feature extraction and fusion. First, video image sequences were fed into a Mask R-CNN whose ResNet-101+FPN backbone generated feature maps; these maps were used to produce a set of region proposals that were passed to a classification head and a keypoint head. The classification head performed sow detection and posture classification, while the keypoint head detected the keypoints used for nursing-zone extraction. Keypoint detections were retained only when the sow was classified as laterally lying; otherwise they were filtered out. A self-adaptive nursing-zone extraction method was proposed that adapts to the piglets' postpartum day and the video recording height, and the sequence of extracted nursing zones was passed to the following subnetwork. Within this spatio-temporal region of interest, temporal and spatial features were extracted by the temporal and spatial streams of a two-stream convolutional network, respectively, and the convolutional features from the two streams were fused by a combination of concatenation and convolution for the final nursing recognition. Test results showed that the total keypoint detection recall Rk and precision Pk were 94.37% and 94.53%, respectively. Sow nursing behaviour in long videos was recognized with an accuracy of 97.85%, a sensitivity of 94.92%, and a specificity of 98.51%, demonstrating the feasibility of automatic recognition of sow nursing behaviour with computer vision.
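
As a concrete illustration of the posture-gated filtering step described above, here is a minimal Python sketch. The SowDetection structure, the posture labels, and the nursing_zone_keypoints helper are illustrative assumptions for this sketch, not the authors' implementation.

```python
# Minimal sketch, assuming a per-frame detection record with a posture label
# and udder-region keypoints; names and labels are hypothetical.

from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class SowDetection:
    posture: str                        # e.g. "lateral_lying", "sternal_lying", "standing"
    keypoints: List[Tuple[float, float]]  # (x, y) keypoints for nursing-zone extraction

def nursing_zone_keypoints(det: SowDetection) -> Optional[List[Tuple[float, float]]]:
    """Keep keypoints only when the sow is laterally lying; otherwise filter
    them out, since nursing can only occur in the lateral-lying posture."""
    if det.posture == "lateral_lying":
        return det.keypoints
    return None  # filtered out: no nursing zone is extracted for this frame
```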
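The concatenation-plus-convolution fusion of the two streams could look like the following PyTorch sketch. The channel sizes, the 1x1 fusion kernel, and the ConcatConvFusion class name are assumptions made for illustration; the abstract does not specify them.

```python
# Minimal sketch of two-stream fusion via concatenation followed by convolution.

import torch
import torch.nn as nn

class ConcatConvFusion(nn.Module):
    def __init__(self, spatial_ch: int = 512, temporal_ch: int = 512, out_ch: int = 512):
        super().__init__()
        # A 1x1 convolution mixes the stacked spatial and temporal channels.
        self.fuse = nn.Conv2d(spatial_ch + temporal_ch, out_ch, kernel_size=1)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(out_ch, 2),  # nursing vs. non-nursing
        )

    def forward(self, spatial_feat: torch.Tensor, temporal_feat: torch.Tensor):
        x = torch.cat([spatial_feat, temporal_feat], dim=1)  # concatenation
        x = self.fuse(x)                                     # convolutional fusion
        return self.classifier(x)

# Usage: fuse 14x14 feature maps from the two streams for one clip.
model = ConcatConvFusion()
logits = model(torch.randn(1, 512, 14, 14), torch.randn(1, 512, 14, 14))
```

Concatenation keeps the appearance and motion channels intact, and the learned convolution decides how to weight them, which is one common way to realize this kind of fusion.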