Estimation of Pig Weight Based on Cross-modal Feature Fusion Model
Abstract:

In recent years, with the increasing scale of pig farming worldwide, farms urgently need automated livestock information management systems to ensure animal welfare. As a key growth indicator, body weight helps farmers assess the health status of their pigs. Traditional methods measure pig weight manually, which is time-consuming and laborious. With the development of image processing technology, estimating pig weight from images has opened a path toward intelligent weight measurement. However, many recent studies considered only a single image modality, either RGB or depth, ignoring the complementary information between the two. To address this issue, a cross-modal feature fusion model, CFF-ResNet, was proposed, which makes full use of the complementarity between the texture and contour information of RGB images and the spatial structure information of depth images to estimate pig weight without human contact in a group-farming environment. First, top-view RGB and depth images of the piggery were acquired, and the correspondence between the pixel coordinates of the two modalities was used to align them. The EdgeFlow algorithm was then used to segment each target pig at the pixel level in a coarse-to-fine manner, filtering out irrelevant background information. A two-stream architecture was constructed based on the ResNet50 network, with bidirectional connections formed by inserting internal gates to effectively combine the features of the RGB and depth streams for cross-modal feature fusion. Finally, each stream regressed a separate weight prediction, and the final estimate was obtained by averaging the two. In the experiment, data were collected from a commercial pig farm in Henan, China, and a dataset of 9842 pairs of aligned RGB and depth images was constructed, comprising 6909 training pairs and 2933 test pairs. The experimental results showed that the mean absolute error (MAE) of the proposed model on the test set was 3.019 kg, a reduction of 18.095% and 12.569% compared with the RGB-based and depth-based single-stream baseline models, respectively. The average accuracy of the proposed method reached 96.132%, which is very promising. Notably, the model added no training parameters compared with directly using two single-stream models to process the RGB and depth images separately. The MAE of the model was 46.272%, 14.403%, 8.847%, and 11.414% lower than that of four existing methods: a conventional method, an improved EfficientNetV2 model, an improved DenseNet201 model, and the BotNet+DBRB+PFC model, respectively. In addition, to verify the effectiveness of cross-modal feature fusion, a series of ablation experiments explored different alternatives for connecting the two streams, including unidirectional and bidirectional, additive and multiplicative connections. The results showed that the bidirectional additive connection performed best among all alternatives. All of the above results demonstrate that the proposed model can effectively learn cross-modal features and meets the requirements of accurate pig weight measurement, providing effective technical support for pig weight measurement in group-farming environments.
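The pixel-level alignment step can be illustrated with a short sketch. The abstract does not specify the cameras or calibration used, so the intrinsic matrices, extrinsic transform, and function name below are assumptions for illustration: each depth pixel is back-projected to 3-D, transformed into the RGB camera frame, and re-projected into RGB pixel coordinates.

```python
import numpy as np

def align_depth_to_rgb(depth, K_d, K_c, R, t):
    """Reproject a depth map into the RGB camera's pixel grid (illustrative).

    depth : (H, W) array of metric depths from the depth camera
    K_d, K_c : 3x3 intrinsic matrices of the depth and RGB cameras
               (assumed known from calibration)
    R, t : rotation (3x3) and translation (3,) from the depth frame
           to the RGB camera frame
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x HW
    z = depth.reshape(-1)
    X_d = np.linalg.inv(K_d) @ pix * z        # back-project to 3-D (depth frame)
    X_c = R @ X_d + t[:, None]                # transform into the RGB frame
    proj = K_c @ X_c                          # project into RGB pixel coordinates
    uv_rgb = proj[:2] / np.clip(proj[2], 1e-6, None)
    return uv_rgb.T.reshape(H, W, 2)          # per-pixel RGB coords for resampling
```

In practice, many commercial RGB-D cameras provide aligned streams directly, in which case this reprojection is unnecessary.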
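The core two-stream fusion can also be sketched. The abstract names ResNet50 backbones, bidirectional inter-stream connections via internal gates, per-stream regression heads, and prediction averaging, but not the exact gate design. Since the paper reports no added parameters over two single-stream models and finds the bidirectional additive connection best in ablation, the sketch below uses a parameter-free bidirectional additive exchange after each residual stage; all module and variable names are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class CFFResNetSketch(nn.Module):
    """Two-stream ResNet50 with bidirectional additive cross-modal connections."""

    def __init__(self):
        super().__init__()
        def stages(net):
            # split a ResNet50 into its stem and four residual stages
            stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
            return nn.ModuleList([stem, net.layer1, net.layer2, net.layer3, net.layer4])
        self.rgb_stages = stages(resnet50(weights=None))
        self.depth_stages = stages(resnet50(weights=None))  # depth map replicated to 3 channels
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head_rgb = nn.Linear(2048, 1)    # per-stream weight regressors
        self.head_depth = nn.Linear(2048, 1)

    def forward(self, rgb, depth):
        for i in range(5):
            rgb = self.rgb_stages[i](rgb)
            depth = self.depth_stages[i](depth)
            if i < 4:
                # bidirectional additive connection: each stream receives the
                # other's features (parameter-free, matching the best ablation variant)
                rgb, depth = rgb + depth, depth + rgb
        w_rgb = self.head_rgb(self.pool(rgb).flatten(1))
        w_depth = self.head_depth(self.pool(depth).flatten(1))
        return (w_rgb + w_depth) / 2  # average the two stream predictions
```

Because both streams are identical ResNet50s fed same-sized aligned inputs, feature maps match in shape at every stage, so the additive exchange needs no projection layers, consistent with the abstract's claim that no parameters are added over two independent single-stream models.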
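Finally, the reported metrics can be made concrete. The abstract does not define "average accuracy" explicitly; a common convention in pig-weight estimation work, assumed here, is the mean relative accuracy 1 − |ŷ − y| / y averaged over the test set.

```python
import torch

def mae(pred, target):
    # mean absolute error in kg (the paper reports 3.019 kg on the test set)
    return (pred - target).abs().mean()

def mean_relative_accuracy(pred, target):
    # assumed definition of "average accuracy" (reported as 96.132%):
    # average of 1 - relative error, expressed as a percentage
    return (1 - (pred - target).abs() / target).mean() * 100
```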

History
  • Received: June 20, 2023
  • Online: December 10, 2023