Monocular RGB to Depth Conversion Model for Greenhouse Tomato Scene

Fund: National Key Research and Development Program of China (2022YFD2002303-01) and General Project of the Basic Scientific Research Program of the Liaoning Provincial Department of Education (JYTM20231303)




    Abstract:

    In greenhouse environments, fast, high-precision, and low-cost acquisition of scene depth information is crucial for agricultural machine vision systems in tasks such as tomato phenotype analysis, autonomous harvesting, and multimodal joint segmentation. An attention-embedded monocular depth estimation network, the RGB-to-depth conversion network (RDCN), was proposed to address issues in traditional algorithms such as under-exploited encoder feature extraction capability, low depth estimation accuracy, and blurred boundaries. First, ResNeXt101 was employed to replace the original ResNet101 backbone, extracting feature maps at different levels and fusing them into the Laplacian pyramid branches; this emphasized the scale differences of features and enhanced the depth and breadth of feature fusion. To strengthen the model's capacity for capturing global information and contextual interactions, a shuffle attention module (SAM) was introduced, which also mitigated the loss of local detail information caused by downsampling. Second, to address blurred boundaries in the predicted depth maps, a depth refinement module (DRM) was embedded to capture depth variations near object edges in the predicted feature maps. For the study, an RGBD image acquisition platform for tomatoes was constructed in a solar greenhouse using an Azure Kinect DK depth camera. To ensure dataset diversity, images were collected at different times of day under varying greenhouse light intensities, and the training set was augmented by three methods: horizontal mirroring, random rotation, and color jittering, yielding a total of 8515 aligned RGBD image pairs of tomato scenes.
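The channel-shuffle step that gives a shuffle attention module its name can be illustrated with a minimal sketch. This is not the paper's implementation; it is a generic illustration, with each channel represented abstractly as a list element and the group layout chosen for the example:

```python
def channel_shuffle(channels, groups):
    """Interleave channels across groups so that information mixes
    between groups after grouped (per-group) attention.

    channels: a flat list of per-channel feature maps (any objects).
    groups:   number of groups the channels were processed in.
    """
    per_group = len(channels) // groups
    # split into contiguous groups, e.g. [0,1,2,3,4,5] with 3 groups
    # -> [[0,1], [2,3], [4,5]]
    grouped = [channels[g * per_group:(g + 1) * per_group]
               for g in range(groups)]
    # take the i-th channel of every group in turn -> [0,2,4,1,3,5]
    return [grouped[g][i] for i in range(per_group) for g in range(groups)]
```

Applying the shuffle again with the group count and group size swapped restores the original order, which is why stacked shuffle-attention blocks can keep cycling information between channel groups.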
Experimental results indicated that, with the shuffle attention module and the depth refinement module introduced, the model achieved accurate depth prediction in greenhouse scenes. Compared with the baseline model, RDCN reduced the mean relative error, root mean square error, log root mean square error, and log mean error on the test set by 20.5%, 10.3%, 8.3%, and 21.8%, respectively, while accuracy under the 1.25, 1.25², and 1.25³ thresholds improved by 3.2%, 1.2%, and 1.0%, respectively. The depth maps generated by the network were visually complete and clear, with abundant texture details, and exhibited superior visual quality especially in regions with complex geometry and large depth variations. The results show that RDCN can obtain high-quality depth information from RGB data in greenhouse environments, providing technical support for monocular-sensor-based agricultural machine navigation in greenhouses, as well as for the application of depth images in multimodal tasks.
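The abstract quotes the metrics but does not define them. The sketch below uses the conventional monocular depth-estimation definitions (absolute relative error, RMSE, log-RMSE, mean log10 error, and δ < 1.25ᵏ threshold accuracies); whether the paper's "log mean error" is log10-based is an assumption here:

```python
import math

def depth_metrics(pred, gt):
    """Standard monocular depth-estimation metrics over paired
    predicted / ground-truth depth values (flat lists, metres)."""
    n = len(gt)
    abs_rel = sum(abs(p - g) / g for p, g in zip(pred, gt)) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in zip(pred, gt)) / n)
    log10_err = sum(abs(math.log10(p) - math.log10(g))
                    for p, g in zip(pred, gt)) / n
    # delta accuracies: fraction of pixels with max(p/g, g/p) < 1.25**k
    d1, d2, d3 = (sum(max(p / g, g / p) < 1.25 ** k
                      for p, g in zip(pred, gt)) / n
                  for k in (1, 2, 3))
    return {"abs_rel": abs_rel, "rmse": rmse, "rmse_log": rmse_log,
            "log10": log10_err, "d1": d1, "d2": d2, "d3": d3}
```

In practice these are computed per image over valid (non-zero) depth pixels and averaged across the test set, which the flat-list form above leaves to the caller.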

Cite this article:

GAO Wang, DENG Hanbing, XING Zhihong, ZHU Yanqiang. Monocular RGB to Depth Conversion Model for Greenhouse Tomato Scene[J]. Transactions of the Chinese Society for Agricultural Machinery, 2025, 56(6): 499-508, 574.

History
  • Received: 2024-09-22
  • Published online: 2025-06-10