Tomato 3D Keypoint Estimation Method Based on Improved RTMPose3D Model

Fund Project:

Suzhou Science and Technology Project for Agricultural Innovation (SNG2025009) and National Key Research and Development Program of China (2022YFB4702202)

Abstract:

To address the difficulty that autonomous harvesting robots face in reliably acquiring the 3D pose of truss tomatoes under severe branch and leaf occlusion and strong light interference in greenhouses, an improved 3D keypoint estimation model for truss tomatoes, TomatoPose3D, was proposed. During training, the model applied joint constraints from RGB images and 3D ground-truth keypoints to enhance structural consistency and generalization. During inference, it regressed 3D keypoint coordinates end to end from a single RGB image, avoiding localization failures caused by missing or sparse point clouds. Built on the RTMPose3D baseline, the improved model introduced a global structure-aware module, the MobileViT block, and the distribution-aware coordinate representation of keypoints (DARK) decoding strategy, improving localization accuracy while remaining lightweight. Comparative experiments in greenhouse scenes showed that TomatoPose3D improved PCK@0.05 by 5.18 and 9.98 percentage points over RTMPose3D and SimpleBaseline3D, respectively. Without depth information, its localization accuracy was comparable to that of RGB-D projection-based methods, with better robustness. The model was further accelerated with TensorRT and deployed on an industrial-grade embedded platform, reaching an end-to-end inference speed of 37 f/s, which met the real-time spatial visual perception requirements of harvesting robots.
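The PCK@0.05 figures above count the share of predicted keypoints that fall within a tolerance of their ground-truth positions. The abstract does not state how the tolerance is normalized, so the following is only a minimal sketch under an assumption: the threshold is 0.05 times a per-sample reference length (for example, a bounding-box diagonal). The function name `pck_3d` and the parameter `ref_lengths` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence

Point3D = Sequence[float]


def pck_3d(
    pred: Sequence[Sequence[Point3D]],
    gt: Sequence[Sequence[Point3D]],
    ref_lengths: Sequence[float],
    alpha: float = 0.05,
) -> float:
    """Fraction of keypoints whose 3D Euclidean error is <= alpha * reference length.

    Assumption (not from the paper): each sample has its own reference
    length, and the tolerance is alpha times that length.
    """
    correct = 0
    total = 0
    for pred_kpts, gt_kpts, ref in zip(pred, gt, ref_lengths):
        threshold = alpha * ref
        for p, g in zip(pred_kpts, gt_kpts):
            # math.dist gives the Euclidean distance between two points
            if math.dist(p, g) <= threshold:
                correct += 1
            total += 1
    return correct / total if total else 0.0


# Toy data: one sample with two keypoints and a reference length of 1.0.
gt = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
pred = [[(0.01, 0.0, 0.0), (1.0, 0.2, 0.0)]]
score = pck_3d(pred, gt, ref_lengths=[1.0])
print(f"PCK@0.05 = {score:.2f}")
```

In this toy case the first keypoint is off by 0.01 (inside the 0.05 tolerance) and the second by 0.2 (outside it), so half of the keypoints count as correct. An improvement reported in percentage points, as in the abstract, is the plain difference between two such scores expressed in percent.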

Cite this article

WANG Pengbo, LIU Yu, ZHAO Shenghui, FU Yikai. Tomato 3D Keypoint Estimation Method Based on Improved RTMPose3D Model[J]. Transactions of the Chinese Society for Agricultural Machinery, 2026, 57(5): 149-158.

History
  • Received: 2025-12-22
  • Published online: 2026-03-01