Visual Place Recognition for Localization of Mobile Robots in Greenhouse
Authors: ZHOU Yuncheng, YU Meiling, WU Bohang, ZHANG Funing
Fund project: National Key Research and Development Program of China (2023YFD1501303, 2021YFD1500204)


Abstract:

Aiming to address the challenges of greenhouse mobile robot localization and scene recognition in highly dynamic environments with visually similar scenes, a scene recognition model based on local feature selection and aggregation was proposed. The model employed a pre-trained vision Transformer (DINOv2) as its backbone network to extract local image features, and a learnable query-based feature selection and aggregation strategy was designed to generate discriminative global descriptors: through cross-attention, the most informative local features were selected and aggregated into compact global representations. A multi-similarity loss function, combining the strengths of contrastive learning and triplet learning, was applied to optimize the recognition model. A greenhouse scene dataset containing 2100 scenes and 25000 images was constructed, covering challenging factors such as illumination variation, viewpoint change, distance scaling, and temporal crop growth. Experimental results showed that the proposed model achieved recall rates of 88.79% (R@1), 96.49% (R@5), and 97.96% (R@10) on this dataset, outperforming the existing scene recognition benchmarks NetVLAD, GeM, CosPlace, EigenPlaces, MixVPR, and SALAD by 23.70, 19.24, 10.64, 3.30, 3.90, and 0.44 percentage points in R@1, respectively. The model remained robust under strong and weak illumination (R@1 fluctuation of less than 5 percentage points) and moderate viewpoint changes (93.12% accuracy within a 15° deviation), and reached 63.94% R@1 when the sampling distance was doubled; performance declined under large viewpoint and distance changes and under long-term crop growth (61.14% R@1 after 5 days of growth). Validation on a greenhouse mobile robot confirmed the model's practicality, with an average recognition rate of 85.88%. The learnable query-based feature selection and aggregation mechanism, together with the selected feature extraction backbone, effectively improved recognition accuracy in greenhouse scenes, and the proposed framework can serve as a reference for the design of vision systems for greenhouse mobile robots.
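The feature selection and aggregation step can be illustrated with a short sketch. The module below is an assumption-based approximation of the described design, not the authors' implementation: a small set of learnable query vectors attends over DINOv2 patch tokens through cross-attention, and the attended outputs are projected and flattened into an L2-normalized global descriptor. The query count, dimensions, and the DINOv2 loading shown in the trailing comments (ViT-B/14 via torch.hub) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QueryAggregator(nn.Module):
    """Aggregate local patch features into one global descriptor using
    learnable queries and cross-attention (illustrative sketch only)."""

    def __init__(self, feat_dim: int = 768, num_queries: int = 8, out_dim: int = 64):
        super().__init__()
        # Learnable query tokens that "ask" for discriminative local features.
        self.queries = nn.Parameter(torch.randn(num_queries, feat_dim) * 0.02)
        # Cross-attention: the queries attend over the patch tokens of one image.
        self.attn = nn.MultiheadAttention(feat_dim, num_heads=8, batch_first=True)
        # Project each attended query to a slice of the final descriptor.
        self.proj = nn.Linear(feat_dim, out_dim)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, feat_dim) local features from the backbone.
        b = patch_tokens.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)          # (B, Q, D)
        attended, _ = self.attn(q, patch_tokens, patch_tokens)   # (B, Q, D)
        desc = self.proj(attended).flatten(1)                    # (B, Q * out_dim)
        return F.normalize(desc, dim=-1)                         # unit-length descriptor


# Usage with a frozen DINOv2 backbone (ViT-B/14 assumed, loaded via torch.hub):
# backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vitb14").eval()
# tokens = backbone.forward_features(images)["x_norm_patchtokens"]  # (B, N, 768)
# descriptors = QueryAggregator()(tokens)
```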
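The loss used to optimize the model is described as a multi-similarity loss that combines the strengths of contrastive and triplet learning. The sketch below assumes the standard multi-similarity formulation, in which hard positive pairs (low similarity) and hard negative pairs (high similarity) dominate two log-sum-exp terms; the hyperparameters alpha, beta, and lam are illustrative defaults, not the paper's settings.

```python
import torch

def multi_similarity_loss(desc: torch.Tensor, labels: torch.Tensor,
                          alpha: float = 2.0, beta: float = 50.0,
                          lam: float = 0.5) -> torch.Tensor:
    """Multi-similarity loss over a batch of L2-normalized descriptors.
    `labels` gives the scene identity of each descriptor in the batch."""
    sim = desc @ desc.t()                                   # cosine similarities (B, B)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)       # same-scene mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=desc.device)
    pos_mask = same & ~eye                                  # positive pairs (not self)
    neg_mask = ~same                                        # negative pairs

    losses = []
    for i in range(len(labels)):
        pos = sim[i][pos_mask[i]]
        neg = sim[i][neg_mask[i]]
        if pos.numel() == 0 or neg.numel() == 0:
            continue
        # Hard positives (low similarity) and hard negatives (high similarity)
        # dominate the two soft-weighted log-sum-exp terms.
        pos_term = torch.log1p(torch.exp(-alpha * (pos - lam)).sum()) / alpha
        neg_term = torch.log1p(torch.exp(beta * (neg - lam)).sum()) / beta
        losses.append(pos_term + neg_term)
    return torch.stack(losses).mean() if losses else desc.new_zeros(())
```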
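Recall@K, the metric reported above, counts a query as correctly recognized when at least one true match of the same scene appears among its K most similar database descriptors. A minimal evaluation sketch, assuming ground-truth matches are given as a list of matching database indices per query:

```python
import numpy as np

def recall_at_k(query_desc, db_desc, gt_matches, ks=(1, 5, 10)):
    """Recall@K for place recognition. Descriptors are assumed L2-normalized,
    so the inner product equals the cosine similarity."""
    sims = query_desc @ db_desc.T                  # (num_queries, num_db)
    ranking = np.argsort(-sims, axis=1)            # database indices by decreasing similarity
    recalls = {}
    for k in ks:
        hits = sum(
            1 for q, gt in enumerate(gt_matches)
            if set(ranking[q, :k]) & set(gt)       # any true match within the top-k?
        )
        recalls[k] = 100.0 * hits / len(gt_matches)
    return recalls
```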

Cite this article

ZHOU Yuncheng, YU Meiling, WU Bohang, ZHANG Funing. Visual Place Recognition for Localization of Mobile Robots in Greenhouse[J]. Transactions of the Chinese Society for Agricultural Machinery, 2026, 57(4): 151-161.


History
  • Received: 2025-08-06
  • Published online: 2026-02-15