Obstacle Avoidance Planning of Virtual Robot Picking Path Based on Deep Reinforcement Learning
Author:
Affiliation:

Author biography:

Corresponding author:

CLC number:

Fund projects:

National Natural Science Foundation of China (32071912), Natural Science Foundation of Guangdong Province (2018A030313330), Science and Technology Program of Guangzhou (202002030423), and National Undergraduate Innovation and Entrepreneurship Training Program (201910564033)



Abstract:

In field working environments, picking robots face a large number of picking tasks, and the positions of targets and obstacles are random and uncertain. Traditional picking path planning methods typically combine kinematic equations with a shortest-path algorithm, which requires considerable computation time for each plan. To improve the efficiency of trajectory planning and adapt to the field picking environment, a virtual-robot picking path planning method based on deep reinforcement learning was proposed, enabling fast trajectory planning under numerous and uncertain tasks. First, random motion strategies for the virtual robot were set according to the physical structure of the real robot, and the environment observation set used as the network input was chosen by comparing candidate observations and analyzing actual picking behavior. Second, a reward function was established with reference to the ideas of target attraction and obstacle repulsion in the artificial potential field method, to evaluate the virtual robot's behavior and improve the success rate of obstacle avoidance. To address the problem that the range repulsion of the artificial potential field method interferes with shortest-path planning, a directional-penalty obstacle avoidance function was proposed, converting the obstacle range penalty into a single-direction penalty: a motion collision model of the virtual robot was established, and direction penalties were applied selectively according to the collision analysis, further shortening the planned path and improving picking efficiency. Finally, a simulation environment was built in Unity, and the ML-Agents toolkit was used to implement the distributed proximal policy optimization algorithm and its interactive communication with the simulation environment, in order to train the virtual robot on the picking task. Simulation results showed that, with obstacles placed at different positions, the virtual robot completed the picking task with a success rate above 96.7%. In 200 random picking experiments, the directional-penalty obstacle avoidance method achieved a picking success rate of 97.5%, 11 percentage points higher than the ordinary reward function method; trajectory planning took an average of 0.64 s per attempt, 0.45 s less than the reward function based on the artificial potential field method; and the method showed higher adaptability and robustness in experiments with continuously changing tasks. The results indicate that the system can efficiently guide the virtual robot to reach random picking points quickly while avoiding obstacles, meeting the requirements of picking tasks and providing theoretical and technical support for path planning of real picking robots.
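The reward shaping described in the abstract (dense attraction toward the target, plus a penalty applied only to motion directed into an obstacle, rather than a range-based repulsion) can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation; all function names, coefficients, and the collision-normal representation are assumptions:

```python
import numpy as np

def step_reward(end_effector, target, prev_dist, collided,
                collision_normal=None, action_dir=None,
                k_attract=1.0, k_collide=0.5,
                goal_bonus=10.0, goal_radius=0.02):
    """APF-inspired reward sketch (hypothetical): attraction rewards
    progress toward the target; a penalty is given only when the motion
    component actually points into an obstacle (the 'single direction'
    penalty), instead of penalizing the whole repulsion range."""
    dist = float(np.linalg.norm(target - end_effector))
    # Attraction term: potential difference, positive when moving closer.
    reward = k_attract * (prev_dist - dist)
    if collided and collision_normal is not None and action_dir is not None:
        # Directional penalty: only the motion component directed into the
        # obstacle (against the contact normal) is punished.
        into_obstacle = -float(np.dot(action_dir, collision_normal))
        if into_obstacle > 0.0:
            reward -= k_collide * into_obstacle
    if dist < goal_radius:
        reward += goal_bonus  # sparse bonus for reaching the picking point
    return reward, dist
```

Because the penalty depends on the sign of the motion-vs-normal dot product, sliding along an obstacle surface is not punished, which is what lets the learned path stay closer to the shortest route than a range-repulsion reward would.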

Cite this article:

XIONG Juntao, LI Zhonghang, CHEN Shumian, ZHENG Zhenhui. Obstacle Avoidance Planning of Virtual Robot Picking Path Based on Deep Reinforcement Learning[J]. Transactions of the Chinese Society for Agricultural Machinery, 2020, 51(s2): 1-10.

History
  • Received: 2020-08-05
  • Revised:
  • Accepted:
  • Published online: 2020-12-10
  • Published: 2020-12-10