Abstract: Picking trajectory planning is a key research problem for apple picking manipulators. The unstructured natural environment leads to low training efficiency when picking trajectory planning is based on deep reinforcement learning. A deep deterministic policy gradient (DDPG) algorithm based on a stepwise migration strategy was therefore proposed for apple picking trajectory planning. Firstly, a progressively space-constrained stepwise training strategy based on DDPG was put forward to address the difficulty of convergence in natural environments. Secondly, transfer learning was used to migrate the strategy obtained in the obstacle-free scenario to the simple obstacle scenario, and then from the simple obstacle scenario to the hybrid obstacle scenario, so that prior strategies accelerate training in obstacle scenarios and guide the obstacle avoidance trajectory planning of the apple picking manipulator. Finally, simulation experiments on picking trajectory planning with a multi-degree-of-freedom apple picking manipulator were carried out, and the results showed that the stepwise migration strategy can improve the training efficiency and network performance of the DDPG algorithm. It was validated that the trajectory planning method for the apple picking manipulator based on the stepwise migration strategy is feasible.
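The stepwise migration described above can be sketched as a three-stage curriculum in which each stage's networks are initialized from the converged weights of the previous, simpler scenario. The sketch below is a minimal illustration only: the `Policy` class, `train_stage` function, and scenario names are hypothetical stand-ins, and the actual DDPG actor/critic training is abstracted into a dummy update loop, not the paper's implementation.

```python
import numpy as np

class Policy:
    """Toy stand-in for a DDPG actor/critic pair: a single weight matrix."""
    def __init__(self, obs_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(obs_dim, act_dim))

    def copy_from(self, other):
        # Stepwise migration: start this stage from the converged
        # weights of the previous, simpler scenario.
        self.w = other.w.copy()

def train_stage(policy, n_updates):
    # Placeholder for DDPG training in one scenario; dummy updates
    # keep the sketch runnable without an RL environment.
    for _ in range(n_updates):
        policy.w *= 0.99
    return policy

# Curriculum: obstacle-free -> simple obstacle -> hybrid obstacle.
stages = ["obstacle_free", "simple_obstacle", "hybrid_obstacle"]
prev = None
for name in stages:
    policy = Policy(obs_dim=6, act_dim=3)
    if prev is not None:
        policy.copy_from(prev)   # transfer the prior strategy
    prev = train_stage(policy, n_updates=10)
```

In a real DDPG setup the transfer step would copy the actor and critic network parameters (and optionally the target networks) rather than a single matrix, but the control flow is the same.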