- Introduction: Recent advances in robot skill learning have unlocked the potential to construct task-agnostic skill libraries, enabling the seamless sequencing of multiple simple manipulation primitives (also known as skills) to tackle more complex tasks. However, determining the optimal sequence of independently learned skills remains an open problem, particularly when the goal is given only as a final geometric configuration rather than a symbolic goal. To address this challenge, we propose Logic-Skill Programming (LSP), an optimization-based approach for sequencing independently learned skills to solve long-horizon tasks. We formulate a first-order extension of a mathematical program to optimize the overall cumulative reward of all skills within a plan, abstracted by the sum of value functions. To solve such programs, we leverage tensor train factorization to construct the value function space and rely on alternations between symbolic search and skill value optimization to find the appropriate skill skeleton and optimal subgoal sequence. Experimental results indicate that the obtained value functions provide a superior approximation of cumulative rewards compared to state-of-the-art reinforcement learning methods. Furthermore, we validate LSP in three manipulation domains, encompassing both prehensile and non-prehensile primitives. The results show that it can identify the optimal solution over the complete logic and geometric path. In real-robot experiments, we demonstrate the effectiveness of our approach in coping with contact uncertainty and external disturbances.
- Problem addressed: Sequencing independently learned skills into long-horizon plans when the goal is specified only as a final geometric configuration rather than a symbolic goal (paper: "LSP: Logic-Skill Programming for Task-Agnostic Skill Sequencing").
- Key idea: LSP is an optimization-based approach that sequences independently learned skills to solve long-horizon tasks by formulating a first-order extension of a mathematical program that optimizes the overall cumulative reward of all skills within a plan (a schematic reading of this formulation is sketched after this list).
- Other highlights: LSP leverages tensor train factorization to construct the value function space and relies on alternations between symbolic search and skill value optimization to find the appropriate skill skeleton and optimal subgoal sequence (a minimal code sketch of this alternation follows the list below). Experimental results indicate that the obtained value functions approximate cumulative rewards more accurately than state-of-the-art reinforcement learning methods. LSP is validated in three manipulation domains, encompassing both prehensile and non-prehensile primitives. Real-robot experiments showcase its effectiveness in coping with contact uncertainty and external disturbances in the real world.
- Related work: Recent advances in robot skill learning and reinforcement learning methods, such as Deep Q-Networks (DQN) and Trust Region Policy Optimization (TRPO).
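
To make the key-idea bullet above more concrete, here is one schematic way to read the optimization it describes (my own notation and a simplification, not the paper's exact first-order program): a plan consists of a skill skeleton $k_{1:N}$ and a subgoal sequence $sg_{1:N}$, and the program maximizes the sum of the skills' value functions, each abstracting a cumulative reward, with the final subgoal pinned to the desired geometric configuration:

$$
\max_{k_{1:N},\; sg_{1:N}} \;\; \sum_{i=1}^{N} V_{k_i}\!\left(sg_{i-1},\, sg_i\right)
\quad \text{s.t.} \quad sg_0 = s_{\text{init}}, \quad sg_N = g_{\text{goal}}, \quad k_{1:N} \text{ satisfies the symbolic (logic) constraints},
$$

where $V_{k}$ denotes the learned value function of skill $k$. Symbolic search proposes candidates for $k_{1:N}$, while continuous optimization over $sg_{1:N}$ scores each candidate.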
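As a further illustration of the highlights bullet, the sketch below mimics the alternation between a symbolic layer (enumerating candidate skill skeletons) and a continuous layer (optimizing the subgoal sequence by maximizing the summed value functions). It is a simplified stand-in under invented assumptions: the skill set ("push", "grasp", "place"), the quadratic value functions, and the sampling-based subgoal optimizer are all illustrative; the actual method uses learned, tensor-train-factorized value functions and a proper symbolic planner.

```python
# A minimal, self-contained sketch (my own illustration, not the authors' code) of the
# alternation described above: a symbolic layer enumerates candidate skill skeletons,
# and a continuous layer optimizes the subgoal sequence by maximizing the sum of
# per-skill value functions. The value functions here are toy quadratic stand-ins;
# in LSP they are learned and represented in tensor-train form.
import itertools
import numpy as np

# Toy "value functions": each skill prefers subgoals near a reference point.
SKILL_REFS = {
    "push": np.array([0.2, 0.0]),
    "grasp": np.array([0.5, 0.5]),
    "place": np.array([1.0, 1.0]),
}

def skill_value(skill, subgoal):
    """Stand-in for a learned value function abstracting the skill's cumulative reward."""
    return -float(np.sum((subgoal - SKILL_REFS[skill]) ** 2))

def enumerate_skeletons(skills, horizon):
    """Stand-in for symbolic search: enumerate candidate skill sequences."""
    return itertools.product(skills, repeat=horizon)

def optimize_subgoals(skeleton, goal, rng, n_samples=500):
    """Sampling-based stand-in for continuous subgoal optimization: the last subgoal is
    pinned to the geometric goal, earlier subgoals are free decision variables."""
    best_sgs, best_val = None, -np.inf
    for _ in range(n_samples):
        sgs = [rng.uniform(0.0, 1.0, size=2) for _ in skeleton[:-1]] + [goal]
        val = sum(skill_value(k, sg) for k, sg in zip(skeleton, sgs))
        if val > best_val:
            best_sgs, best_val = sgs, val
    return best_sgs, best_val

def plan(goal, skills=("push", "grasp", "place"), horizon=3, seed=0):
    """Return the skeleton/subgoal pair with the highest total value."""
    rng = np.random.default_rng(seed)
    best = (None, None, -np.inf)
    for skeleton in enumerate_skeletons(skills, horizon):   # symbolic layer
        sgs, val = optimize_subgoals(skeleton, goal, rng)    # value-based layer
        if val > best[2]:
            best = (skeleton, sgs, val)
    return best

if __name__ == "__main__":
    skeleton, subgoals, value = plan(goal=np.array([1.0, 1.0]))
    print("best skeleton:", skeleton, "total value:", round(value, 3))
```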