Introduction: Hierarchical reinforcement learning (HRL) is one of the more popular research directions in reinforcement learning, and a certain share of top-conference papers each year are hierarchical. Hierarchy mainly tackles the sparse-reward problem: in practical RL problems the reward is often sparse, and combined with the enormous combinations of state and action spaces, training an agent directly head-on often fails, all the more so with a stubbornly hardheaded agent. When we humans solve a complex problem, we usually decompose it into several easier subproblems and conquer them one by one; the idea of hierarchy comes exactly from this.

In my understanding, current hierarchical approaches fall roughly into two categories. One is goal-based (goal-reaching): pick certain goals and train the agent to move toward them; predictably, the difficulty here is how to choose suitable goals. The other is multi-level control: abstract out control layers at different levels, with upper layers controlling lower ones. These abstraction layers go by different names in different papers, such as the common option, skill, or macro action; put another way, this approach can also be called temporal abstraction.
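To make the two ideas concrete, here is a minimal, hypothetical sketch of a two-level hierarchy in Python: a high-level policy proposes a goal every few steps (temporal abstraction), and a goal-conditioned low-level policy is driven by a dense intrinsic reward for approaching that goal, sidestepping the sparse environment reward. All class names and interfaces are illustrative placeholders, not any specific paper's implementation; in practice both levels would be learned (HIRO, for instance, trains both with off-policy RL and uses a distance-based intrinsic reward of this flavor).

```python
import numpy as np

class PointEnv:
    """Toy 2-D point-mass task with a sparse reward: +1 only at the target."""
    def __init__(self):
        self.target = np.array([5.0, 5.0])
        self.state = np.zeros(2)

    def reset(self):
        self.state = np.zeros(2)
        return self.state.copy()

    def step(self, action):
        self.state = self.state + np.clip(action, -1.0, 1.0)
        done = np.linalg.norm(self.state - self.target) < 0.5
        reward = 1.0 if done else 0.0  # sparse environment reward
        return self.state.copy(), reward, done

class HighLevelPolicy:
    """Proposes a goal in state space; placeholder for a learned policy."""
    def select_goal(self, state):
        # Illustrative only: propose a random nearby point as the subgoal.
        return state + np.random.uniform(-2.0, 2.0, size=state.shape)

class LowLevelPolicy:
    """Goal-conditioned controller; placeholder for a learned policy."""
    def select_action(self, state, goal):
        # Illustrative only: move greedily toward the current subgoal.
        return np.clip(goal - state, -1.0, 1.0)

def run_episode(env, high, low, horizon=10, max_steps=200):
    state = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        if t % horizon == 0:                # temporal abstraction: the high
            goal = high.select_goal(state)  # level acts on a coarser timescale
        action = low.select_action(state, goal)
        next_state, env_reward, done = env.step(action)
        # Dense intrinsic reward for training the low level (unused by this
        # placeholder controller, but it is what a learner would optimize).
        intrinsic_reward = -np.linalg.norm(next_state - goal)
        total_reward += env_reward
        state = next_state
        if done:
            break
    return total_reward

print(run_episode(PointEnv(), HighLevelPolicy(), LowLevelPolicy()))
```

The split of rewards is the key design point: the environment's sparse reward only has to be solved at the coarse timescale of the high level, while the low level always sees a dense, well-shaped learning signal.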

Covered: Feudal, HAM, MAXQ, Options, Option-Critic, A2OC, H-DRL, h-DQN, FuN, UVFA, HER, HAC, HIRO, Skill Chaining, Information-Constrained Primitives, DIAYN, DADS.

1. Feudal – Feudal Reinforcement Learning
2. HAM – Reinforcement Learning with Hierarchies of Machines
3. MAXQ – Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
4. Options – Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
5. Option-Critic – The Option-Critic Architecture
6. A2OC – When Waiting is not an Option: Learning Options with a Deliberation Cost
7. H-DRL – A Deep Hierarchical Approach to Lifelong Learning in Minecraft
8. h-DQN – Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
9. FuN – FeUdal Networks for Hierarchical Reinforcement Learning
10. UVFA – Universal Value Function Approximators
11. HER – Hindsight Experience Replay
12. HAC – Learning Multi-Level Hierarchies with Hindsight
13. HIRO – Data-Efficient Hierarchical Reinforcement Learning
14. Skill Chaining – Skill Discovery in Continuous Reinforcement Learning Domains using Skill Chaining; Option Discovery Using Deep Skill Chaining
15. Information-Constrained Primitives – Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives
16. DIAYN – Diversity Is All You Need: Learning Skills Without A Reward Function
17. DADS – Dynamics-Aware Unsupervised Discovery of Skills
