From today's 爱可可 AI Frontier Picks
[RO] Emergence of Maps in the Memories of Blind Navigation Agents
E Wijmans, M Savva, I Essa, S Lee, A S. Morcos, D Batra
[Georgia Institute of Technology & Meta AI & Oregon State University]
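Key points:
- Blind AI navigation agents can navigate new environments effectively using only egomotion sensing, reaching a 95% success rate and 62.9% path efficiency relative to the optimal path;
- Memory is the mechanism behind these agents' strong navigation performance. Memoryless agents fail at the task, while agents with memory use stored information spanning long temporal and spatial ranges;
- Blind agents learn to build and use implicit, map-like representations of their environment simply by learning to navigate. Transplanting episodic memory from one agent into another leads to better navigation, because the probe agent with the implanted memory takes shortcuts;
- The emergent maps are task-dependent and selective, retaining only the environmental features relevant to the navigation goal. This explains why transplanted memory leads to shortcuts: excursions and detours are forgotten.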
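To make the two headline numbers concrete, here is a minimal sketch of how success rate and path efficiency are commonly scored in embodied navigation. It assumes that "path efficiency" refers to SPL (Success weighted by Path Length); the function name `spl` and the episode tuples are hypothetical and not taken from the paper's code.

```python
def spl(episodes):
    """Success weighted by Path Length.

    `episodes` is a list of (success, shortest_path_len, agent_path_len)
    tuples. Assumes shortest_path_len > 0 for every episode.
    """
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            # An optimal path scores 1.0; longer paths score proportionally less.
            total += shortest / max(taken, shortest)
    return total / len(episodes)


# Illustrative (made-up) episodes, lengths in metres.
episodes = [
    (True, 10.0, 10.0),   # optimal path, contributes 1.0
    (True, 10.0, 20.0),   # twice the optimal length, contributes 0.5
    (False, 10.0, 5.0),   # failed episode, contributes 0.0
]
print(spl(episodes))  # 0.5
```

Under this definition a failed episode scores zero however short its path, so the metric can never exceed the success rate.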
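One-sentence summary:
Blind AI navigation agents trained with reinforcement learning can navigate effectively using only egomotion sensing and memory, and implicit maps emerge in their representations, suggesting that maps may be a natural solution to navigation for intelligent embodied agents.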
Abstract:
Animal navigation research posits that organisms build and maintain internal spatial representations, or maps, of their environment. We ask if machines -- specifically, artificial intelligence (AI) navigation agents -- also build implicit (or 'mental') maps. A positive answer to this question would (a) explain the surprising phenomenon in recent literature of ostensibly map-free neural-networks achieving strong performance, and (b) strengthen the evidence of mapping as a fundamental mechanism for navigation by intelligent embodied agents, whether they be biological or artificial. Unlike animal navigation, we can judiciously design the agent's perceptual system and control the learning paradigm to nullify alternative navigation mechanisms. Specifically, we train 'blind' agents -- with sensing limited to only egomotion and no other sensing of any kind -- to perform PointGoal navigation ('go to Δ x, Δ y') via reinforcement learning. Our agents are composed of navigation-agnostic components (fully-connected and recurrent neural networks), and our experimental setup provides no inductive bias towards mapping. Despite these harsh conditions, we find that blind agents are (1) surprisingly effective navigators in new environments (~95% success); (2) they utilize memory over long horizons (remembering ~1,000 steps of past experience in an episode); (3) this memory enables them to exhibit intelligent behavior (following walls, detecting collisions, taking shortcuts); (4) there is emergence of maps and collision detection neurons in the representations of the environment built by a blind agent as it navigates; and (5) the emergent maps are selective and task dependent (e.g. the agent 'forgets' exploratory detours). Overall, this paper presents no new techniques for the AI audience, but a surprising finding, an insight, and an explanation.
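As a rough illustration of what "sensing limited to only egomotion" gives an agent in a "go to Δx, Δy" task, the sketch below integrates per-step motion into a pose and re-expresses the goal in the agent's current frame. This is a hedged sketch, not the paper's implementation: the function names, the (x, y, heading) pose convention, and the step sizes are all assumptions made for illustration.

```python
import math


def integrate_egomotion(pose, forward, turn):
    """Apply a turn (radians, counter-clockwise) and then a forward
    displacement (metres) to a pose (x, y, heading) in the start frame."""
    x, y, heading = pose
    heading = (heading + turn) % (2 * math.pi)
    x += forward * math.cos(heading)
    y += forward * math.sin(heading)
    return (x, y, heading)


def goal_in_agent_frame(pose, goal):
    """Express a goal given in the start frame as (dx, dy) in the agent's
    current frame: dx is distance ahead, dy is distance to the left."""
    x, y, heading = pose
    gx, gy = goal[0] - x, goal[1] - y
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    # Rotate the offset by -heading to move from start frame to agent frame.
    return (cos_h * gx + sin_h * gy, -sin_h * gx + cos_h * gy)


# Hypothetical episode: goal 2 m ahead and 1 m to the left of the start pose.
pose = (0.0, 0.0, 0.0)
goal = (2.0, 1.0)
pose = integrate_egomotion(pose, forward=1.0, turn=0.0)          # step ahead
pose = integrate_egomotion(pose, forward=0.0, turn=math.pi / 2)  # turn left
dx, dy = goal_in_agent_frame(pose, goal)
print(round(dx, 6), round(dy, 6))
```

A signal like this tells the agent where the goal lies but nothing about walls or obstacles, which is why any map-like structure has to be held in the recurrent memory.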
Paper link: https://arxiv.org/abs/2301.13261