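- Abstract: Multimodal AI systems will likely become a ubiquitous presence in our everyday lives. A promising approach to making these systems more interactive is to embody them as agents within physical and virtual environments. At present, systems use existing foundation models as the basic building blocks for creating embodied agents. Embedding agents within such environments strengthens the models' ability to process and interpret visual and contextual data, which is critical for building more sophisticated and context-aware AI systems. For example, a system that can perceive user actions, human behavior, environmental objects, audio expressions, and the collective sentiment of a scene can use these signals to inform and direct an agent's responses within that environment. To accelerate research on agent-based multimodal intelligence, we define "Agent AI" as a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally grounded data, and can produce meaningful embodied actions. In particular, we explore systems that aim to improve agents' next-embodied-action prediction by incorporating external knowledge, multi-sensory inputs, and human feedback. We argue that developing agentic AI systems in grounded environments can also mitigate the hallucinations of large foundation models and their tendency to generate environmentally incorrect outputs. The emerging field of Agent AI subsumes the broader embodied and agentic aspects of multimodal interaction. Beyond agents acting and interacting in the physical world, we envision a future in which people can easily create any virtual reality or simulated scene and interact with agents embodied within it.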
- Problem addressed: Agent AI, i.e., multimodal embodied agents.
- Key idea: Develop agentic AI systems in grounded environments and improve the agents' next-embodied-action prediction by incorporating external knowledge, multi-sensory inputs, and human feedback.
- Other highlights: The paper defines Agent AI as a class of interactive systems that perceive visual stimuli, language inputs, and other environmentally grounded data and produce meaningful embodied actions. It explores systems that improve agents' next-embodied-action prediction by incorporating external knowledge, multi-sensory inputs, and human feedback, and argues that developing agentic AI systems in grounded environments can mitigate the hallucinations of large foundation models and their tendency to generate environmentally incorrect outputs.
- Related directions: the broader embodied and agentic aspects of multimodal interaction, and a future in which people create virtual reality or simulated scenes and interact with agents embodied within them.
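As a rough illustration of the definition above (perceive multimodal inputs, predict the next embodied action, draw on external knowledge and human feedback, and stay grounded in the environment), the toy Python sketch below wires those pieces into one loop. Everything in it is hypothetical and not taken from the paper: the `Observation` and `ToyAgent` names, the keyword-matching scorer standing in for a foundation model, the `valid_actions` set used as the grounding check, and the additive feedback bias.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Hypothetical multimodal observation for one time step."""
    visual_objects: list[str]   # object labels, standing in for a vision model's output
    instruction: str            # language input from the user
    valid_actions: set[str]     # actions the environment can actually execute right now


@dataclass
class ToyAgent:
    """Toy next-action predictor (illustrative only, not the paper's method)."""
    knowledge: dict[str, str]   # hypothetical external knowledge: object label -> action
    feedback_bias: dict[str, float] = field(default_factory=dict)  # accumulated human feedback

    def propose(self, obs: Observation) -> dict[str, float]:
        """Score candidate actions from visual, language and knowledge signals."""
        scores: dict[str, float] = {}
        for obj in obs.visual_objects:
            action = self.knowledge.get(obj)
            if action is None:
                continue  # no external knowledge about this object
            # Crude language grounding: objects named in the instruction score higher.
            base = 1.0 if obj in obs.instruction else 0.5
            scores[action] = scores.get(action, 0.0) + base
        # Human feedback shifts each candidate's score once.
        return {a: s + self.feedback_bias.get(a, 0.0) for a, s in scores.items()}

    def next_action(self, obs: Observation) -> str:
        """Predict the next embodied action, keeping only environment-valid candidates."""
        grounded = {a: s for a, s in self.propose(obs).items() if a in obs.valid_actions}
        if not grounded:
            return "no_op"  # refuse rather than emit an action the environment cannot execute
        return max(grounded, key=grounded.get)  # ties go to the first candidate seen

    def give_feedback(self, action: str, reward: float) -> None:
        """Accumulate a human rating for an action into its score bias."""
        self.feedback_bias[action] = self.feedback_bias.get(action, 0.0) + reward


if __name__ == "__main__":
    agent = ToyAgent(knowledge={"cup": "pick_up_cup", "door": "open_door", "dragon": "fight_dragon"})
    obs = Observation(
        visual_objects=["cup", "door", "dragon"],  # "dragon" plays the role of a hallucinated detection
        instruction="please open the door",
        valid_actions={"pick_up_cup", "open_door"},  # the environment offers no dragon action
    )
    print(agent.next_action(obs))           # → open_door
    agent.give_feedback("open_door", -1.0)  # a human marks the choice as wrong
    print(agent.next_action(obs))           # → pick_up_cup
```

Filtering proposals against `valid_actions` is the toy counterpart of the grounding argument: a proposal the environment cannot execute (here `fight_dragon`) is never emitted, however it scores. A real system would replace the keyword scorer with a learned multimodal model.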