Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text

Abstract

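Generating natural human motion from a story has the potential to transform the landscape of the animation, gaming, and film industries. A new and challenging task, Story-to-Motion, arises when characters are required to move to various locations and perform specific motions based on a long text description. This task demands a fusion of low-level control (trajectories) and high-level control (motion semantics). Previous work on character control and text-to-motion has addressed related aspects, yet a comprehensive solution remains elusive: character control methods do not handle text descriptions, whereas text-to-motion methods lack position constraints and often produce unstable motions. In light of these limitations, we propose a novel system that generates controllable, infinitely long motions and trajectories aligned with the input text. (1) We leverage contemporary Large Language Models to act as a text-driven motion scheduler that extracts a series of (text, location, duration) triplets from long text. (2) We develop a text-driven motion retrieval scheme that combines motion matching with motion semantic and trajectory constraints. (3) We design a progressive mask transformer that addresses common artifacts in transition motions, such as unnatural poses and foot sliding. Beyond its pioneering role as the first comprehensive solution for Story-to-Motion, our system is evaluated on three distinct sub-tasks: trajectory following, temporal action composition, and motion blending, where it outperforms previous state-of-the-art motion synthesis methods across the board. Homepage: https://story2motion.github.io/.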
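
The scheduler in step (1) turns a long story into an ordered list of (text, location, duration) triplets. A minimal sketch of that output format follows, assuming the LLM has been prompted to reply with a JSON list. The JSON keys, the 2D ground-plane `location`, the use of seconds for `duration`, and the names `ScheduledAction` and `parse_schedule` are all hypothetical and not taken from the paper.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ScheduledAction:
    """One (text, location, duration) triplet produced by the motion scheduler."""
    text: str                        # motion description, e.g. "knock on the door"
    location: Tuple[float, float]    # hypothetical target position (x, z) on the ground plane
    duration: float                  # hypothetical duration in seconds


def parse_schedule(llm_output: str) -> List[ScheduledAction]:
    """Parse an LLM reply into triplets.

    Assumes (hypothetically) that the reply is a JSON list of objects with
    the keys "text", "location" and "duration".
    """
    schedule = []
    for item in json.loads(llm_output):
        x, z = item["location"]
        duration = float(item["duration"])
        if duration <= 0:
            raise ValueError(f"non-positive duration for action {item['text']!r}")
        schedule.append(ScheduledAction(str(item["text"]), (float(x), float(z)), duration))
    return schedule


reply = """[
  {"text": "walk to the door", "location": [3.0, 0.5], "duration": 4},
  {"text": "knock on the door", "location": [3.0, 0.5], "duration": 2.5}
]"""
for action in parse_schedule(reply):
    print(action)
```

Validating the triplets before motion generation keeps malformed LLM output (for example a zero duration) from reaching the retrieval stage.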
Problem Addressed

The paper addresses the problem of generating natural human motion from a story: a character must move to specific locations and perform specific actions as described in a long text. This requires fusing low-level control (trajectories) with high-level control (motion semantics). It is a new and challenging task with direct applications in the animation, gaming, and film industries.
Key Idea

The key idea is a novel system that generates controllable, infinitely long motions and trajectories aligned with the input text. It has three components: (1) contemporary Large Language Models act as a text-driven motion scheduler; (2) a text-driven motion retrieval scheme combines motion matching with motion semantic and trajectory constraints; and (3) a progressive mask transformer addresses common artifacts in transition motions, such as unnatural poses and foot sliding.
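
The retrieval scheme in (2) can be pictured as a cost minimization over a database of motion clips: each candidate is scored on how well its semantics match the text and how closely its root trajectory follows the target path. This is a simplified sketch, not the paper's formulation. The clip embeddings, the weights `w_sem` and `w_traj`, and the frame-by-frame trajectory comparison are assumptions; practical motion matching usually compares richer pose and future-trajectory features.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # root position (x, z) on the ground plane


@dataclass
class Clip:
    name: str
    embedding: List[float]   # hypothetical semantic embedding of the clip's motion
    trajectory: List[Point]  # root positions, sampled at the same rate as the target path


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def trajectory_error(traj: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean Euclidean distance between frame-aligned root positions."""
    n = min(len(traj), len(target))
    if n == 0:
        raise ValueError("empty trajectory")
    return sum(math.dist(traj[i], target[i]) for i in range(n)) / n


def retrieval_cost(clip: Clip, text_embedding: Sequence[float], target: Sequence[Point],
                   w_sem: float = 1.0, w_traj: float = 1.0) -> float:
    semantic = 1.0 - cosine_similarity(clip.embedding, text_embedding)  # 0 when directions match
    spatial = trajectory_error(clip.trajectory, target)
    return w_sem * semantic + w_traj * spatial


def retrieve(database: Sequence[Clip], text_embedding: Sequence[float],
             target: Sequence[Point], **weights: float) -> Clip:
    """Return the clip with the lowest combined semantic + trajectory cost."""
    return min(database, key=lambda c: retrieval_cost(c, text_embedding, target, **weights))


database = [
    Clip("walk_straight", [1.0, 0.0], [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]),
    Clip("walk_curve", [1.0, 0.1], [(0.0, 0.0), (1.0, 0.5), (1.5, 1.5)]),
    Clip("sit_down", [0.0, 1.0], [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]),
]
walk_text = [1.0, 0.0]  # pretend embedding of "walk forward"
straight_path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
curved_path = [(0.0, 0.0), (1.0, 0.5), (1.5, 1.5)]
print(retrieve(database, walk_text, straight_path).name)  # walk_straight
print(retrieve(database, walk_text, curved_path).name)    # walk_curve
```

With `w_traj=0.0`, the curved-path query returns `walk_straight` again, because text similarity alone cannot tell the two walking clips apart by path; the trajectory term is what enforces the positional constraint.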
Other Highlights

The system is evaluated on three distinct sub-tasks (trajectory following, temporal action composition, and motion blending) and outperforms previous state-of-the-art motion synthesis methods on all of them. It is also presented as the first comprehensive solution for Story-to-Motion, which earlier work had not achieved. The paper's homepage provides access to datasets and code. The proposed system has the potential to transform the animation, gaming, and film industries.
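
Motion blending concerns the transition frames between two consecutive clips, which is where the key idea places the progressive mask transformer. The paper's exact masking schedule is not described here, so the following is only a hypothetical sketch of progressive infilling: all transition frames start masked, and over several steps the frames nearest the known context (the end of clip A and the start of clip B) are committed first, working inward. The `predict` callable stands in for the transformer, and poses are reduced to single floats so the example stays self-contained; `reveal_order`, `progressive_infill`, and `lerp_predict` are illustrative names.

```python
from typing import Callable, List, Optional

Pose = float                   # toy 1-D "pose"; a real system would use full-body joint data
Frames = List[Optional[Pose]]  # None marks a masked (not yet generated) frame


def reveal_order(num_frames: int) -> List[int]:
    """Order transition frames by distance to the nearest known context frame."""
    # Clip A's last frame sits just before index 0; clip B's first frame just after the last index.
    return sorted(range(num_frames), key=lambda i: (min(i + 1, num_frames - i), i))


def progressive_infill(start: Pose, end: Pose, num_frames: int, num_steps: int,
                       predict: Callable[[Pose, Pose, Frames], List[Pose]]) -> Frames:
    order = reveal_order(num_frames)
    frames: Frames = [None] * num_frames
    for step in range(1, num_steps + 1):
        # The stand-in model sees both context poses plus every frame committed so far.
        prediction = predict(start, end, frames)
        # After this step, the first num_frames * step // num_steps frames in `order` are fixed.
        for i in order[: num_frames * step // num_steps]:
            if frames[i] is None:
                frames[i] = prediction[i]
    return frames


def lerp_predict(start: Pose, end: Pose, frames: Frames) -> List[Pose]:
    """Toy stand-in for the transformer: linear interpolation that ignores committed frames."""
    n = len(frames)
    return [start + (end - start) * (i + 1) / (n + 1) for i in range(n)]


print(reveal_order(4))                                    # [0, 3, 1, 2]
print(progressive_infill(0.0, 10.0, 4, 2, lerp_predict))  # [2.0, 4.0, 6.0, 8.0]
```

Committing boundary frames first means later predictions are conditioned on frames that already connect smoothly to both clips, which is one plausible way to reduce pose discontinuities and foot sliding at the seams.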
Related Work

Recent related work in this field includes 'Text2Gestures: A Transformer-Based Framework for Generating Gestures from Text' and 'Text2Action: Generative Adversarial Synthesis from Language to Action'.