Stanford University | Scene Synthesis from Human Motion

From today's 爱可可 (Aikeke) AI frontier picks

[GR] Scene Synthesis from Human Motion

S Ye, Y Wang, J Li, D Park, C. K Liu, H Xu, J Wu
[Stanford University]


Key points:

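Proposes SUMMON, a new framework that synthesizes semantically reasonable, physically plausible, and diverse scenes from human motion;
ContactFormer, the contact prediction module within SUMMON, outperforms existing methods by modeling the temporal consistency of semantic contact labels;
Demonstrates that scenes synthesized by SUMMON outperform those of existing methods in feasibility, plausibility, and diversity, with the potential to generate extensive human-scene interaction data for the community.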
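ContactFormer is a learned predictor that models the temporal consistency of contact labels. The summary gives no implementation details. To illustrate only what "temporally consistent contact labels" means, here is a much simpler stand-in: majority-vote smoothing of per-frame semantic contact labels over a sliding window. It is a sketch under stated assumptions and not ContactFormer's method. The function name `smooth_contact_labels`, the `window` parameter, and the toy labels are hypothetical.

```python
from collections import Counter


def smooth_contact_labels(frame_labels, window=5):
    """Hypothetical helper (not ContactFormer): majority-vote each frame's
    semantic contact label over a centered sliding window to suppress
    one-frame flickers.

    frame_labels: per-frame labels for one body region, e.g. "chair", "none".
    window: odd window size (illustrative parameter, not from the paper).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    smoothed = []
    for t in range(len(frame_labels)):
        # Clip the window at the start and end of the sequence.
        lo = max(0, t - half)
        hi = min(len(frame_labels), t + half + 1)
        # most_common breaks ties by first occurrence inside the window.
        smoothed.append(Counter(frame_labels[lo:hi]).most_common(1)[0][0])
    return smoothed


# Toy sequence: a single spurious "floor" frame while the person sits.
labels = ["chair", "chair", "floor", "chair", "chair", "none", "none", "none"]
print(smooth_contact_labels(labels, window=3))
# ['chair', 'chair', 'chair', 'chair', 'chair', 'none', 'none', 'none']
```

A centered, odd-sized window keeps the smoothed labels aligned in time with the motion. A learned temporal model like the one described above can do better than this fixed rule, because it can also use body pose context.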

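One-sentence summary:
Proposes SUMMON, a new framework for synthesizing semantically reasonable, physically plausible, and diverse scenes from human motion, together with the contact prediction module ContactFormer, and shows significant advantages over existing methods.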

Abstract:

Large-scale capture of human motion with diverse, complex scenes, while immensely useful, is often considered prohibitively costly. Meanwhile, human motion alone contains rich information about the scene they reside in and interact with. For example, a sitting human suggests the existence of a chair, and their leg position further implies the chair's pose. In this paper, we propose to synthesize diverse, semantically reasonable, and physically plausible scenes based on human motion. Our framework, Scene Synthesis from HUMan MotiON (SUMMON), includes two steps. It first uses ContactFormer, our newly introduced contact predictor, to obtain temporally consistent contact labels from human motion. Based on these predictions, SUMMON then chooses interacting objects and optimizes physical plausibility losses; it further populates the scene with objects that do not interact with humans. Experimental results demonstrate that SUMMON synthesizes feasible, plausible, and diverse scenes and has the potential to generate extensive human-scene interaction data for the community.
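The abstract says SUMMON places interacting objects by optimizing physical plausibility losses. Below is a rough sketch of that idea. It rests on several assumptions that are not the paper's actual losses or optimizer:

- The object is a box with a closed-form signed distance function.
- A contact term pulls the predicted contact vertices onto the object surface.
- A penetration term penalizes body vertices inside the object.
- A brute-force grid search over floor position stands in for gradient-based optimization.

All function names, weights (`w_contact`, `w_pen`), and toy coordinates are hypothetical.

```python
import numpy as np


def box_sdf(points, center, half_extents):
    """Signed distance from each point to an axis-aligned box (negative inside)."""
    q = np.abs(points - center) - half_extents
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=1)
    inside = np.minimum(np.max(q, axis=1), 0.0)
    return outside + inside


def placement_loss(center, half_extents, contact_verts, body_verts,
                   w_contact=1.0, w_pen=10.0):
    """Hypothetical plausibility loss: contact vertices should lie on the
    object surface, and no body vertex should penetrate the object."""
    contact_term = np.mean(np.abs(box_sdf(contact_verts, center, half_extents)))
    pen_term = np.mean(np.maximum(-box_sdf(body_verts, center, half_extents), 0.0))
    return w_contact * contact_term + w_pen * pen_term


def place_object(half_extents, contact_verts, body_verts, search=0.5, steps=21):
    """Grid-search the object's floor position (x, y); the box rests on z = 0."""
    offsets = np.linspace(-search, search, steps)
    best_center, best_loss = None, np.inf
    for x in offsets:
        for y in offsets:
            center = np.array([x, y, half_extents[2]])
            loss = placement_loss(center, half_extents, contact_verts, body_verts)
            if loss < best_loss:
                best_center, best_loss = center, loss
    return best_center, best_loss


# Toy "sitting" pose (made-up coordinates, meters): four contact points at
# seat height 0.45, plus non-contact feet and a torso point.
contact = np.array([[x, y, 0.45] for x in (-0.2, 0.2) for y in (-0.2, 0.2)])
body = np.vstack([contact,
                  [[-0.1, 0.35, 0.02], [0.1, 0.35, 0.02], [0.0, -0.1, 0.9]]])
seat = np.array([0.2, 0.2, 0.225])  # half-extents of a box-shaped "chair seat"
center, loss = place_object(seat, contact, body)
print(center, loss)
```

With these toy points, the search puts the seat directly under the contact points. Positions that shift the box under the feet are penalized by both terms. A real system would optimize rotation as well and use actual object meshes from a furniture dataset.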

Paper link: https://arxiv.org/abs/2301.01424

