Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model

November 10, 2023
  • Abstract
    Text-to-3D with diffusion models has achieved remarkable progress in recent years. However, existing methods either rely on score-distillation-based optimization, which suffers from slow inference, low diversity, and Janus problems, or are feed-forward methods that generate low-quality results due to the scarcity of 3D training data. This paper proposes Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner. It adopts a two-stage approach: it first generates a sparse set of four structured and consistent views in one shot with a fine-tuned 2D text-to-image diffusion model, and then directly regresses the NeRF from the generated images with a novel transformer-based sparse-view reconstructor. Extensive experiments demonstrate that the method can generate high-quality, diverse, and Janus-free 3D assets within 20 seconds, two orders of magnitude faster than previous optimization-based methods, which can take 1 to 10 hours. Project webpage: https://jiahao.ai/instant3d/.
  • Problem Addressed
    The paper proposes a novel method for generating high-quality and diverse 3D assets from text prompts in a feed-forward manner, aiming to solve the slow-inference, low-diversity, and Janus problems of previous methods.
  • Key Idea
    The paper adopts a two-stage paradigm: it first generates a sparse set of four structured and consistent views from text in one shot with a fine-tuned 2D text-to-image diffusion model, and then directly regresses the NeRF from the generated images with a novel transformer-based sparse-view reconstructor (a minimal code sketch of this pipeline follows this list).
  • Other Highlights
    The proposed method, Instant3D, generates high-quality, diverse, and Janus-free 3D assets within 20 seconds, two orders of magnitude faster than previous optimization-based methods. Experiments show that Instant3D outperforms existing methods in both the quality and the diversity of generated 3D assets. The paper also provides an open-source implementation of the proposed method and a project webpage for further research.
  • Related Work
    Recent related works include score-based generative models for text-to-3D and feed-forward 3D shape generation, among others.
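
To make the two-stage paradigm concrete, here is a minimal sketch of the pipeline in PyTorch. It is an illustrative assumption based on the description above, not the authors' released code: all names (split_grid, SparseViewReconstructor) and tensor shapes are hypothetical stand-ins for the fine-tuned diffusion model and the transformer-based reconstructor.

```python
# Minimal sketch of the two-stage Instant3D pipeline (illustrative only;
# function/class names and shapes are assumptions, not the authors' API).
import torch

def split_grid(grid: torch.Tensor) -> torch.Tensor:
    """Split a 2x2 grid image (3, H, W) into four views (4, 3, H/2, W/2)."""
    top, bottom = grid.chunk(2, dim=1)
    return torch.stack([*top.chunk(2, dim=2), *bottom.chunk(2, dim=2)])

class SparseViewReconstructor(torch.nn.Module):
    """Stand-in for the transformer-based reconstructor: the real model
    patchifies the four views, injects their camera poses, and regresses
    NeRF (e.g., triplane) features in a single forward pass."""
    def forward(self, views: torch.Tensor, poses: torch.Tensor) -> torch.Tensor:
        # Placeholder output standing in for the regressed triplane features.
        return torch.zeros(3, 32, 32, 32)

def instant3d(prompt: str) -> torch.Tensor:
    # Stage 1: one diffusion pass yields a 2x2 grid of four consistent views.
    # A random image stands in for the fine-tuned text-to-image model here.
    grid = torch.rand(3, 512, 512)
    views = split_grid(grid)              # (4, 3, 256, 256)
    # Stage 2: feed-forward NeRF regression from the views and their
    # fixed, known camera poses (identity placeholders below).
    poses = torch.eye(4).repeat(4, 1, 1)  # four 4x4 extrinsics
    return SparseViewReconstructor()(views, poses)
```

Because stage two is a single feed-forward regression rather than a per-prompt optimization loop, the whole pipeline runs in seconds, which is the source of the reported two-orders-of-magnitude speedup.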