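A paper roundup compiled by David Chuan-En Lin, a third-year PhD student at CMU, has sparked lively discussion on Reddit's Machine Learning community. The original post was published on Medium. As the list shows, large models and AIGC account for a large share of the entries. Lin previously interned at Runway, so he has some front-line industry experience.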
- Hierarchical Text-Conditional Image Generation with CLIP Latents (DALL-E 2)
- High-Resolution Image Synthesis with Latent Diffusion Models (Stable Diffusion)
- LAION-5B: An Open Large-Scale Dataset for Training Next Generation Image-Text Models
- An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
- Make-A-Video: Text-to-Video Generation without Text-Video Data
- FILM: Frame Interpolation for Large Motion
- YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors
- A ConvNet for the 2020s
- A Generalist Agent (Gato)
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
- Human-level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning (Cicero)
- Training Language Models to Follow Instructions with Human Feedback (InstructGPT and ChatGPT)
- LaMDA: Language Models for Dialog Applications
- Robust Speech Recognition via Large-Scale Weak Supervision (Whisper)
- Galactica: A Large Language Model for Science
- Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
- Block-NeRF: Scalable Large Scene Neural View Synthesis
- DreamFusion: Text-to-3D using 2D Diffusion
- Point-E: A System for Generating 3D Point Clouds from Complex Prompts