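A roundup by David Chuan-En Lin, a third-year PhD student at CMU, has drawn lively discussion on Reddit Machine Learning; the original post was published on Medium. Large models and AIGC account for a large share of the list. He previously interned at Runway, so he is fairly close to the front line of this work.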

  1. Hierarchical Text-Conditional Image Generation with CLIP Latents (DALL-E 2)
  2. High-Resolution Image Synthesis with Latent Diffusion Models (Stable Diffusion)
  3. LAION-5B: An Open Large-Scale Dataset for Training Next Generation Image-Text Models
  4. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
  5. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
  6. Make-A-Video: Text-to-Video Generation without Text-Video Data
  7. FILM: Frame Interpolation for Large Motion
  8. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors
  9. A ConvNet for the 2020s
  10. A Generalist Agent (Gato)
  11. MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
  12. Human-level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning (Cicero)
  13. Training Language Models to Follow Instructions with Human Feedback (InstructGPT and ChatGPT)
  14. LaMDA: Language Models for Dialog Applications
  15. Robust Speech Recognition via Large-Scale Weak Supervision (Whisper)
  16. Galactica: A Large Language Model for Science
  17. Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
  18. Block-NeRF: Scalable Large Scene Neural View Synthesis
  19. DreamFusion: Text-to-3D using 2D Diffusion
  20. Point-E: A System for Generating 3D Point Clouds from Complex Prompts