At 15:00 on the afternoon of August 25, the Foundation Model Research Center of Tsinghua University will host the 26th lecture in its series. Han Cai (蔡涵), a Research Scientist at NVIDIA, will give a talk titled "Post-Training Model Acceleration for Large Foundation Models".


The lecture is open to the public; everyone on and off campus is welcome to attend and discuss frontier topics in foundation models. Visitors from outside the university should arrange their own campus-entry registration.


Time: 15:00-16:00, Monday, August 25, 2025

Venue: Room 1-315, FIT Building, Tsinghua University

Speaker: Han Cai (蔡涵)

Host: Minlie Huang (Professor, Department of Computer Science and Technology, Tsinghua University)


Talk Information


Title

Post-Training Model Acceleration for Large Foundation Models


Abstract

Large foundation models, such as large language models and diffusion models, have demonstrated remarkable capabilities but impose immense computational and memory demands, hindering their broad deployment. In this talk, I will present post-training model acceleration techniques that significantly speed up these models without sacrificing quality. I will introduce Jet-Nemotron, a new family of hybrid-architecture language models that surpass state-of-the-art open-source full-attention models—including Qwen3, Qwen2.5, Gemma3, and Llama3.2—while delivering dramatic efficiency gains, achieving up to 53.6× faster generation throughput on H100 GPUs with 256K context length and maximum batch size. In addition, I will discuss Deep Compression Autoencoder (DC-AE) and SANA, two methods designed to accelerate diffusion models efficiently. 
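
For context, a "hybrid architecture" in this setting typically interleaves linear-attention layers, whose cost grows linearly with sequence length and whose state stays constant-size (no growing KV cache), with a small number of retained full-attention layers that pay the usual quadratic cost. The PyTorch sketch below is a minimal illustration of that general idea only: the ELU-based feature map, the 12-layer layout, and every identifier are hypothetical assumptions, not the actual Jet-Nemotron design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LinearAttention(nn.Module):
    # O(n) token mixing: softmax(Q K^T) V is replaced by phi(Q) (phi(K)^T V),
    # so cost is linear in sequence length n. Non-causal form, for brevity.
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)

    def forward(self, x):                                  # x: (batch, seq, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k = F.elu(q) + 1, F.elu(k) + 1                  # positive feature map phi
        kv = torch.einsum("bnd,bne->bde", k, v)            # fixed-size (dim x dim) summary
        z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(1)) + 1e-6)
        return self.out(torch.einsum("bnd,bde,bn->bne", q, kv, z))

class FullAttention(nn.Module):
    # Standard O(n^2) softmax attention, retained in only a few layers.
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        return self.attn(x, x, x, need_weights=False)[0]

class HybridBlock(nn.Module):
    def __init__(self, dim, full):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mix = FullAttention(dim) if full else LinearAttention(dim)

    def forward(self, x):
        return x + self.mix(self.norm(x))                  # pre-norm residual block

# Hypothetical layout: 12 blocks, full attention kept only at layers 5 and 11.
model = nn.Sequential(*[HybridBlock(256, full=(i in {5, 11})) for i in range(12)])
x = torch.randn(2, 1024, 256)                              # (batch, seq, dim)
print(model(x).shape)                                      # torch.Size([2, 1024, 256])

Under a layout like this, long-context serving cost is dominated by the few retained full-attention layers, which is the general intuition behind the large throughput gains the abstract reports at 256K context.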


Bio

Han Cai (蔡涵)

Research Scientist, NVIDIA

Han Cai is a Research Scientist at NVIDIA. His research focuses on efficient foundation models, including large language models, diffusion models, visual generation and perception, and AutoML. His work has been published in top venues such as ICLR, ICML, NeurIPS, ICCV, CVPR, and ACL, and has received over 10,000 citations on Google Scholar. He is also a recipient of the Qualcomm Innovation Fellowship.


More info:

https://scholar.google.com/citations?user=x-AvvrYAAAAJ&hl=zh-CN


