Issue 57

Speaker: Shuchen Li, Yale University

Time: April 21 (Tuesday), 3:00 pm

Venue: Room 204, Jingyuan Courtyard 5

Host: 疏彦凯 (Turing Class, Class of 2023)


Talk Information

Title

Learning Mixture Models via Efficient High-dimensional Sparse Fourier Transforms


Abstract

In this work, we give a poly(d,k) time and sample algorithm for efficiently learning the parameters of a mixture of k spherical distributions in d dimensions. Unlike all previous methods, our techniques apply to heavy-tailed distributions and include examples that do not even have finite covariances. Our method succeeds whenever the cluster distributions have a characteristic function with sufficiently heavy tails. Such distributions include the Laplace distribution but crucially exclude Gaussians.
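To make the tail condition concrete, here is a small numerical sketch (not from the paper; the evaluation points are illustrative) comparing the characteristic functions of the standard Laplace and standard Gaussian distributions: the Laplace one, 1/(1+t²), decays only polynomially in t, while the Gaussian one, e^{-t²/2}, vanishes super-polynomially fast.

```python
import numpy as np

# Characteristic functions of two zero-mean, unit-scale distributions:
#   Laplace(0, 1):  phi(t) = 1 / (1 + t^2)   -- polynomially (heavy) tailed
#   N(0, 1):        phi(t) = exp(-t^2 / 2)   -- super-polynomially light tailed
t = np.array([1.0, 5.0, 20.0])
phi_laplace = 1.0 / (1.0 + t**2)
phi_gauss = np.exp(-t**2 / 2.0)

for ti, pl, pg in zip(t, phi_laplace, phi_gauss):
    print(f"t = {ti:5.1f}:  Laplace {pl:.2e}   Gaussian {pg:.2e}")
```

Already at t = 20, the Laplace characteristic function is still on the order of 10⁻³, while the Gaussian one is astronomically small; this is the sense in which the Laplace distribution satisfies the heavy-tail condition and Gaussians do not.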


All previous methods for learning mixture models relied, implicitly or explicitly, on low-degree moments. Even for the case of Laplace distributions, we prove that any such algorithm must use super-polynomially many samples. Our method thus adds to the short list of techniques that bypass the limitations of the method of moments.


Somewhat surprisingly, our algorithm does not require any minimum separation between the cluster means. This is in stark contrast to spherical Gaussian mixtures, where a minimum ℓ2-separation is provably necessary even information-theoretically [Regev and Vijayaraghavan '17]. Our methods compose well with existing techniques and yield "best of both worlds" guarantees for mixtures in which every component either has a heavy-tailed characteristic function or has a sub-Gaussian tail with a light-tailed characteristic function.


Our algorithm is based on a new approach to learning mixture models via efficient high-dimensional sparse Fourier transforms. We believe that this method will find further applications in statistical estimation. As an example, we give an algorithm for consistent robust mean estimation against noise-oblivious adversaries, a model practically motivated by the literature on multiple hypothesis testing. It was formally proposed in a recent Master's thesis by one of the authors and has already inspired follow-up works.
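The Fourier-analytic idea can be illustrated in a toy one-dimensional setting (this is only a cartoon of the general approach, not the paper's algorithm; the means, weights, sample size, and grids below are invented): the empirical characteristic function of the sample concentrates around the mixture's true characteristic function, and because the Laplace characteristic function is heavy-tailed, a truncated Fourier inversion yields a density estimate whose largest peaks sit near the component means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D mixture of two unit-scale Laplace components (illustrative values).
mu_true, weights, n = np.array([-3.0, 2.0]), np.array([0.5, 0.5]), 50_000
comp = rng.choice(2, size=n, p=weights)
x = rng.laplace(loc=mu_true[comp], scale=1.0)

# Empirical characteristic function phi_hat(t) = (1/n) sum_j exp(i t x_j)
# on a frequency grid; the heavy Fourier tail of the Laplace distribution
# keeps the truncation error at |t| <= T small.
T = 8.0
t = np.linspace(-T, T, 129)
phi_hat = np.array([np.exp(1j * ti * x).mean() for ti in t])

# Truncated Fourier inversion gives a density estimate; its two largest
# local maxima land near the component means.
y = np.linspace(-8.0, 8.0, 401)
dt = t[1] - t[0]
dens = (np.exp(-1j * np.outer(y, t)) @ phi_hat).real * dt / (2 * np.pi)

is_peak = (dens > np.roll(dens, 1)) & (dens > np.roll(dens, -1))
mu_est = np.sort(y[is_peak][np.argsort(dens[is_peak])[-2:]])
print("estimated means:", mu_est)
```

Under these assumptions the recovered peaks land within the grid resolution of the true means. The actual algorithm works in d dimensions and for general heavy-tailed characteristic functions, which is where the efficient sparse Fourier transform machinery comes in.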


Based on joint work with Alkis Kalavasis, Pravesh Kothari, and Manolis Zampetakis.


Biography


Shuchen Li is a second-year Ph.D. student in Computer Science at Yale University, advised by Manolis Zampetakis and Ilias Zadik. Before Yale, he received his B.S. in Computer Science from the Turing Class at Peking University and his M.S. in Computer Science from Carnegie Mellon University. His research interests lie broadly at the intersection of high-dimensional statistics, theoretical machine learning, and computational complexity. Recently, he has been thinking about problems in algorithmic statistics, learning theory, and computational-statistical trade-offs.


About CS Peer Talk

As the initiators of this event, we are the Turing Class Research Activities Committee of Peking University, composed mainly of Turing Class students from all years. We hope to build a platform for CS students to exchange ideas, promote communication and collaboration, help students practice giving presentations, and deepen friendships along the way.


The series currently being planned include, but are not limited to:

  • Tutorial series: mainly student speakers, introducing their own research areas

  • Research series: mainly student speakers, presenting their own research results

  • Guest series: invited faculty giving topical talks


Unless the speaker specifically requests otherwise, talks are non-public by default; we hope to create a relaxed yet mutually inspiring atmosphere for exchange.


Call for Speakers

If you would like to share your academic results and experiences with everyone, to review and reflect, or to spark new ideas, you are welcome to volunteer as a speaker.


Speaker sign-up: send an email to cs_research_tc@163.com, cc cfcs@pku.edu.cn, stating your proposed topic, content, and preferred time.



Turing Class Research Activities Committee, Peking University



—   Copyright Notice   —

All text, images, and audio/video materials on this WeChat public account that were created or collected by the WeChat account of the Center on Frontiers of Computing Studies, Peking University are copyrighted by that account; text, images, and audio/video materials collected from public channels, curated, or reposted with authorization remain the copyright of their original authors. If an original author does not wish their content to appear on this account, please notify us promptly and it will be removed.

If any images included in the content involve copyright issues, please contact us promptly so they can be removed.