Talk Title: Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets

Date: October 30 (Wednesday), 10:30-11:30

Abstract:

We prove that the solution space of 2-layer neural networks with quadratic activation and L2 loss, trained on reasoning tasks over Abelian groups (e.g., modular addition), carries rich algebraic structure. This structure enables the analytical construction of globally optimal solutions from partial solutions that satisfy only part of the loss, despite the loss's high nonlinearity. We name this framework CoGO (Composing Global Optimizers). Specifically, we show that the weight space across different numbers of hidden nodes of the 2-layer network is equipped with a semi-ring algebraic structure, and that the loss function to be optimized consists of monomial potentials, which are ring homomorphisms, allowing partial solutions to be composed into global ones by ring addition and multiplication. Our experiments show that around 95% of the solutions obtained by gradient descent match our theoretical constructions exactly. Although the constructed global optimizers require only a small number of hidden nodes, our analysis of the gradient dynamics shows that over-parameterization asymptotically decouples the training dynamics and is beneficial. We further show that under weight decay the training dynamics favors simpler solutions, so high-order global optimizers such as perfect memorization are disfavored.
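To make the setting concrete, below is a minimal NumPy sketch of the objects the abstract refers to: a 2-layer network with quadratic activation evaluated on modular addition over Z_d, an L2 loss against one-hot targets, and a composition operator that stacks hidden nodes. The parameterization, all names (Wa, Wb, Wc, compose_add, ...), and the reading of "ring addition" as hidden-node concatenation are our assumptions for illustration, not the paper's exact definitions.

```python
import numpy as np

# Sketch of the abstract's setting: 2-layer net, quadratic activation,
# L2 loss on modular addition over Z_d (one-hot inputs assumed).

d = 7   # modulus of the Abelian group Z_d

def init_net(K, seed=0):
    """Random 2-layer net with K hidden nodes."""
    rng = np.random.default_rng(seed)
    Wa = rng.normal(size=(K, d)) / np.sqrt(d)  # weights for operand a
    Wb = rng.normal(size=(K, d)) / np.sqrt(d)  # weights for operand b
    Wc = rng.normal(size=(d, K)) / np.sqrt(K)  # output weights
    return Wa, Wb, Wc

def forward(net, a, b):
    """Hidden node k computes (Wa[k, a] + Wb[k, b])**2, i.e. a quadratic
    activation on the sum of the two one-hot input embeddings."""
    Wa, Wb, Wc = net
    h = (Wa[:, a] + Wb[:, b]) ** 2
    return Wc @ h  # logits over Z_d

def l2_loss(net):
    """L2 loss against one-hot targets for the task c = (a + b) mod d,
    averaged over all d**2 input pairs."""
    I = np.eye(d)
    return np.mean([np.sum((forward(net, a, b) - I[(a + b) % d]) ** 2)
                    for a in range(d) for b in range(d)])

def compose_add(net1, net2):
    """One plausible reading of the semi-ring's 'addition': concatenate
    the hidden nodes of two networks, so their outputs sum."""
    return (np.vstack([net1[0], net2[0]]),
            np.vstack([net1[1], net2[1]]),
            np.hstack([net1[2], net2[2]]))

net = compose_add(init_net(16, seed=0), init_net(16, seed=1))
print(l2_loss(net))
```

Because the composed network's output is the sum of the two parts' outputs, partial solutions that each cancel different monomial potentials in the loss can, per the abstract, be assembled into a global optimizer without retraining.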

Speaker:

Yuandong Tian is a Research Scientist at Meta AI Research (FAIR), where he leads the group on reasoning, planning, and decision making for large language models (LLMs). He was the project lead of OpenGo, which beat professional players while running inference on a single GPU. He was the principal mentor of StreamingLLM and GaLore, projects that improved LLM training and inference. He is the first author of DirectPred, which received an ICML 2021 Outstanding Paper Honorable Mention, and of Hierarchical Data-Driven Descent, which received an ICCV 2013 Marr Prize Honorable Mention, and he received the CGO 2022 Distinguished Paper Award for CompilerGym. Before that, he worked on Google's self-driving car team in 2013-2014, and he received his PhD from the Robotics Institute at Carnegie Mellon University in 2013. He has served as an area chair for NeurIPS, ICML, AAAI, CVPR, and AIStats.

Scan the QR code to register

