On April 25, ICLR 2022 announced its Outstanding Paper Awards and honorable mentions. The list is as follows:

Analytic-DPM: an Analytic Estimate of the Optimal Reverse Variance in Diffusion Probabilistic Models

By Fan Bao, Chongxuan Li, Jun Zhu, Bo Zhang

Diffusion probabilistic models (DPMs), a class of powerful generative models, are a rapidly growing topic in machine learning. This paper tackles an inherent limitation of DPMs: the slow and expensive computation of the optimal reverse variance. The authors first present a surprising result that both the optimal reverse variance and the corresponding optimal KL divergence of a DPM have analytic forms with respect to its score function. Then they propose Analytic-DPM, a novel and elegant training-free inference framework that estimates the analytic forms of the variance and KL divergence using the Monte Carlo method and a pretrained score-based model. This paper is significant both in terms of its theoretical contribution (showing that both the optimal reverse variance and KL divergence of a DPM have analytic forms) and its practical benefit (presenting a training-free inference framework applicable to various DPMs), and will likely influence future research on DPMs.

*This paper will be presented in the Oral Session 4 on Probabilistic Models & Vision on Apr 28 8am GMT (1am PST).

Hyperparameter Tuning with Renyi Differential Privacy

By Nicolas Papernot, Thomas Steinke

This paper provides new insights into an important blind spot of most prior analyses of the differential privacy of learning algorithms, namely the fact that the learning algorithm is run multiple times over the data in order to tune the hyperparameters. The authors show that there are situations in which part of the data can skew the optimal hyperparameters, thereby leaking private information. Furthermore, the authors provide privacy guarantees for hyperparameter search procedures within the framework of Renyi Differential Privacy. This is an excellent paper that considers the everyday use of learning algorithms and its implications for privacy in society, and proposes ways to address this issue. This work will provide the foundation for many follow-up works on differentially private machine learning algorithms.

*This paper will be presented in the Oral Session 1 on Learning in the Wild & RL on Apr 26 12am GMT (Apr 25 5pm PST).

Learning Strides in Convolutional Neural Networks

By Rachid Riad, Olivier Teboul, David Grangier, Neil Zeghidour

This paper addresses an important problem that anyone using convolutional networks has faced, namely setting the strides in a principled way as opposed to by trial and error. The authors propose a novel and very clever mathematical formulation for learning strides and demonstrate a practically useful method that achieves state-of-the-art experimental results in comprehensive benchmarks. The main idea is DiffStride, the first downsampling layer with learnable strides, which allows one to learn the size of a cropping mask in the Fourier domain, effectively performing resizing in a way that is amenable to differentiable programming. This is an excellent paper that proposes a method that will likely be part of commonly used toolboxes as well as courses on deep learning.
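To make the idea of resizing by cropping in the Fourier domain concrete, here is a minimal NumPy sketch. It is not DiffStride itself: the crop size is fixed rather than learned, the mask is a hard box, and the function name `fourier_crop_downsample` is hypothetical. It only illustrates that keeping the low-frequency block of a 2D spectrum yields a smaller image.

```python
import numpy as np

def fourier_crop_downsample(x, out_h, out_w):
    """Downsample a 2D array by keeping only its lowest frequencies.

    Illustrative only: the crop size is a fixed argument here, whereas
    the paper describes learning the size of the cropping mask.
    """
    h, w = x.shape
    # Move the zero-frequency component to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(x))
    # Cut a centred window so the zero frequency stays at the centre.
    top = h // 2 - out_h // 2
    left = w // 2 - out_w // 2
    cropped = spectrum[top:top + out_h, left:left + out_w]
    # Back to the spatial domain; rescale so mean intensity is preserved.
    y = np.fft.ifft2(np.fft.ifftshift(cropped))
    return y.real * (out_h * out_w) / (h * w)

# A low-frequency cosine along the rows survives the crop unchanged in shape.
rows = np.arange(8)
image = np.cos(2 * np.pi * rows / 8)[:, None] * np.ones((1, 8))
small = fourier_crop_downsample(image, 4, 4)
```

In this sketch the output resolution is an integer picked by the caller. The appeal of the approach described above is that the crop size becomes a quantity that can be optimized by gradient descent.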

*This paper will be presented in the Oral Session 2 on Understanding Deep Learning on Apr 26 8am GMT (1am PST).

Expressiveness and Approximation Properties of Graph Neural Networks

By Floris Geerts, Juan L Reutter

This elegant theoretical paper shows how questions regarding the expressiveness and separability of different graph neural network (GNN) architectures can be reduced to (and sometimes substantially simplified by) examining their computations in tensor language, where these questions connect to well-known combinatorial notions such as treewidth. In particular, this paper provides an elegant way to easily obtain bounds on the separation power of GNNs in terms of the Weisfeiler-Leman (WL) tests, which have become the yardstick for measuring the separation power of GNNs. The proposed framework also has implications for studying the approximability of functions through GNNs. This paper has the potential to make a significant impact on future research by providing a general framework for describing, comparing and analyzing GNN architectures. In addition, this paper provides a toolbox with which GNN architecture designers can analyze the separation power of their GNNs, without needing to know the intricacies of the WL tests.
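For readers unfamiliar with the WL tests mentioned above, the following is a minimal sketch of the simplest variant, 1-WL colour refinement, in plain Python. The function name and the example graphs are illustrative assumptions, not taken from the paper. Two graphs that end up with the same colour histogram cannot be told apart by this test.

```python
from collections import Counter

def wl_color_histogram(adjacency, rounds=3):
    """Run 1-WL colour refinement and return the final colour histogram.

    adjacency maps each node to a list of its neighbours. Colours are
    nested tuples, which is simple but only practical for a few rounds.
    """
    colors = {node: 0 for node in adjacency}
    for _ in range(rounds):
        # New colour = own colour plus the sorted multiset of neighbour colours.
        colors = {
            node: (colors[node], tuple(sorted(colors[nb] for nb in adjacency[node])))
            for node in adjacency
        }
    return Counter(colors.values())

# A 6-cycle and two disjoint triangles: not isomorphic, but every node has degree 2.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
# A path and a star on four nodes have different degree sequences.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

same = wl_color_histogram(hexagon) == wl_color_histogram(two_triangles)
different = wl_color_histogram(path4) != wl_color_histogram(star4)
```

The hexagon and the pair of triangles receive identical histograms, so 1-WL fails to separate them, while the path and the star are separated after one round.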

*This paper will be presented in the Oral Session 2 on Understanding Deep Learning on Apr 26 8am GMT (1am PST).

Comparing Distributions by Measuring Differences that Affect Decision Making

By Shengjia Zhao, Abhishek Sinha, Yutong (Kelly) He, Aidan Perreault, Jiaming Song, Stefano Ermon

This paper proposes a new class of discrepancies that can compare two probability distributions based on the optimal loss for a decision task. By suitably choosing the decision task, the proposed method generalizes the Jensen-Shannon divergence and the maximum mean discrepancy family. The authors demonstrate that the proposed approach achieves superior test power compared to competitive baselines on various benchmarks, with compelling use cases for understanding the effects of climate change on different social and economic activities, evaluating sample quality, and selecting features targeting different decision tasks. Not only is the proposed method intellectually elegant, the committee finds that the paper is exceptional for its empirical significance, as the fact that the method allows a user to directly specify their preferences when comparing distributions through the decision loss implies an increased level of interpretability for practitioners. 

*This paper will be presented in the Oral Session 4 on Probabilistic Models & Vision on Apr 28 8am GMT (1am PST).

Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path

By X.Y. Han, Vardan Papyan, David L. Donoho

This paper presents new theoretical insights on the “neural collapse” phenomenon that occurs pervasively in today’s deep net training paradigm. During neural collapse, last-layer features collapse to their class-means, both classifiers and class-means collapse to the same Simplex Equiangular Tight Frame, and classifier behavior collapses to the nearest-class-mean decision rule. Instead of the cross-entropy loss, which is mathematically harder to analyze, the paper demonstrates a new decomposition of the mean squared error (MSE) loss in order to analyze each component of the loss under neural collapse. This in turn leads to a new theoretical construct, the “central path”, where the linear classifier stays MSE-optimal for feature activations throughout the dynamics. Finally, by studying renormalized gradient flow along the central path, the authors derive exact dynamics that predict neural collapse. In sum, this paper provides novel and highly inspiring theoretical insights for understanding the empirical training dynamics of deep networks.
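The two geometric objects named above can be illustrated with a short NumPy sketch. It uses the standard construction of a Simplex Equiangular Tight Frame for C classes (a rescaled centering matrix) and a plain nearest-class-mean rule. The function names are hypothetical and nothing here reproduces the paper's analysis.

```python
import numpy as np

def simplex_etf(num_classes):
    """Return a (C, C) matrix whose columns form a simplex ETF.

    Columns have unit norm and every pair of distinct columns has
    inner product -1 / (C - 1).
    """
    c = num_classes
    centering = np.eye(c) - np.ones((c, c)) / c
    return np.sqrt(c / (c - 1)) * centering

def nearest_class_mean(features, class_means):
    """Assign each feature row to the index of the closest class mean."""
    diffs = features[:, None, :] - class_means[None, :, :]
    return (diffs ** 2).sum(axis=-1).argmin(axis=1)

etf = simplex_etf(4)
gram = etf.T @ etf  # ones on the diagonal, -1/3 elsewhere

# Features that sit near their class means are classified by proximity alone.
rng = np.random.default_rng(0)
class_means = etf.T
features = class_means + 0.01 * rng.standard_normal(class_means.shape)
predicted = nearest_class_mean(features, class_means)
```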

*This paper will be presented in the Oral Session 2 on Understanding Deep Learning on Apr 26 8am GMT (1am PST).

Bootstrapped Meta-Learning

By Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh

Meta-learning, or learning to learn, has the potential to empower artificial intelligence, yet meta-optimization has been a considerable challenge to unlocking this potential. This paper opens a new direction in meta-learning, inspired by TD learning, that bootstraps the meta-learner from itself or another update rule. The theoretical analysis is thorough, and the empirical results are compelling: the method sets a new state of the art for model-free agents on the Atari ALE benchmark and yields both performance and efficiency gains in multi-task meta-learning. The committee believes that this paper will inspire a lot of people.

*This paper will be presented in the Oral Session 3 on Meta Learning and Adaptation on Apr 27 4pm GMT (9am PST).

Outstanding Paper Honorable Mentions

Understanding over-squashing and bottlenecks on graphs via curvature

By Jake Topping, Francesco Di Giovanni, Benjamin Paul Chamberlain, Xiaowen Dong, Michael M. Bronstein

Most graph neural networks (GNNs) use the message passing paradigm, which suffers from the “over-squashing” phenomenon: the distortion of information flowing from distant nodes limits the efficiency of message passing. This phenomenon has been heuristically attributed to graph bottlenecks. Drawing insights from discrete differential geometry, this paper provides a precise description of the over-squashing phenomenon in GNNs and analyzes how it arises from bottlenecks in the graph. In particular, the authors introduce a new edge-based combinatorial curvature and prove that negatively curved edges are responsible for the over-squashing issue. Moreover, the authors demonstrate an elegant approach to reducing these negative effects by rewiring the graph according to this curvature notion. The paper has the potential to make a considerable impact by importing tools from differential geometry for analyzing GNNs, and the rewiring approach may suggest new directions for improving the empirical performance of GNNs.

*This paper will be presented in the Oral Session 2 on Structured Learning on Apr 26 8am GMT (1am PST).

Efficiently Modeling Long Sequences with Structured State Spaces

By Albert Gu, Karan Goel, Christopher Re

Modeling long sequences is a central challenge in representation learning across various tasks and modalities, and the dominant architecture over the past years has been the Transformer. This paper investigates a surprising alternative to Transformers by proposing the Structured State Space Sequence model (S4). The S4 model is a new and clever parameterization of the state space model (SSM) that addresses the prohibitive computation and memory requirements of SSMs while maintaining their theoretical strengths for handling long-range dependencies. S4 demonstrates impressive empirical results on multiple domains including vision, text, and audio. Unlike most work that tries to make Transformers more efficient, this work takes a completely different approach by focusing on the less studied state space models. The technical elegance and empirical strengths of the proposed approach inspire a new research direction.
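As background on what a state space model computes, here is a minimal NumPy sketch of a generic linear SSM, x'(t) = A x(t) + B u(t), y(t) = C x(t), discretized with the bilinear transform. It is a plain dense SSM, not S4's structured parameterization, and all function names and matrices are illustrative assumptions. The sketch shows that the step-by-step recurrence and a single long convolution give the same output, and that building the kernel naively needs repeated multiplication by the state matrix, which is the kind of cost the summary above refers to.

```python
import numpy as np

def discretize_bilinear(A, B, step):
    """Bilinear (Tustin) discretization of x' = Ax + Bu with a given step size."""
    identity = np.eye(A.shape[0])
    inv = np.linalg.inv(identity - step / 2.0 * A)
    return inv @ (identity + step / 2.0 * A), inv @ (step * B)

def run_recurrence(A_bar, B_bar, C, u):
    """Compute y_k = C x_k with x_k = A_bar x_{k-1} + B_bar u_k, x_{-1} = 0."""
    x = np.zeros(A_bar.shape[0])
    outputs = []
    for u_k in u:
        x = A_bar @ x + B_bar * u_k
        outputs.append(C @ x)
    return np.array(outputs)

def conv_kernel(A_bar, B_bar, C, length):
    """Naive kernel K_j = C A_bar^j B_bar, built by repeated matrix-vector products."""
    kernel = []
    v = B_bar.copy()
    for _ in range(length):
        kernel.append(C @ v)
        v = A_bar @ v
    return np.array(kernel)

# Small hypothetical system with a stable state matrix.
rng = np.random.default_rng(0)
A = -np.eye(4) + 0.1 * rng.standard_normal((4, 4))
B = rng.standard_normal(4)
C = rng.standard_normal(4)
u = rng.standard_normal(32)

A_bar, B_bar = discretize_bilinear(A, B, step=0.1)
y_recurrent = run_recurrence(A_bar, B_bar, C, u)
y_convolved = np.convolve(u, conv_kernel(A_bar, B_bar, C, len(u)))[:len(u)]
```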

*This paper will be presented in the Oral Session 2 on Structured Learning on Apr 26 8am GMT (1am PST).

PiCO: Contrastive Label Disambiguation for Partial Label Learning

By Haobo Wang, Ruixuan Xiao, Yixuan (Sharon) Li, Lei Feng, Gang Niu, Gang Chen, Junbo Zhao

This paper studies Partial Label Learning (PLL), an important problem in real-world applications where each training example is labeled with a coarse candidate set due to label ambiguity. This paper aims to reduce the performance gap between PLL and its supervised counterpart by addressing two key challenges in PLL, representation learning and label disambiguation, in one coherent framework. Specifically, the authors propose PiCO, a new framework that combines contrastive learning with prototype-based label disambiguation. The paper includes an interesting theoretical interpretation that justifies the framework from an expectation-maximization (EM) perspective. The empirical results are particularly impressive, as PiCO significantly outperforms the current state of the art in PLL and even achieves results comparable to fully supervised learning.

*This paper will be presented in the Oral Session 1 on Learning in the Wild & RL on Apr 26 12am GMT (Apr 25 5pm PST).

All of the papers above will be given as oral presentations during the ICLR conference (April 25 to April 29).