LG - Machine Learning   CV - Computer Vision   CL - Computation and Language   AS - Audio and Speech   RO - Robotics

Reposted from 爱可可爱生活

Summary: a geometric perspective on variational autoencoders; test-time training with masked autoencoders; a unified semi-supervised learning benchmark; explaining predictions from machine learning models; the role of permutation invariance in linear mode connectivity of neural networks; one-shot transfer of affordance regions; analysis of self-attention head diversity for Conformer-based automatic speech recognition; subquadratic Kronecker regression with applications to tensor decomposition; amortised inference in structured generative models with explaining away

 

1、[LG] A Geometric Perspective on Variational Autoencoders

C Chadebec, S Allassonnière
[INRIA]
This paper introduces a new interpretation of the Variational Autoencoder framework from a fully geometric point of view. We argue that vanilla VAE models naturally unveil a Riemannian structure in their latent space, and that taking these geometric aspects into account leads to better interpolation and an improved generation procedure. The proposed sampling method draws from the uniform distribution that derives intrinsically from the learned Riemannian latent space; we show that this scheme can make a vanilla VAE competitive with, and even better than, more advanced variants on several benchmark datasets. Since generative models are known to be sensitive to the number of training samples, we also stress the method's robustness in the low-data regime.

https://arxiv.org/abs/2209.07370
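To make the sampling idea concrete, below is a minimal sketch of drawing latent codes from a Riemannian-uniform distribution via rejection sampling. It is not the authors' implementation: the toy decoder, the latent bound, and the choice of the decoder pullback metric G(z) = J(z)^T J(z) are all illustrative assumptions (the paper derives its metric from the trained VAE itself).

```python
import torch

# Toy decoder standing in for a trained VAE decoder. Illustrative assumption:
# we use the decoder pullback metric G(z) = J(z)^T J(z); the paper instead
# builds its Riemannian metric from the learned model.
decoder = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 784)
)

def metric_volume(z):
    """Riemannian volume element sqrt(det G(z)) at a latent point z."""
    J = torch.autograd.functional.jacobian(decoder, z)  # shape (784, 2)
    return torch.sqrt(torch.det(J.T @ J))

def sample_riemannian_uniform(n, bound=3.0, n_proposals=500):
    """Rejection sampling of the uniform distribution w.r.t. the Riemannian
    volume measure on [-bound, bound]^2: uniform proposals are accepted
    with probability proportional to sqrt(det G(z))."""
    proposals = (torch.rand(n_proposals, 2) * 2 - 1) * bound
    weights = torch.stack([metric_volume(z) for z in proposals])
    accepted = proposals[torch.rand(n_proposals) < weights / weights.max()]
    return accepted[:n]  # may return fewer than n if acceptance is low

z = sample_riemannian_uniform(16)
print(z.shape, decoder(z).shape)  # decode the Riemannian-uniform samples
```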

 

2、[CV] Test-Time Training with Masked Autoencoders

Y Gandelsman, Y Sun, X Chen, A A. Efros
[UC Berkeley & Meta AI]
Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision. In this paper, we use masked autoencoders for this one-sample learning problem. Empirically, our simple method improves generalization on many visual benchmarks under distribution shift. Theoretically, we characterize this improvement in terms of the bias-variance trade-off.

https://arxiv.org/abs/2209.07522
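A minimal sketch of the test-time training loop follows, with a toy MLP autoencoder standing in for the paper's ViT-based MAE; the pixel-level masking, step count, and learning rate are illustrative stand-ins for the paper's patch masking and training recipe.

```python
import copy
import torch
import torch.nn.functional as F

# Toy stand-ins for the paper's MAE: a shared encoder with a
# reconstruction head (self-supervised task) and a classifier head.
encoder = torch.nn.Sequential(torch.nn.Linear(784, 128), torch.nn.ReLU())
decoder = torch.nn.Linear(128, 784)    # masked-reconstruction head
classifier = torch.nn.Linear(128, 10)  # main-task head (frozen here)

def test_time_train(x, steps=10, mask_ratio=0.75, lr=1e-3):
    """Adapt encoder/decoder on ONE test input via masked reconstruction,
    then classify with the adapted encoder; weights are restored afterward."""
    enc_state = copy.deepcopy(encoder.state_dict())
    dec_state = copy.deepcopy(decoder.state_dict())
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):
        mask = (torch.rand_like(x) > mask_ratio).float()  # keep ~25% of pixels
        recon = decoder(encoder(x * mask))
        loss = F.mse_loss(recon * (1 - mask), x * (1 - mask))  # masked part only
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        logits = classifier(encoder(x))
    encoder.load_state_dict(enc_state)  # reset for the next test input
    decoder.load_state_dict(dec_state)
    return logits.argmax(-1)

x_test = torch.randn(1, 784)  # placeholder flattened test image
print(test_time_train(x_test))
```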

 

3、[LG] USB: A Unified Semi-supervised Learning Benchmark

Y Wang, H Chen, Y Fan, W Sun...
[Tokyo Institute of Technology & CMU & Max-Planck-Institut für Informatik & ...]
Semi-supervised learning (SSL) improves model generalization by leveraging massive unlabeled data to augment limited labeled samples. However, popular SSL evaluation protocols are often constrained to computer vision (CV) tasks. In addition, previous work typically trains deep neural networks from scratch, which is time-consuming and environmentally unfriendly. To address these issues, we construct a Unified SSL Benchmark (USB) by selecting 15 diverse, challenging, and comprehensive tasks from CV, natural language processing (NLP), and audio processing, on which we systematically evaluate dominant SSL methods; we also open-source a modular and extensible codebase for fair evaluation of these methods. We further provide pre-trained versions of state-of-the-art neural models for CV tasks to make the cost of further fine-tuning affordable. USB enables the evaluation of a single SSL algorithm on more tasks from multiple domains, at lower cost: on a single NVIDIA V100, only 37 GPU-days are required to evaluate FixMatch on the 15 tasks in USB, whereas 335 GPU-days (279 GPU-days on the 4 CV datasets other than ImageNet) are needed for 5 CV tasks under the typical protocol.

https://arxiv.org/abs/2208.07204
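Since the benchmark's cost figures are quoted for FixMatch, here is a minimal sketch of FixMatch's unlabeled-data objective (confidence-thresholded pseudo-labeling). This is not code from the USB codebase; the toy linear model, the 0.95 threshold, and the random tensors standing in for weakly/strongly augmented views are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def fixmatch_unlabeled_loss(model, x_weak, x_strong, threshold=0.95):
    """Core FixMatch objective on unlabeled data: pseudo-label confident
    predictions on the weakly augmented view, then train the strongly
    augmented view toward those pseudo-labels."""
    with torch.no_grad():
        probs = F.softmax(model(x_weak), dim=-1)
        conf, pseudo = probs.max(dim=-1)
        keep = conf >= threshold  # only high-confidence pseudo-labels count
    if keep.sum() == 0:
        return torch.tensor(0.0)
    return F.cross_entropy(model(x_strong)[keep], pseudo[keep])

# Toy usage: random tensors stand in for two augmented views of the
# same unlabeled images.
model = torch.nn.Linear(784, 10)
x_w, x_s = torch.randn(32, 784), torch.randn(32, 784)
print(fixmatch_unlabeled_loss(model, x_w, x_s))
```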

 

4、[LG] Explaining Predictions from Machine Learning Models: Algorithms, Users, and Pedagogy

A Lucic
[Universiteit van Amsterdam]
Model explainability has become an important problem in machine learning (ML) due to the increasing effect that algorithmic predictions have on humans. Explanations can help users understand not only why ML models make certain predictions, but also how these predictions can be changed. This thesis examines the explainability of ML models from three vantage points: algorithms, users, and pedagogy, and contributes several novel solutions to the explainability problem.

https://arxiv.org/abs/2209.05084
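As one standard instance of explaining "how predictions can be changed", below is a minimal Wachter-style counterfactual search by gradient descent. The thesis spans several explanation methods; this particular objective, the toy model, and the hyperparameters are illustrative choices, not necessarily the thesis's algorithms.

```python
import torch
import torch.nn.functional as F

def counterfactual(model, x, target_class, steps=200, lr=0.1, lam=0.1):
    """Gradient-based counterfactual: find a small perturbation of x that
    flips the model's prediction to target_class, trading off prediction
    loss against distance to the original input (Wachter-style objective)."""
    x_cf = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        loss = F.cross_entropy(model(x_cf), target) + lam * (x_cf - x).norm()
        opt.zero_grad(); loss.backward(); opt.step()
    return x_cf.detach()

# Toy usage with a random linear classifier.
model = torch.nn.Linear(4, 2)
x = torch.randn(1, 4)
x_cf = counterfactual(model, x, target_class=1)
print(model(x).argmax(-1).item(), "->", model(x_cf).argmax(-1).item())
```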

 

5、[LG] The Role of Permutation Invariance in Linear Mode Connectivity of Neural Networks

R Entezari, H Sedghi, O Saukh, B Neyshabur
[TU Graz & Google Research]
We conjecture that if the permutation invariance of neural networks is taken into account, SGD solutions will likely have no barrier along the linear interpolation between them. Although this is a bold conjecture, we show how extensive empirical attempts fall short of refuting it, and we provide a preliminary theoretical result to support it. The conjecture has implications for the lottery ticket hypothesis, distributed training, and ensemble methods.

https://arxiv.org/abs/2110.06296
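A minimal sketch of the quantity at stake: the loss barrier along the linear path between two one-hidden-layer networks, measured before and after permuting the hidden units of one to align with the other. The random weights stand in for trained SGD solutions, and the first-layer-similarity Hungarian matching is an illustrative alignment heuristic, not the paper's experimental protocol.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Two nets f(x) = W2 @ relu(W1 @ x); random weights stand in for two
# independently trained SGD solutions.
rng = np.random.default_rng(0)
d, h = 10, 32
W1a, W2a = rng.normal(size=(h, d)), rng.normal(size=(1, h))
W1b, W2b = rng.normal(size=(h, d)), rng.normal(size=(1, h))

def loss(W1, W2, X, y):
    pred = np.maximum(W1 @ X.T, 0).T @ W2.T  # forward pass, MSE loss
    return float(np.mean((pred.ravel() - y) ** 2))

def permute_to_match(W1b, W2b, W1a):
    """Hidden units can be permuted without changing the function; pick the
    permutation whose first-layer rows best match net A's (Hungarian)."""
    _, perm = linear_sum_assignment(-W1a @ W1b.T)  # similarity -> cost
    return W1b[perm], W2b[:, perm]

def barrier(Wa, Wb, X, y, n=11):
    """Max loss along the linear path, minus the endpoints' average."""
    losses = [loss((1-t)*Wa[0] + t*Wb[0], (1-t)*Wa[1] + t*Wb[1], X, y)
              for t in np.linspace(0, 1, n)]
    return max(losses) - (losses[0] + losses[-1]) / 2

X, y = rng.normal(size=(256, d)), rng.normal(size=256)
W1b_p, W2b_p = permute_to_match(W1b, W2b, W1a)
print("barrier (raw):     ", barrier((W1a, W2a), (W1b, W2b), X, y))
print("barrier (permuted):", barrier((W1a, W2a), (W1b_p, W2b_p), X, y))
```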

 

A few other papers worth noting:

 

[CV] One-Shot Transfer of Affordance Regions? AffCorrs!

D Hadjivelichkov, S Zwane, M Deisenroth, L Agapito, D Kanoulas
[University College London]
https://arxiv.org/abs/2209.07147

 

[CL] Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition

K Audhkhasi, Y Huang, B Ramabhadran, P J. Moreno
[Google]
https://arxiv.org/abs/2209.06096

 

[LG] Subquadratic Kronecker Regression with Applications to Tensor Decomposition

M Fahrbach, T Fu, M Ghadiri
[Google Research]
https://arxiv.org/abs/2209.04876

 

[LG] Amortised Inference in Structured Generative Models with Explaining Away

C Yu, H Soulat, N Burgess, M Sahani
[UCL]
https://arxiv.org/abs/2209.05212

 

 
