爱可可 AI Frontier Picks (11.3)

LG - Machine Learning CV - Computer Vision CL - Computation and Language AS - Audio and Speech RO - Robotics

1、[LG] Generalized Shape Metrics on Neural Representations

A H. Williams, E Kunz, S Kornblith, S W. Linderman

[Stanford University & Google Research]

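Generalized shape metrics on neural representations. Understanding how biological and artificial networks operate remains a difficult and important challenge. To identify general principles, researchers increasingly survey large collections of networks that are trained on, or biologically adapted to, similar tasks. This calls for a standardized set of analysis tools for determining how network-level covariates, such as architecture, anatomical brain region, and model organism, affect neural representations (hidden-layer activations). The paper puts these analyses on a rigorous footing by defining a broad family of metric spaces that quantify representational dissimilarity. Within this framework, it modifies existing similarity measures based on canonical correlation analysis so that they satisfy the triangle inequality, formulates a new metric that respects the inductive biases of convolutional layers, and identifies approximate Euclidean embeddings that let network representations be fed into essentially any off-the-shelf machine learning method. The methods are demonstrated on large-scale datasets from biology (Allen Institute Brain Observatory) and deep learning (NAS-Bench-101), revealing relationships between neural representations that are interpretable in terms of anatomical features and model performance.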

Understanding the operation of biological and artificial networks remains a difficult and important challenge. To identify general principles, researchers are increasingly interested in surveying large collections of networks that are trained on, or biologically adapted to, similar tasks. A standardized set of analysis tools is now needed to identify how network-level covariates—such as architecture, anatomical brain region, and model organism—impact neural representations (hidden layer activations). Here, we provide a rigorous foundation for these analyses by defining a broad family of metric spaces that quantify representational dissimilarity. Using this framework we modify existing representational similarity measures based on canonical correlation analysis to satisfy the triangle inequality, formulate a novel metric that respects the inductive biases in convolutional layers, and identify approximate Euclidean embeddings that enable network representations to be incorporated into essentially any off-the-shelf machine learning method. We demonstrate these methods on large-scale datasets from biology (Allen Institute Brain Observatory) and deep learning (NAS-Bench-101). In doing so, we identify relationships between neural representations that are interpretable in terms of anatomical features and model performance.
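
The abstract's point about the triangle inequality can be illustrated with a toy example. The sketch below is not the paper's method: the paper's metrics compare whole activation matrices, whereas this example uses three hand-picked 2-D vectors (an assumption made purely for illustration) to show why a similarity score is not automatically a distance. The helper names `cosine`, `one_minus_cos`, and `angular` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def one_minus_cos(u, v):
    """A common 'dissimilarity' that is NOT a metric."""
    return 1.0 - cosine(u, v)

def angular(u, v):
    """Angle between the vectors in radians; this is a proper metric on directions."""
    # Clamp to [-1, 1] to guard against floating-point overshoot before arccos.
    return math.acos(max(-1.0, min(1.0, cosine(u, v))))

# Toy vectors at 0, 45 and 90 degrees (illustrative assumption).
a, b, c = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)

for name, d in (("1 - cos", one_minus_cos), ("angular", angular)):
    direct = d(a, c)
    detour = d(a, b) + d(b, c)
    holds = direct <= detour + 1e-12
    print(f"{name}: d(a,c)={direct:.3f}  d(a,b)+d(b,c)={detour:.3f}  triangle holds: {holds}")
```

With `1 - cos`, the direct distance from `a` to `c` exceeds the detour through `b`, so the triangle inequality fails. Taking the arccosine instead gives a score that satisfies it, which is what makes downstream tools that assume a metric space (clustering, nearest neighbors, embeddings) well-behaved.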

https://weibo.com/1402400261/KFFBWF1Hd

2、[CV] Projected GANs Converge Faster

A Sauer, K Chitta, J Müller, A Geiger

[University of Tübingen & University Heidelberg]

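Projected GANs converge faster. Generative adversarial networks (GANs) produce high-quality images but are hard to train: they need careful regularization, large amounts of compute, and expensive hyperparameter sweeps. The paper makes significant headway on these problems by projecting generated and real samples into a fixed, pretrained feature space. Having found that the discriminator cannot fully exploit features from the deeper layers of the pretrained model, the authors propose a more effective strategy that mixes features across channels and resolutions. The resulting Projected GAN improves image quality, sample efficiency, and convergence speed. It also works at resolutions of up to one megapixel and advances the state-of-the-art Fréchet Inception Distance (FID) on 22 benchmark datasets. Projected GANs match the previously lowest FIDs up to 40 times faster, cutting wall-clock time from 5 days to under 3 hours on the same compute.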

Generative Adversarial Networks (GANs) produce high-quality images but are challenging to train. They need careful regularization, vast amounts of compute, and expensive hyper-parameter sweeps. We make significant headway on these issues by projecting generated and real samples into a fixed, pretrained feature space. Motivated by the finding that the discriminator cannot fully exploit features from deeper layers of the pretrained model, we propose a more effective strategy that mixes features across channels and resolutions. Our Projected GAN improves image quality, sample efficiency, and convergence speed. It is further compatible with resolutions of up to one Megapixel and advances the state-of-the-art Fréchet Inception Distance (FID) on twenty-two benchmark datasets. Importantly, Projected GANs match the previously lowest FIDs up to 40 times faster, cutting the wall-clock time from 5 days to less than 3 hours given the same computational resources.

https://weibo.com/1402400261/KFFHDt3B2

3、[LG] Mastering Atari Games with Limited Data

W Ye, S Liu, T Kurutach, P Abbeel, Y Gao

[Tsinghua University & UC Berkeley & Shanghai Qi Zhi Institute]

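Mastering Atari games with limited data. Reinforcement learning has achieved great success in many applications, but sample efficiency remains a key challenge: prominent methods need millions (or even billions) of environment steps to train. Sample-efficient image-based RL algorithms have progressed significantly of late, yet consistent human-level performance on the Atari benchmark has remained elusive. The paper proposes EfficientZero, a sample-efficient, model-based visual RL algorithm built on MuZero. It reaches 190.4% mean and 116.0% median human performance on the Atari 100k benchmark with only two hours of real-time game experience, and it outperforms state SAC on some tasks of the DMControl 100k benchmark. This is the first time an algorithm has achieved super-human performance on Atari games with so little data. EfficientZero's performance is also close to that of DQN at 200 million frames while consuming 500 times less data. Its low sample complexity and high performance can bring RL closer to real-world applicability.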

Reinforcement learning has achieved great success in many applications. However, sample efficiency remains a key challenge, with prominent methods requiring millions (or even billions) of environment steps to train. Recently, there has been significant progress in sample efficient image-based RL algorithms; however, consistent human-level performance on the Atari game benchmark remains an elusive goal. We propose a sample efficient model-based visual RL algorithm built on MuZero, which we name EfficientZero. Our method achieves 190.4% mean human performance and 116.0% median performance on the Atari 100k benchmark with only two hours of real-time game experience and outperforms the state SAC in some tasks on the DMControl 100k benchmark. This is the first time an algorithm achieves super-human performance on Atari games with such little data. EfficientZero’s performance is also close to DQN’s performance at 200 million frames while we consume 500 times less data. EfficientZero’s low sample complexity and high performance can bring RL closer to real-world applicability. We implement our algorithm in an easy-to-understand manner and it is available at https://github.com/YeWR/EfficientZero. We hope it will accelerate the research of MCTS-based RL algorithms in the wider community.

Figure 1: Our proposed method EfficientZero is 170% and 180% better than the previous SoTA performance in mean and median human normalized score and is the first to outperform the average human performance on the Atari 100k benchmark.
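
The "two hours" and "500 times less data" figures can be sanity-checked with a little arithmetic, sketched below. Two values are assumptions not stated in the abstract: an action repeat (frame skip) of 4 and an emulator rate of 60 frames per second, both conventional for Atari. The human-normalized score formula is the usual one for this benchmark, and the raw scores in `toy_scores` are invented placeholders, not results from the paper.

```python
from statistics import mean, median

AGENT_STEPS = 100_000        # the "100k" in Atari 100k
FRAME_SKIP = 4               # assumption: conventional Atari action repeat
FPS = 60                     # assumption: emulator frame rate
DQN_FRAMES = 200_000_000     # from the abstract

frames = AGENT_STEPS * FRAME_SKIP
hours = frames / FPS / 3600
data_ratio = DQN_FRAMES / frames
print(f"{frames} frames ~ {hours:.2f} h of play; DQN uses {data_ratio:.0f}x more frames")

def human_normalized(agent, random, human):
    """Score in percent: 0 matches a random policy, 100 matches the human reference."""
    return 100.0 * (agent - random) / (human - random)

# Hypothetical (agent, random, human) raw scores for three made-up games.
toy_scores = {
    "game_a": (300.0, 100.0, 200.0),
    "game_b": (150.0, 50.0, 250.0),
    "game_c": (90.0, 10.0, 90.0),
}
normalized = [human_normalized(*s) for s in toy_scores.values()]
print(f"mean {mean(normalized):.1f}%  median {median(normalized):.1f}%")
```

Under these assumptions, 100k agent steps come to 400k frames, a little under two hours of play, and 200 million frames is 500 times that. The toy scores also show why mean and median are both reported: a single game with a very high normalized score lifts the mean far more than the median.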

https://weibo.com/1402400261/KFFLJnc27

4、[CL] Pseudo-Labeling for Massively Multilingual Speech Recognition

L Lugosch, T Likhomanenko, G Synnaeve, R Collobert

[McGill University & Facebook AI Research]

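Pseudo-labeling for massively multilingual speech recognition. Semi-supervised learning through pseudo-labeling has become a staple of state-of-the-art monolingual speech recognition systems. The paper extends pseudo-labeling to massively multilingual speech recognition covering 60 languages. It proposes a simple recipe that works well even for low-resource languages: train a supervised multilingual model, fine-tune it with semi-supervised learning on a target language, generate pseudo-labels for that language, and train a final model on the pseudo-labels for all languages, either from scratch or by fine-tuning. Experiments on the labeled Common Voice and unlabeled VoxPopuli datasets show that the recipe yields a model that performs better on many languages and also transfers well to LibriSpeech.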

Semi-supervised learning through pseudo-labeling has become a staple of state-of-the-art monolingual speech recognition systems. In this work, we extend pseudo-labeling to massively multilingual speech recognition with 60 languages. We propose a simple pseudo-labeling recipe that works well even with low-resource languages: train a supervised multilingual model, fine-tune it with semi-supervised learning on a target language, generate pseudo-labels for that language, and train a final model using pseudo-labels for all languages, either from scratch or by fine-tuning. Experiments on the labeled Common Voice and unlabeled VoxPopuli datasets show that our recipe can yield a model with better performance for many languages that also transfers well to LibriSpeech.
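
The four-step recipe in the abstract can be laid out as a small control-flow sketch. This is not the authors' code: every function name here is hypothetical, the "models" are plain dictionaries, and the toy `train`, `fine_tune_ssl`, and `transcribe` callables stand in for real acoustic-model training and decoding. Whether the final model also sees the original labeled data is not stated in the abstract, so it is exposed as the assumed flag `include_labeled`.

```python
from typing import Callable, Dict, List, Tuple

Clip = str                   # stand-in for an audio clip
Pair = Tuple[Clip, str]      # (clip, transcript)

def pseudo_label_recipe(
    labeled: Dict[str, List[Pair]],
    unlabeled: Dict[str, List[Clip]],
    train: Callable[[List[Pair]], dict],
    fine_tune_ssl: Callable[[dict, List[Pair], List[Clip]], dict],
    transcribe: Callable[[dict, Clip], str],
    include_labeled: bool = True,   # assumption: not specified in the abstract
):
    # Step 1: train a supervised multilingual model on all labeled data.
    all_labeled = [p for lang in sorted(labeled) for p in labeled[lang]]
    multilingual = train(all_labeled)

    pseudo: Dict[str, List[Pair]] = {}
    for lang in sorted(unlabeled):
        clips = unlabeled[lang]
        # Step 2: fine-tune on the target language with semi-supervised learning.
        specialist = fine_tune_ssl(multilingual, labeled.get(lang, []), clips)
        # Step 3: generate pseudo-labels for that language.
        pseudo[lang] = [(clip, transcribe(specialist, clip)) for clip in clips]

    # Step 4: train a final model on pseudo-labels for all languages
    # (from scratch here; the fine-tuning variant is not shown).
    final_data = [p for lang in sorted(pseudo) for p in pseudo[lang]]
    if include_labeled:
        final_data = all_labeled + final_data
    return train(final_data), pseudo

# Toy stand-ins so the sketch runs end to end.
def toy_train(pairs):
    return {"seen": len(pairs)}

def toy_fine_tune(model, pairs, clips):
    return {**model, "adapted_on": len(clips)}

def toy_transcribe(model, clip):
    return clip.upper()

labeled = {"de": [("hallo", "HALLO")], "fr": [("salut", "SALUT")]}
unlabeled = {"de": ["danke", "bitte"], "fr": ["merci"]}
final_model, pseudo = pseudo_label_recipe(
    labeled, unlabeled, toy_train, toy_fine_tune, toy_transcribe
)
print(final_model)
print(pseudo)
```

The structure makes one design choice visible: pseudo-labels come from per-language specialists, while the final model is a single multilingual model trained on the pooled output.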

https://weibo.com/1402400261/KFFOtgBRd

5、[LG] Identifying and Benchmarking Natural Out-of-Context Prediction Problems

D Madras, R Zemel

[University of Toronto]

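Identifying and benchmarking natural out-of-context prediction problems. Deep learning systems frequently fail at out-of-context (OOC) prediction, that is, making reliable predictions on uncommon or unusual inputs or on subgroups of the training distribution. Several benchmarks for measuring OOC performance have recently been introduced in response. The paper presents a framework that unifies the literature on OOC performance measurement and shows how rich auxiliary information can be used to identify candidate sets of OOC examples in existing datasets. It introduces NOOCH, a suite of naturally occurring “challenge sets”, and shows how different notions of context can be used to probe specific OOC failure modes. Experiments on these challenge sets explore the tradeoffs between various learning approaches and show how choices made when designing OOC benchmarks can lead to different conclusions.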

Deep learning systems frequently fail at out-of-context (OOC) prediction, the problem of making reliable predictions on uncommon or unusual inputs or subgroups of the training distribution. To this end, a number of benchmarks for measuring OOC performance have recently been introduced. In this work, we introduce a framework unifying the literature on OOC performance measurement, and demonstrate how rich auxiliary information can be leveraged to identify candidate sets of OOC examples in existing datasets. We present NOOCH: a suite of naturally-occurring “challenge sets”, and show how varying notions of context can be used to probe specific OOC failure modes. Experimentally, we explore the tradeoffs between various learning approaches on these challenge sets and demonstrate how the choices made in designing OOC benchmarks can yield varying conclusions.

https://weibo.com/1402400261/KFFRZjRL7