LG - Machine Learning, CV - Computer Vision, CL - Computation and Language, AS - Audio and Speech, RO - Robotics

Reposted from 爱可可爱生活

 

1、[CL] Scaling Language Models: Methods, Analysis & Insights from Training Gopher

J W. Rae, S Borgeaud, T Cai...

[DeepMind]

Scaling language models: methods, analysis, and insights from training Gopher. Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. This paper presents an analysis of Transformer-based language model performance across a range of model scales, from models with tens of millions of parameters up to the 280-billion-parameter Gopher model. The models are evaluated on 152 diverse tasks and achieve state-of-the-art performance on the majority of them. Gains from scale are largest in areas such as reading comprehension, fact-checking, and toxic language identification, but smaller for logical and mathematical reasoning. The paper provides a holistic analysis of the training dataset and model behaviour, covering the intersection of model scale with bias and toxicity, and concludes by discussing the application of language models to AI safety and the mitigation of downstream harms.

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales — from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and model’s behaviour, covering the intersection of model scale with bias and toxicity. Finally we discuss the application of language models to AI safety and the mitigation of downstream harms.

 

2、[CV] Plenoxels: Radiance Fields without Neural Networks

A Yu, S Fridovich-Keil, M Tancik, Q Chen, B Recht, A Kanazawa

[UC Berkeley]

Plenoxels: radiance fields without neural networks. This paper introduces Plenoxels (plenoptic voxels), a system for photorealistic view synthesis. Plenoxels represent a scene as a sparse 3D grid with spherical harmonics. This representation can be optimized from calibrated images via gradient methods and regularization, without any neural components. On standard benchmark tasks, Plenoxels are optimized two orders of magnitude faster than Neural Radiance Fields, with no loss in visual quality.

We introduce Plenoxels (plenoptic voxels), a system for photorealistic view synthesis. Plenoxels represent a scene as a sparse 3D grid with spherical harmonics. This representation can be optimized from calibrated images via gradient methods and regularization without any neural components. On standard, benchmark tasks, Plenoxels are optimized two orders of magnitude faster than Neural Radiance Fields with no loss in visual quality. For video and code, please see https://alexyu.net/plenoxels.
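To make the core idea concrete, here is a minimal, hypothetical sketch (not the authors' CUDA implementation) of fitting a toy dense voxel grid of densities and view-independent (spherical-harmonic degree-0) colors to calibrated rays by gradient descent, with a total-variation-style regularizer. The grid resolution, the nearest-voxel lookup, and the `render_ray` helper are illustrative assumptions; the paper uses a sparse grid, trilinear interpolation, and higher-order harmonics.

# Toy sketch of a Plenoxels-style fit: a dense voxel grid of densities and
# view-independent (SH degree-0) RGB, optimized directly with gradients.
# Illustrative assumption only, not the paper's implementation.
import torch

R = 32                                   # grid resolution (assumed)
density = torch.zeros(R, R, R, requires_grad=True)
rgb = torch.zeros(R, R, R, 3, requires_grad=True)

def render_ray(origin, direction, n_samples=64):
    """Alpha-composite color along one ray using nearest-voxel lookups."""
    t = torch.linspace(0.0, 1.0, n_samples)
    pts = origin + t[:, None] * direction            # sample points in [0, 1)^3
    idx = (pts.clamp(0, 0.999) * R).long()
    sigma = torch.relu(density[idx[:, 0], idx[:, 1], idx[:, 2]])
    color = torch.sigmoid(rgb[idx[:, 0], idx[:, 1], idx[:, 2]])
    alpha = 1.0 - torch.exp(-sigma / n_samples)      # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0)
    return (trans[:, None] * alpha[:, None] * color).sum(dim=0)

def tv_reg(x):
    """Simple total-variation regularizer on the density grid."""
    return ((x[1:] - x[:-1]) ** 2).mean() + \
           ((x[:, 1:] - x[:, :-1]) ** 2).mean() + \
           ((x[:, :, 1:] - x[:, :, :-1]) ** 2).mean()

opt = torch.optim.RMSprop([density, rgb], lr=1e-2)
for origins, dirs, target_rgb in []:                 # iterate over batches of calibrated rays (placeholder)
    pred = torch.stack([render_ray(o, d) for o, d in zip(origins, dirs)])
    loss = ((pred - target_rgb) ** 2).mean() + 1e-3 * tv_reg(density)
    opt.zero_grad(); loss.backward(); opt.step()

The point of the sketch is that every quantity being optimized lives directly in the grid, so ordinary gradient descent plus regularization is the whole training loop; there is no neural network anywhere in it.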

 

3、[CV] GAN-Supervised Dense Visual Alignment

W Peebles, J Zhu, R Zhang, A Torralba, A Efros, E Shechtman

[UC Berkeley & CMU & Adobe Research & MIT CSAIL]

GAN-supervised dense visual alignment. This paper proposes GAN-Supervised Learning, an end-to-end framework for jointly learning a discriminative model and its GAN-generated training data, and applies it to the dense visual alignment problem. Inspired by the classic Congealing method, the GANgealing algorithm trains a Spatial Transformer to map random samples from a GAN trained on unaligned data to a common, jointly learned target mode. Results on eight datasets show that the method successfully aligns complex data and discovers dense correspondences. GANgealing significantly outperforms past self-supervised correspondence algorithms and performs on par with (and sometimes exceeds) state-of-the-art supervised correspondence algorithms on several datasets, despite using no correspondence supervision or data augmentation and being trained exclusively on GAN-generated data. For precise correspondence, it improves over state-of-the-art supervised methods by as much as 3x. Applications are shown in augmented reality, image editing, and automated preprocessing of image datasets for downstream GAN training.

We propose GAN-Supervised Learning, a framework for learning discriminative models and their GAN-generated training data jointly end-to-end. We apply our framework to the dense visual alignment problem. Inspired by the classic Congealing method, our GANgealing algorithm trains a Spatial Transformer to map random samples from a GAN trained on unaligned data to a common, jointly-learned target mode. We show results on eight datasets, all of which demonstrate our method successfully aligns complex data and discovers dense correspondences. GANgealing significantly outperforms past self-supervised correspondence algorithms and performs on-par with (and sometimes exceeds) state-of-the-art supervised correspondence algorithms on several datasets—without making use of any correspondence supervision or data augmentation and despite being trained exclusively on GAN-generated data. For precise correspondence, we improve upon state-of-the-art supervised methods by as much as 3×. We show applications of our method for augmented reality, image editing and automated preprocessing of image datasets for downstream GAN training.
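As a rough illustration of the GANgealing idea (and only that: the actual method learns a latent target mode, uses a perceptual loss, and adds several refinements omitted here), the sketch below trains a small Spatial Transformer that predicts an affine warp for each GAN sample, pulling the warped samples toward a jointly learned target image. The `gan` generator interface, the 64x64 image size, and the plain L2 loss are placeholder assumptions.

# Minimal, hypothetical sketch of GAN-supervised alignment with a Spatial
# Transformer: warp GAN samples toward a jointly learned target.
# Assumptions: `gan` is any generator returning (B, 3, 64, 64) images;
# L2 stands in for the paper's perceptual loss; the target is a learned
# image rather than a latent target mode.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineSTN(nn.Module):
    """Predicts a 2x3 affine warp per image and applies it with grid_sample."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 6),
        )
        # Initialize the final layer to output the identity transform.
        self.net[-1].weight.data.zero_()
        self.net[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x):
        theta = self.net(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.shape, align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

stn = AffineSTN()
target = torch.zeros(1, 3, 64, 64, requires_grad=True)    # jointly learned target mode
opt = torch.optim.Adam(list(stn.parameters()) + [target], lr=1e-3)

def training_step(gan, batch_size=8):
    with torch.no_grad():
        fakes = gan(torch.randn(batch_size, 512))          # unaligned GAN samples (assumed API)
    warped = stn(fakes)
    loss = F.mse_loss(warped, target.expand_as(warped))    # pull warped samples toward the target
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

The "GAN-supervised" part is that the training data (the GAN samples) and the discriminative model (the Spatial Transformer) are optimized jointly against the same objective, so no human correspondence labels are needed.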

4、[CL] Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand

J Kasai, K Sakaguchi, R L Bras, L Dunagan, J Morrison, A R. Fabbri, Y Choi, N A. Smith

[University of Washington & Allen Institute for AI & Salesforce Research]

Bidimensional leaderboards: generating and evaluating language hand in hand. NLP researchers have identified limitations of evaluation methodology for generation tasks and raised new questions about the validity of automatic metrics and crowdworker judgments. Meanwhile, efforts to improve generation models still tend to focus on simple n-gram overlap metrics (e.g., BLEU, ROUGE). This paper argues that new advances in models and metrics should each more directly benefit and inform the other, and therefore proposes a new leaderboard design, the bidimensional leaderboard (BILLBOARD), which simultaneously tracks progress on language generation tasks and on the metrics used to evaluate them. Unlike conventional unidimensional leaderboards that rank submitted systems by a predetermined metric, a BILLBOARD accepts both generators and evaluation metrics as competing entries. It automatically creates an ensemble metric that selects and linearly combines a few metrics based on a global analysis across generators, and metrics are ranked by their correlation with human judgments. Four BILLBOARDs are released, covering machine translation, summarization, and image captioning. The paper shows that a linear ensemble of a few diverse metrics sometimes substantially outperforms existing metrics in isolation. A mixed-effects model analysis shows that most automatic metrics, especially reference-based ones, rate machine generation above human generation, underscoring the importance of updating metrics as generation models become stronger (and perhaps more human-like) in the future.

Natural language processing researchers have identified limitations of evaluation methodology for generation tasks, with new questions raised about the validity of automatic metrics and of crowdworker judgments. Meanwhile, efforts to improve generation models tend to focus on simple n-gram overlap metrics (e.g., BLEU, ROUGE). We argue that new advances on models and metrics should each more directly benefit and inform the other. We therefore propose a generalization of leaderboards, bidimensional leaderboards (BILLBOARDs), that simultaneously tracks progress in language generation tasks and metrics for their evaluation. Unlike conventional unidimensional leaderboards that sort submitted systems by predetermined metrics, a BILLBOARD accepts both generators and evaluation metrics as competing entries. A BILLBOARD automatically creates an ensemble metric that selects and linearly combines a few metrics based on a global analysis across generators. Further, metrics are ranked based on their correlations with human judgments. We release four BILLBOARDs for machine translation, summarization, and image captioning. We demonstrate that a linear ensemble of a few diverse metrics sometimes substantially outperforms existing metrics in isolation. Our mixed-effects model analysis shows that most automatic metrics, especially the reference-based ones, overrate machine over human generation, demonstrating the importance of updating metrics as generation models become stronger (and perhaps more similar to humans) in the future.
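Here is a minimal sketch of the two mechanical pieces described above, under the assumption that we already have scores from several automatic metrics plus human judgments for the same outputs: rank metrics by their correlation with the human scores, and fit a least-squares linear combination of a few of them as an ensemble metric. The variable names and the choice of Pearson correlation / ordinary least squares are illustrative, not the paper's exact procedure.

# Illustrative sketch (not the BILLBOARD implementation): rank candidate metrics
# by correlation with human judgments and build a linear ensemble of the top few.
import numpy as np

def rank_metrics(metric_scores, human_scores):
    """metric_scores: dict {metric_name: array of per-system scores}.
    Returns metric names sorted by Pearson correlation with human scores."""
    corr = {name: np.corrcoef(s, human_scores)[0, 1] for name, s in metric_scores.items()}
    return sorted(corr, key=corr.get, reverse=True), corr

def fit_ensemble(metric_scores, human_scores, top_k=3):
    """Least-squares linear combination of the top-k metrics (plus intercept)."""
    ranked, _ = rank_metrics(metric_scores, human_scores)
    chosen = ranked[:top_k]
    X = np.column_stack([metric_scores[m] for m in chosen] + [np.ones(len(human_scores))])
    weights, *_ = np.linalg.lstsq(X, human_scores, rcond=None)
    def ensemble(new_scores):
        x = np.array([new_scores[m] for m in chosen] + [1.0])
        return float(x @ weights)
    return chosen, weights, ensemble

# Toy usage with made-up numbers:
metrics = {"BLEU": np.array([0.2, 0.3, 0.5, 0.4]),
           "ROUGE": np.array([0.25, 0.35, 0.45, 0.5]),
           "BERTScore": np.array([0.6, 0.7, 0.9, 0.8])}
human = np.array([2.0, 3.0, 4.5, 4.0])
chosen, w, score_fn = fit_ensemble(metrics, human, top_k=2)
print(chosen, score_fn({"BLEU": 0.35, "ROUGE": 0.4, "BERTScore": 0.75}))

In a BILLBOARD, new metric submissions would simply enter this pool: they are re-ranked against human judgments, and the ensemble is re-fit, so progress on metrics is tracked alongside progress on generators.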

 

 

5、[LG] Neural population geometry: An approach for understanding biological and artificial neural networks

S Chung, L. F. Abbott

[Columbia University]

Neural population geometry: an approach for understanding biological and artificial neural networks. Advances in experimental neuroscience have transformed our ability to explore the structure and function of neural circuits, while advances in machine learning have unleashed the remarkable computational power of artificial neural networks (ANNs). Although the two fields have different tools and applications, they present a similar challenge: understanding how information is embedded and processed through high-dimensional representations to solve complex tasks. One approach to this challenge is to use mathematical and computational tools to analyze the geometry of these high-dimensional representations, i.e., neural population geometry. This paper reviews examples of geometric approaches that provide insight into the function of biological and artificial neural networks: representation untangling in perception; geometric theories of classification capacity, disentanglement, and abstraction in cognitive systems; topological representations underlying cognitive maps; dynamic untangling in motor systems; and a dynamical approach to cognition. Together, these findings illustrate an exciting trend at the intersection of machine learning, neuroscience, and geometry, in which neural population geometry provides a useful population-level mechanistic description underlying task implementation. Importantly, geometric descriptions apply across sensory modalities, brain regions, network architectures, and timescales. Neural population geometry therefore has the potential to unify our understanding of structure and function in biological and artificial neural networks, bridging the gap between single neurons, population activity, and behavior.

Advances in experimental neuroscience have transformed our ability to explore the structure and function of neural circuits. At the same time, advances in machine learning have unleashed the remarkable computational power of artificial neural networks (ANNs). While these two fields have different tools and applications, they present a similar challenge: namely, understanding how information is embedded and processed through high-dimensional representations to solve complex tasks. One approach to addressing this challenge is to utilize mathematical and computational tools to analyze the geometry of these high-dimensional representations, i.e., neural population geometry. We review examples of geometrical approaches providing insight into the function of biological and artificial neural networks: representation untangling in perception, a geometric theory of classification capacity, disentanglement, and abstraction in cognitive systems, topological representations underlying cognitive maps, dynamic untangling in motor systems, and a dynamical approach to cognition. Together, these findings illustrate an exciting trend at the intersection of machine learning, neuroscience, and geometry, in which neural population geometry provides a useful population-level mechanistic descriptor underlying task implementation. Importantly, geometric descriptions are applicable across sensory modalities, brain regions, network architectures, and timescales. Thus, neural population geometry has the potential to unify our understanding of structure and function in biological and artificial neural networks, bridging the gap between single neurons, population activities, and behavior.
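For a concrete flavor of what "analyzing the geometry of high-dimensional representations" can mean in practice, below is a small sketch computing one commonly used population-level geometric summary, the participation ratio, an effective dimensionality derived from the eigenvalues of the activity covariance. Choosing this particular measure is my illustrative assumption; the review itself covers a much broader set of geometric tools.

# Illustrative sketch: participation ratio as an effective dimensionality of
# neural population activity (a common geometric summary, used here only as
# an example, not as the specific analysis in the review).
import numpy as np

def participation_ratio(activity):
    """activity: array of shape (n_samples, n_neurons).
    PR = (sum of covariance eigenvalues)^2 / sum of squared eigenvalues;
    ranges from 1 (activity on a line) to n_neurons (isotropic activity)."""
    centered = activity - activity.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(activity) - 1)
    eig = np.linalg.eigvalsh(cov)
    return eig.sum() ** 2 / (eig ** 2).sum()

# Toy check: responses confined to ~2 latent dimensions embedded in 100 neurons.
rng = np.random.default_rng(0)
latents = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 100))
responses = latents @ mixing + 0.05 * rng.normal(size=(1000, 100))
print(participation_ratio(responses))   # close to 2

The same computation applies unchanged whether `activity` comes from recorded neurons or from a layer of an ANN, which is the sense in which such geometric descriptors bridge biological and artificial networks.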

 

Other papers worth noting:

 

[LG] Machine Learning in the Search for New Fundamental Physics

Searching for new fundamental physics with machine learning

G Karagiorgi, G Kasieczka, S Kravitz, B Nachman, D Shih

[Columbia University & Universität Hamburg & Lawrence Berkeley National Laboratory & Rutgers University]

 

[CV] Label-Efficient Semantic Segmentation with Diffusion Models

Label-efficient semantic segmentation with diffusion models

D Baranchuk, I Rubachev, A Voynov, V Khrulkov, A Babenko

[Yandex]

[LG] ALX: Large Scale Matrix Factorization on TPUs

ALX: large-scale matrix factorization on TPUs

H Mehta, S Rendle, W Krichene, L Zhang

[Google Research]

[LG] Hierarchical Reinforcement Learning with Timed Subgoals

Hierarchical reinforcement learning with timed subgoals

N Gürtler, D Büchler, G Martius

[Max Planck Institute for Intelligent Systems]

 

 
