LG - Machine Learning; CV - Computer Vision; CL - Computation and Language; AS - Audio and Speech; RO - Robotics

Reposted from 爱可可爱生活

 

1. [CV] Block-NeRF: Scalable Large Scene Neural View Synthesis

M Tancik, V Casser, X Yan, S Pradhan, B Mildenhall, P P. Srinivasan, J T. Barron, H Kretzschmar

[UC Berkeley & Waymo & Google Research]

Block-NeRF: scalable neural view synthesis for large scenes. This paper presents Block-NeRF, a variant of Neural Radiance Fields that can represent and reconstruct large-scale environments. It demonstrates that when scaling NeRF to render city-scale scenes spanning multiple blocks, decomposing the scene into individually trained NeRFs is essential. The decomposition decouples rendering time from scene size, lets rendering scale to arbitrarily large environments, and allows individual blocks of the environment to be updated. Several architectural changes make NeRF robust to data captured over months under different environmental conditions: each individual NeRF gains appearance embeddings, learned pose refinement, and controllable exposure, and an appearance-alignment procedure between adjacent NeRFs lets them be combined seamlessly. A grid of Block-NeRFs built from 2.8 million images forms the largest neural scene representation to date, able to render an entire neighborhood of San Francisco. At this scale, the collected data inevitably contains transient objects and appearance changes, which the paper handles by modifying the underlying NeRF architecture.

We present Block-NeRF, a variant of Neural Radiance Fields that can represent large-scale environments. Specifically, we demonstrate that when scaling NeRF to render city-scale scenes spanning multiple blocks, it is vital to decompose the scene into individually trained NeRFs. This decomposition decouples rendering time from scene size, enables rendering to scale to arbitrarily large environments, and allows per-block updates of the environment. We adopt several architectural changes to make NeRF robust to data captured over months under different environmental conditions. We add appearance embeddings, learned pose refinement, and controllable exposure to each individual NeRF, and introduce a procedure for aligning appearance between adjacent NeRFs so that they can be seamlessly combined. We build a grid of Block-NeRFs from 2.8 million images to create the largest neural scene representation to date, capable of rendering an entire neighborhood of San Francisco.
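The compositing step described above — rendering nearby Block-NeRFs separately and blending their outputs — can be illustrated with inverse-distance weighting. This is a minimal NumPy sketch with stand-in renders in place of real NeRF outputs; the function name, the weighting exponent, and the 2D block layout are assumptions for the illustration, not the paper's exact scheme.

```python
import numpy as np

def composite_blocks(query_xy, block_centers, block_renders, power=4):
    """Blend per-block renders with normalized inverse-distance weights.

    query_xy: (2,) camera position; block_centers: (B, 2) block origins;
    block_renders: (B, C) stand-in RGB outputs, one per block model.
    """
    d = np.linalg.norm(block_centers - query_xy, axis=1)
    w = 1.0 / np.maximum(d, 1e-6) ** power   # closer blocks dominate
    w = w / w.sum()
    return (w[:, None] * block_renders).sum(axis=0)

# Two blocks 10 units apart, rendering pure red and pure blue.
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
renders = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
mid = composite_blocks(np.array([5.0, 0.0]), centers, renders)     # even blend
near_a = composite_blocks(np.array([0.5, 0.0]), centers, renders)  # mostly red
```

Halfway between the blocks the weights are equal, so the result is an even mix; close to one block, that block's render dominates, which is what makes the seams between independently trained models invisible.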

 

2. [LG] EvoJAX: Hardware-Accelerated Neuroevolution

Y Tang, Y Tian, D Ha

[Google Brain]

EvoJAX: hardware-accelerated neuroevolution. Evolutionary computation has proven to be a highly effective way to train neural networks, especially when deployed at scale on CPU clusters. Recent work has also demonstrated its effectiveness on hardware accelerators such as GPUs, but so far those demonstrations target very specific tasks, limiting applicability to other domains. This paper presents EvoJAX, a scalable, general-purpose, hardware-accelerated neuroevolution toolkit. Built on the JAX library, it lets neuroevolution algorithms work with neural networks running in parallel across multiple TPUs/GPUs. EvoJAX achieves very high performance by implementing the evolutionary algorithm, the neural network, and the task all in NumPy, which is just-in-time compiled to run on accelerators. The project provides extensible EvoJAX examples for a wide range of tasks, including supervised learning, reinforcement learning, and generative art. Because EvoJAX can find solutions to most of these tasks within minutes on a single accelerator, versus hours or days on CPUs, the toolkit should greatly shorten the iteration time of evolutionary-computation experiments.

Evolutionary computation has been shown to be a highly effective method for training neural networks, particularly when employed at scale on CPU clusters. Recent work has also showcased its effectiveness on hardware accelerators, such as GPUs, but so far such demonstrations are tailored for very specific tasks, limiting applicability to other domains. We present EvoJAX, a scalable, general-purpose, hardware-accelerated neuroevolution toolkit. Building on top of the JAX library, our toolkit enables neuroevolution algorithms to work with neural networks running in parallel across multiple TPUs/GPUs. EvoJAX achieves very high performance by implementing the evolution algorithm, neural network and task all in NumPy, which is compiled just-in-time to run on accelerators. We provide extensible examples of EvoJAX for a wide range of tasks, including supervised learning, reinforcement learning and generative art. Since EvoJAX can find solutions to most of these tasks within minutes on a single accelerator, compared to hours or days when using CPUs, we believe our toolkit can significantly shorten the iteration time of conducting experiments for researchers working with evolutionary computation. Our project is available at https://github.com/google/evojax
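To give a concrete sense of the kind of algorithm such a toolkit runs, here is a minimal evolution-strategy loop in plain NumPy. This is emphatically not the EvoJAX API — the function and parameter names are invented for the sketch — but the structure (perturb a parameter vector, evaluate the whole population's fitness, step the mean) is what gets JIT-compiled and parallelized on accelerators.

```python
import numpy as np

def simple_es(fitness, dim, pop_size=64, sigma=0.1, lr=0.05, steps=200, seed=0):
    """Minimal evolution strategy: estimate a search gradient from the
    fitness of Gaussian perturbations and ascend it with the mean params."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)
    for _ in range(steps):
        eps = rng.standard_normal((pop_size, dim))          # population of perturbations
        f = np.array([fitness(theta + sigma * e) for e in eps])
        f = (f - f.mean()) / (f.std() + 1e-8)               # normalize fitness
        theta += lr / (pop_size * sigma) * eps.T @ f        # gradient-ascent step
    return theta

# Toy task: maximize -||x - target||^2; the optimum is at `target`.
target = np.full(5, 3.0)
best = simple_es(lambda x: -np.sum((x - target) ** 2), dim=5)
```

The fitness evaluations in the inner loop are independent, which is exactly why this family of algorithms maps so well onto batched execution on TPUs/GPUs.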

 

 

3. [LG] Outracing champion Gran Turismo drivers with deep reinforcement learning

P R. Wurman, S Barrett, K Kawamoto...

[Sony AI]

Outracing champion Gran Turismo drivers with deep reinforcement learning. Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Racing is an extreme example of these conditions: drivers must execute complex tactical maneuvers to pass or block opponents while operating their vehicles at their traction limits. Racing simulations such as the PlayStation game Gran Turismo faithfully reproduce the non-linear control challenges of real race cars while also involving complex multi-agent interactions. This paper describes how agents were trained for Gran Turismo that can compete with the world's best e-sports drivers. State-of-the-art, model-free deep reinforcement learning algorithms are combined with mixed-scenario training to learn an integrated control policy that pairs exceptional speed with impressive tactics. A reward function is constructed that lets the agent be competitive while adhering to racing's important but under-specified sportsmanship rules. The agent, Gran Turismo Sophy, demonstrated its capabilities by winning a head-to-head competition against four of the world's best Gran Turismo drivers. By describing how championship-level racers were trained, the paper shows the possibilities and challenges of using these techniques to control complex dynamical systems in domains where agents must respect imprecisely defined human norms.

Many potential applications of artificial intelligence involve making real-time decisions in physical systems while interacting with humans. Automobile racing represents an extreme example of these conditions; drivers must execute complex tactical manoeuvres to pass or block opponents while operating their vehicles at their traction limits. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the non-linear control challenges of real race cars while also encapsulating the complex multi-agent interactions. Here we describe how we trained agents for Gran Turismo that can compete with the world’s best e-sports drivers. We combine state-of-the-art, model-free, deep reinforcement learning algorithms with mixed-scenario training to learn an integrated control policy that combines exceptional speed with impressive tactics. In addition, we construct a reward function that enables the agent to be competitive while adhering to racing’s important, but under-specified, sportsmanship rules. We demonstrate the capabilities of our agent, Gran Turismo Sophy, by winning a head-to-head competition against four of the world’s best Gran Turismo drivers. By describing how we trained championship-level racers, we demonstrate the possibilities and challenges of using these techniques to control complex dynamical systems in domains where agents must respect imprecisely defined human norms.
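The idea of a reward that keeps an agent competitive while enforcing under-specified sportsmanship rules can be sketched in a few lines. The components and weights below are purely illustrative, not the values or terms used for Gran Turismo Sophy: the general shape is a payment for progress along the track minus penalties for off-course excursions and at-fault contact.

```python
def lap_reward(progress_m, off_course, collision_blame, weights=None):
    """Toy shaped racing reward: pay for course progress (in meters),
    penalize going off-course and at-fault collisions. All weights are
    hypothetical and chosen only for illustration."""
    w = weights or {"progress": 1.0, "off_course": 5.0, "collision": 10.0}
    r = w["progress"] * progress_m
    if off_course:
        r -= w["off_course"]       # discourage cutting the track
    if collision_blame:
        r -= w["collision"]        # encode the sportsmanship norm
    return r

clean = lap_reward(progress_m=12.0, off_course=False, collision_blame=False)
dirty = lap_reward(progress_m=12.0, off_course=True, collision_blame=True)
```

The tension the paper highlights is visible even here: if the collision penalty is too small the agent drives dirty, and if it is too large the agent becomes timid, so the weights must be tuned against human notions of fair racing.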

 

4. [CL] Locating and Editing Factual Knowledge in GPT

K Meng, D Bau, A Andonian, Y Belinkov

[MIT CSAIL & Northeastern University & Technion]

Locating and editing factual knowledge in GPT. This paper investigates the mechanisms of factual knowledge recall in autoregressive Transformer language models. It proposes a causal-intervention method for identifying neuron activations capable of altering a model's factual predictions. Within large GPT-style models, this reveals two distinct sets of neurons, hypothesized to correspond respectively to knowing an abstract fact and saying a concrete word. This insight motivates the development of ROME, a new method for editing facts stored in model weights. For evaluation, the authors assemble COUNTERFACT, a dataset of over twenty thousand counterfactuals together with tools for sensitive measurement of knowledge editing. Using COUNTERFACT, they confirm the distinction between "saying" and "knowing" neurons, and show that ROME achieves state-of-the-art knowledge-editing performance compared with other methods.

We investigate the mechanisms underlying factual knowledge recall in autoregressive transformer language models. First, we develop a causal intervention for identifying neuron activations capable of altering a model’s factual predictions. Within large GPT-style models, this reveals two distinct sets of neurons that we hypothesize correspond to knowing an abstract fact and saying a concrete word, respectively. This insight inspires the development of ROME, a novel method for editing facts stored in model weights. For evaluation, we assemble COUNTERFACT, a dataset of over twenty thousand counterfactuals and tools to facilitate sensitive measurements of knowledge editing. Using COUNTERFACT, we confirm the distinction between saying and knowing neurons, and we find that ROME achieves state-of-the-art performance in knowledge editing compared to other methods. An interactive demo notebook, full code implementation, and the dataset are available at https://rome.baulab.info/.
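The causal intervention mentioned above — run the model on a corrupted input, restore a single clean hidden state, and see whether the original prediction recovers — can be mimicked on a toy stack of layer functions. The layers here are arithmetic stand-ins, not transformer blocks, and the helper name is invented; the sketch shows only the patching mechanism.

```python
import numpy as np

def run_layers(layers, x, patch=None):
    """Run a stack of layer functions; optionally overwrite the hidden
    state after layer `i` with a cached clean activation (patch=(i, act))."""
    acts = []
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        if patch is not None and patch[0] == i:
            h = patch[1]          # causal intervention: restore the clean state
        acts.append(h)
    return h, acts

# Toy 3-layer "model"; each lambda stands in for a transformer block.
layers = [lambda h: h + 1.0, lambda h: h * 2.0, lambda h: h - 0.5]
clean_out, clean_acts = run_layers(layers, np.array([1.0]))
corrupt_out, _ = run_layers(layers, np.array([0.0]))          # corrupted input
restored = [run_layers(layers, np.array([0.0]), patch=(i, clean_acts[i]))[0]
            for i in range(3)]
```

In a real model, comparing how much each patch location recovers the clean prediction is what localizes the neurons responsible for a fact; in this toy chain every patch restores the output fully, because each layer depends only on the previous hidden state.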

 

 

5. [CL] Survey of Hallucination in Natural Language Generation

Z Ji, N Lee, R Frieske, T Yu, D Su, Y Xu, E Ishii, Y Bang, A Madotto, P Fung

[Hong Kong University of Science and Technology]

Survey of hallucination in natural language generation. Natural language generation (NLG) has improved exponentially in recent years thanks to deep learning technologies such as Transformer-based language models. This progress has made generation more fluent and coherent and has driven advances in downstream tasks such as abstractive summarization, dialogue generation, and data-to-text generation. However, such models have also been found to produce hallucinated text, which keeps generated output from meeting users' expectations in many real-world scenarios. To address this, studies on evaluating and mitigating hallucination have appeared across various tasks, but they have not yet been reviewed in a unified way. This survey provides a broad overview of the research progress and challenges of the hallucination problem in NLG, organized in two parts: (i) a general overview of metrics, mitigation methods, and future directions; and (ii) task-specific research progress on hallucination across a large set of downstream tasks: abstractive summarization, dialogue generation, generative question answering, data-to-text generation, and machine translation.

Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of deep learning technologies such as Transformer-based language models. This advancement has led to more fluent and coherent natural language generation, naturally leading to development in downstream tasks such as abstractive summarization, dialogue generation and data-to-text generation. However, such generation has also been found to include hallucinated text, which keeps the performance of text generation from meeting users' expectations in many real-world scenarios. In order to address this issue, studies on evaluation and mitigation methods for hallucination have been presented in various tasks, but they have not been reviewed in a combined manner. In this survey, we provide a broad overview of the research progress and challenges in the hallucination problem of NLG. The survey is organized into two parts: (i) a general overview of metrics, mitigation methods, and future directions; (ii) task-specific research progress for hallucinations in a large set of downstream tasks: abstractive summarization, dialogue generation, generative question answering, data-to-text generation, and machine translation. This survey could facilitate collaborative efforts among researchers in these tasks.
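The metrics the survey catalogs measure, in various ways, how much generated text is unsupported by its source. The simplest possible proxy is token-level: what fraction of output tokens never appear in the input. Real hallucination metrics (entailment-based, QA-based, and so on) are far more sophisticated; this sketch, with an invented function name, only conveys the basic idea.

```python
def unsupported_fraction(generated, source):
    """Toy hallucination signal: the fraction of generated tokens that
    do not appear anywhere in the source text. 0.0 means every token
    is (lexically) supported; higher values suggest hallucination."""
    src = set(source.lower().split())
    toks = generated.lower().split()
    if not toks:
        return 0.0
    return sum(t not in src for t in toks) / len(toks)

faithful = unsupported_fraction("the model was trained",
                                "the model was trained on text")
hallucinated = unsupported_fraction("the model won awards",
                                    "the model was trained on text")
```

A lexical check like this misses paraphrases and world knowledge, which is precisely why the survey devotes a whole division to better metrics.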

 

 

Several other papers worth noting:

 

[LG] Deep Learning in Random Neural Fields: Numerical Experiments via Neural Tangent Kernel


K Watanabe, K Sakamoto, R Karakida, S Sonoda, S Amari

[University of Tsukuba & The Institute of Statistical Mathematics & AIST & RIKEN AIP]

 

 

[CV] The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning


J Hessel, J D. Hwang, J S Park, R Zellers, C Bhagavatula, A Rohrbach, K Saenko, Y Choi

[Allen Institute for Artificial Intelligence & University of Washington & UC Berkeley & Boston University]

 

 

[CV] PINs: Progressive Implicit Networks for Multi-Scale Neural Representations


Z Landgraf, A S Hornung, R S Cabral

[Imperial College London & Meta]

 

 

[CL] AdaPrompt: Adaptive Model Training for Prompt-based NLP


Y Chen, Y Liu, L Dong, S Wang, C Zhu, M Zeng, Y Zhang

[Zhejiang University & Microsoft Cognitive Services Research Group & Microsoft Research Asia]

 

If any images included in this content raise copyright concerns, please contact us promptly so they can be removed.