LG - Machine Learning   CV - Computer Vision   CL - Computation and Language   AS - Audio and Speech   RO - Robotics

Reposted from 爱可可爱生活

Summary: recitation-augmented language models; reincarnating reinforcement learning; language models are multilingual chain-of-thought reasoners; the calibration generalization gap; knowledge unlearning for mitigating privacy risks in language models; content-based search for deep generative models; goal misgeneralization; learning an interpretable question-answering pipeline from a pretrained language model; flow matching for generative modeling

 

1、[CL] Recitation-Augmented Language Models

Z Sun, X Wang, Y Tay, Y Yang, D Zhou
[Google Research & CMU]
Recitation-augmented language models. This paper proposes a new paradigm, called RECITation-augmented gEneration (RECITE), that helps large language models (LLMs) generate more accurate factual knowledge without retrieving from an external corpus. Unlike retrieval-augmented language models, which retrieve relevant documents before generating outputs, RECITE, given an input, first recites one or several relevant passages from the LLM's own memory via sampling, and then produces the final answer. Experiments show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks: by using recitation as an intermediate step, the recite-and-answer scheme achieves new state-of-the-art performance on various closed-book question answering (CBQA) tasks. The effectiveness of RECITE is verified on three pretrained models (PaLM, UL2, and OPT) and three CBQA tasks (Natural Questions, TriviaQA, and HotpotQA).

We propose a new paradigm to help Large Language Models (LLMs) generate more accurate factual knowledge without retrieving from an external corpus, called RECITation-augmented gEneration (RECITE). Different from retrieval-augmented language models that retrieve relevant documents before generating the outputs, given an input, RECITE first recites one or several relevant passages from LLMs' own memory via sampling, and then produces the final answers. We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks. Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance in various closed-book question answering (CBQA) tasks. In experiments, we verify the effectiveness of RECITE on three pre-trained models (PaLM, UL2, and OPT) and three CBQA tasks (Natural Questions, TriviaQA, and HotpotQA).
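A minimal sketch of the recite-and-answer scheme described above, assuming a generic text-completion function llm_generate (a hypothetical placeholder, not the paper's code); the few-shot exemplars for recitation and answering that the paper relies on are omitted for brevity:

```python
from collections import Counter

def llm_generate(prompt, n=1, temperature=0.7):
    """Placeholder for any text-completion API; returns a list of n sampled completions."""
    raise NotImplementedError

def recite_and_answer(question, n_recitations=5):
    # Step 1: recite relevant passages from the model's own memory via sampling.
    recite_prompt = (
        "Recite a passage that is relevant to the question.\n"
        f"Question: {question}\nPassage:"
    )
    recitations = llm_generate(recite_prompt, n=n_recitations, temperature=0.7)

    # Step 2: answer conditioned on each recited passage, then aggregate the
    # answers by majority vote as a simple way to use multiple recitations.
    answers = []
    for passage in recitations:
        answer_prompt = f"Passage: {passage}\nQuestion: {question}\nAnswer:"
        answers.append(llm_generate(answer_prompt, n=1, temperature=0.0)[0].strip())
    return Counter(answers).most_common(1)[0][0]
```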

https://arxiv.org/abs/2210.01296

 

2、[LG] Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

R Agarwal, M Schwarzer, P S Castro, A Courville, M G Bellemare
[Google Research & MILA]
Reincarnating reinforcement learning: reusing prior computation to accelerate progress. In reinforcement learning (RL) research, the prevalent workflow is to learn tabula rasa, without any prior knowledge. However, when RL systems are applied at large scale, they rarely operate tabula rasa: such large-scale systems undergo multiple design or algorithmic changes during their development cycle and use ad hoc approaches to incorporate these changes without retraining from scratch, which would be prohibitively expensive. Moreover, the inefficiency of deep RL typically excludes researchers without industrial-scale resources from tackling computationally demanding problems. To address this, the paper proposes reincarnating RL as an alternative workflow, or class of problem settings, in which prior computational work (e.g., learned policies) is reused or transferred between design iterations of an RL agent, or from one RL agent to another. As a step toward enabling reincarnating RL from any agent to any other agent, the paper focuses on the specific setting of efficiently transferring an existing suboptimal policy to a standalone value-based RL agent. It finds that existing approaches fail in this setting and proposes a simple algorithm to address their limitations. Equipped with this algorithm, the paper demonstrates reincarnating RL's gains over tabula rasa RL on Atari 2600 games, a challenging locomotion task, and the real-world problem of navigating stratospheric balloons. Overall, this work argues for an alternative approach to RL research that the authors believe can significantly improve real-world RL adoption and help democratize it further.

Learning tabula rasa, that is without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research. However, RL systems, when applied to large-scale settings, rarely operate tabula rasa. Such large-scale systems undergo multiple design or algorithmic changes during their development cycle and use ad hoc approaches for incorporating these changes without re-training from scratch, which would have been prohibitively expensive. Additionally, the inefficiency of deep RL typically excludes researchers without access to industrial-scale resources from tackling computationally-demanding problems. To address these issues, we present reincarnating RL as an alternative workflow or class of problem settings, where prior computational work (e.g., learned policies) is reused or transferred between design iterations of an RL agent, or from one RL agent to another. As a step towards enabling reincarnating RL from any agent to any other agent, we focus on the specific setting of efficiently transferring an existing sub-optimal policy to a standalone value-based RL agent. We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations. Equipped with this algorithm, we demonstrate reincarnating RL's gains over tabula rasa RL on Atari 2600 games, a challenging locomotion task, and the real-world problem of navigating stratospheric balloons. Overall, this work argues for an alternative approach to RL research, which we believe could significantly improve real-world RL adoption and help democratize it further. Open-sourced code and trained agents at this https URL.
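A schematic sketch of the specific setting studied here, transferring a suboptimal teacher policy to a standalone value-based student: the student Q-network is trained with its usual TD loss plus an auxiliary distillation term toward the teacher, whose weight would be decayed toward zero over training. The function names, batch layout, and loss combination are illustrative assumptions, not the paper's exact algorithm:

```python
import torch
import torch.nn.functional as F

def reincarnation_update(q_net, target_net, teacher_policy, batch,
                         optimizer, gamma=0.99, distill_weight=1.0):
    """One training step for a value-based student that reuses a suboptimal teacher policy.

    batch: dict of tensors with keys 'obs', 'action', 'reward', 'next_obs', 'done'.
    teacher_policy(obs) is assumed to return the prior agent's action probabilities.
    """
    q_values = q_net(batch["obs"])                                  # [B, num_actions]

    # Standard TD loss (DQN-style) on the student's own replay data.
    with torch.no_grad():
        next_q = target_net(batch["next_obs"]).max(dim=1).values
        td_target = batch["reward"] + gamma * (1.0 - batch["done"]) * next_q
    q_taken = q_values.gather(1, batch["action"].long().unsqueeze(1)).squeeze(1)
    td_loss = F.smooth_l1_loss(q_taken, td_target)

    # Auxiliary distillation loss: push softmax(Q) toward the teacher's action
    # distribution; decaying distill_weight lets the student eventually
    # surpass the suboptimal teacher.
    with torch.no_grad():
        teacher_probs = teacher_policy(batch["obs"])                # [B, num_actions]
    distill_loss = F.kl_div(F.log_softmax(q_values, dim=1), teacher_probs,
                            reduction="batchmean")

    loss = td_loss + distill_weight * distill_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```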

https://arxiv.org/abs/2206.01626

 

3、[CL] Language Models are Multilingual Chain-of-Thought Reasoners

F Shi, M Suzgun, M Freitag, X Wang, S Srivats, S Vosoughi, H W Chung, Y Tay, S Ruder, D Zhou, D Das, J Wei
[Google Research & Dartmouth College]
Language models are multilingual chain-of-thought reasoners. This paper evaluates the reasoning abilities of large language models in multilingual settings, introducing the Multilingual Grade School Math (MGSM) benchmark, built by manually translating 250 grade-school math problems from the GSM8K dataset into ten typologically diverse languages. The ability to solve MGSM problems via chain-of-thought prompting emerges with increasing model scale, and models show strikingly strong multilingual reasoning abilities, even in underrepresented languages such as Bengali and Swahili. These multilingual reasoning abilities also extend to other tasks, such as commonsense reasoning and word-in-context semantic judgment.

We evaluate the reasoning abilities of large language models in multilingual settings. We introduce the Multilingual Grade School Math (MGSM) benchmark, by manually translating 250 grade-school math problems from the GSM8K dataset (Cobbe et al., 2021) into ten typologically diverse languages. We find that the ability to solve MGSM problems via chain-of-thought prompting emerges with increasing model scale, and that models have strikingly strong multilingual reasoning abilities, even in underrepresented languages such as Bengali and Swahili. Finally, we show that the multilingual reasoning abilities of language models extend to other tasks such as commonsense reasoning and word-in-context semantic judgment. The MGSM benchmark is publicly available at this https URL.
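A rough sketch of few-shot chain-of-thought evaluation on a grade-school math problem, assuming a generic llm_generate completion function (hypothetical); the paper's actual prompt templates, exemplar languages, and answer-extraction rules differ:

```python
import re

def llm_generate(prompt, temperature=0.0):
    """Placeholder for any text-completion API."""
    raise NotImplementedError

def solve_with_cot(question, exemplars, answer_trigger="Answer:"):
    """Few-shot chain-of-thought prompting for a grade-school math question.

    exemplars: list of (question, step_by_step_solution, final_answer) triples,
    which may be written in the question's language or in English.
    """
    prompt = ""
    for q, chain, ans in exemplars:
        prompt += f"Question: {q}\nStep-by-step reasoning: {chain}\n{answer_trigger} {ans}\n\n"
    prompt += f"Question: {question}\nStep-by-step reasoning:"
    completion = llm_generate(prompt)
    # Take the last number in the completion as the predicted final answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return numbers[-1] if numbers else None
```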

https://arxiv.org/abs/2210.03057

 

4、[LG] The Calibration Generalization Gap

A M Carrell, N Mallinar, J Lucas, P Nakkiran
[University of Cambridge & UC San Diego & NVIDIA & Apple]
The calibration generalization gap. Calibration is a fundamental property of a good predictive model: it requires that the model's predictions are correct in proportion to its confidence. Modern neural networks, however, provide no strong guarantees on their calibration: depending on the setting, they can be either poorly or well calibrated. It is currently unclear which factors contribute to good calibration (architecture, data augmentation, overparameterization, etc.), though various claims exist in the literature. This paper proposes a systematic way to study calibration error: decomposing it into (1) calibration error on the training set and (2) the calibration generalization gap, mirroring the fundamental decomposition of generalization. Studying these terms, the paper gives empirical evidence that (1) DNNs are typically always calibrated on their training set, and (2) the calibration generalization gap is upper-bounded by the standard generalization gap. Taken together, this implies that models with a small generalization gap (|test error - train error|) are well calibrated. This perspective unifies many results in the literature and suggests that interventions that reduce the generalization gap (such as adding data, using heavier augmentation, or smaller model size) also improve calibration. The authors hope this initial study lays the groundwork for a more systematic and comprehensive understanding of the relation between calibration, generalization, and optimization.

Calibration is a fundamental property of a good predictive model: it requires that the model predicts correctly in proportion to its confidence. Modern neural networks, however, provide no strong guarantees on their calibration -- and can be either poorly calibrated or well-calibrated depending on the setting. It is currently unclear which factors contribute to good calibration (architecture, data augmentation, overparameterization, etc), though various claims exist in the literature.

We propose a systematic way to study the calibration error: by decomposing it into (1) calibration error on the train set, and (2) the calibration generalization gap. This mirrors the fundamental decomposition of generalization. We then investigate each of these terms, and give empirical evidence that (1) DNNs are typically always calibrated on their train set, and (2) the calibration generalization gap is upper-bounded by the standard generalization gap. Taken together, this implies that models with small generalization gap (|Test Error - Train Error|) are well-calibrated. This perspective unifies many results in the literature, and suggests that interventions which reduce the generalization gap (such as adding data, using heavy augmentation, or smaller model size) also improve calibration. We thus hope our initial study lays the groundwork for a more systematic and comprehensive understanding of the relation between calibration, generalization, and optimization.
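A small sketch that makes the decomposition concrete, assuming the common binned expected calibration error (ECE) as the calibration-error measure; the paper's analysis is not tied to this particular estimator:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Standard binned ECE: average |accuracy - confidence| over confidence bins,
    weighted by the fraction of samples in each bin."""
    confidences = np.asarray(confidences)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# The decomposition, in these terms:
#   ECE(test) = ECE(train) + calibration generalization gap,
# so the gap is simply the difference between the two measured quantities.
def calibration_generalization_gap(train_conf, train_correct, test_conf, test_correct):
    return (expected_calibration_error(test_conf, test_correct)
            - expected_calibration_error(train_conf, train_correct))
```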

https://arxiv.org/abs/2210.01964

 

5、[CL] Knowledge Unlearning for Mitigating Privacy Risks in Language Models

J Jang, D Yoon, S Yang, S Cha, M Lee, L Logeswaran, M Seo
[KAIST & LG AI Research & Konkuk University & Seoul National University]
Knowledge unlearning for mitigating privacy risks in language models. Pretrained language models (LMs) memorize a vast amount of knowledge during initial pretraining, including information that may violate the privacy of personal lives and identities. Previous work addressing privacy issues for language models has mostly focused on data preprocessing and differential-privacy methods, both of which require retraining the underlying LM. This paper proposes knowledge unlearning as an alternative method for reducing the privacy risks of LMs post hoc. Simply applying the unlikelihood training objective to target token sequences is effective at forgetting them with little to no degradation of general language modeling performance; it sometimes even substantially improves the underlying LM with just a few iterations. The paper also finds that sequential unlearning works better than trying to unlearn all the data at once, and that unlearning depends strongly on which kind of data (domain) is forgotten. By comparing against a previous data-preprocessing method known to mitigate privacy risks for LMs, the paper shows that unlearning can provide a stronger empirical privacy guarantee when the data vulnerable to extraction attacks are known a priori, while being more computationally efficient.

Pretrained Language Models (LMs) memorize a vast amount of knowledge during initial pretraining, including information that may violate the privacy of personal lives and identities. Previous work addressing privacy issues for language models has mostly focused on data preprocessing and differential privacy methods, both requiring re-training the underlying LM. We propose knowledge unlearning as an alternative method to reduce privacy risks for LMs post hoc. We show that simply applying the unlikelihood training objective to target token sequences is effective at forgetting them with little to no degradation of general language modeling performances; it sometimes even substantially improves the underlying LM with just a few iterations. We also find that sequential unlearning is better than trying to unlearn all the data at once and that unlearning is highly dependent on which kind of data (domain) is forgotten. By showing comparisons with a previous data preprocessing method known to mitigate privacy risks for LMs, we show that unlearning can give a stronger empirical privacy guarantee in scenarios where the data vulnerable to extraction attacks are known a priori while being orders of magnitude more computationally efficient. We release the code and dataset needed to replicate our results at this https URL .
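A minimal sketch of one unlearning step, taking the simplest reading of the objective, i.e., negating the standard causal-LM loss on the sequences to forget (gradient ascent on their likelihood). The model interface follows the HuggingFace convention of returning a .loss when labels are supplied; the whole snippet is an illustrative assumption, not the paper's released code:

```python
import torch

def unlearning_step(model, optimizer, token_ids):
    """One knowledge-unlearning step on a batch of target token sequences.

    token_ids: LongTensor [batch, seq_len] of sequences to be forgotten.
    """
    model.train()
    # For HuggingFace-style causal LMs, passing labels=input_ids yields the
    # standard next-token cross-entropy loss on the batch.
    outputs = model(input_ids=token_ids, labels=token_ids)
    loss = -outputs.loss   # negate: make the target sequences less likely
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return outputs.loss.item()
```

In this sketch, the paper's finding that sequential unlearning beats unlearning everything at once would amount to calling unlearning_step on small chunks of the forget set one after another, rather than on the full set in a single pass.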

https://arxiv.org/abs/2210.01504

 

A few other papers worth noting:

 

[CV] Content-Based Search for Deep Generative Models

D Lu, S Wang, N Kumari, R Agarwal, D Bau, J Zhu
[CMU & Northeastern University]
https://arxiv.org/abs/2210.03116

 

[LG] Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

R Shah, V Varma, R Kumar, M Phuong, V Krakovna, J Uesato, Z Kenton
[DeepMind]
https://arxiv.org/abs/2210.01790

 

[CL] Honest Students from Untrusted Teachers: Learning an Interpretable Question-Answering Pipeline from a Pretrained Language Model

J Eisenstein, D Andor, B Bohnet, M Collins, D Mimno
[Google Research]
https://arxiv.org/abs/2210.02498

 

[LG] Flow Matching for Generative Modeling

Y Lipman, R T Q Chen, H Ben-Hamu, M Nickel, M Le
[Meta AI & Weizmann Institute of Science]
https://arxiv.org/abs/2210.02747

 

 
