LG - Machine Learning  CV - Computer Vision  CL - Computation and Language  AS - Audio and Speech  RO - Robotics

Reposted from 爱可可爱生活

1. [CV] ClimateGAN: Raising Climate Change Awareness by Generating Images of Floods

V Schmidt, A S Luccioni, M Teng, T Zhang, A Reynaud, S Raghupathi, G Cosne, A Juraver, V Vardanyan, A Hernandez-Garcia, Y Bengio

[Mila Quebec AI Institute & CDRIN]

ClimateGAN: raising climate change awareness by generating images of floods. Climate change is a major threat to humanity, and the actions required to prevent its catastrophic consequences include changes in both policy-making and individual behaviour. Taking action, however, requires understanding the effects of climate change, even though they can seem abstract and distant. Projecting the potential consequences of extreme climate events, such as flooding in familiar places, can make the abstract impacts of climate change more concrete and encourage action. As part of a larger initiative to build a website that projects extreme climate events onto user-chosen photos, this paper presents a solution for simulating photo-realistic floods on real images. To tackle this complex task in the absence of suitable training data, the authors propose ClimateGAN, a model that leverages both simulated and real data for unsupervised domain adaptation and conditional image generation. The paper details the framework, thoroughly evaluates the components of the architecture, and demonstrates that the model can robustly generate photo-realistic floods.

Climate change is a major threat to humanity, and the actions required to prevent its catastrophic consequences include changes in both policy-making and individual behaviour. However, taking action requires understanding the effects of climate change, even though they may seem abstract and distant. Projecting the potential consequences of extreme climate events such as flooding in familiar places can help make the abstract impacts of climate change more concrete and encourage action. As part of a larger initiative to build a website that projects extreme climate events onto user-chosen photos, we present our solution to simulate photo-realistic floods on authentic images. To address this complex task in the absence of suitable training data, we propose ClimateGAN, a model that leverages both simulated and real data for unsupervised domain adaptation and conditional image generation. In this paper, we describe the details of our framework, thoroughly evaluate components of our architecture and demonstrate that our model is capable of robustly generating photo-realistic flooding.
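To make the generation idea concrete: ClimateGAN splits the problem into predicting where water would plausibly go (a masker) and rendering what the water looks like (a painter), then blends the two into the original photo. Below is a minimal NumPy sketch of that final compositing step only, under the assumption that masker and painter networks exist upstream; all names are illustrative and not taken from the paper's code.

```python
import numpy as np

def composite_flood(image, water, mask):
    """Blend a generated water layer into the input photo.

    image, water: H x W x 3 float arrays in [0, 1]
    mask:         H x W float array in [0, 1], 1 where water should appear
    """
    m = mask[..., None]                      # broadcast mask over RGB channels
    return m * water + (1.0 - m) * image

# toy usage with random data standing in for real model outputs
rng = np.random.default_rng(0)
img = rng.uniform(size=(64, 64, 3))              # stand-in for a street photo
water = np.full((64, 64, 3), (0.3, 0.4, 0.5))    # flat muddy-water colour
mask = np.zeros((64, 64)); mask[40:, :] = 1.0    # flood the lower half
flooded = composite_flood(img, water, mask)
```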

https://weibo.com/1402400261/KDxG1l8wQ

2. [LG] Differentially Private Fine-tuning of Language Models

D Yu, S Naik, A Backurs, S Gopi, H A. Inan, G Kamath, J Kulkarni, Y T Lee, A Manoel, L Wutschitz, S Yekhanin, H Zhang

[Microsoft Research Asia & Microsoft & Microsoft Research & University of Waterloo]

Differentially private fine-tuning of language models. This paper gives simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, achieving state-of-the-art privacy-utility trade-offs on many standard NLP tasks. It proposes a meta-framework for the problem, inspired by the recent success of highly parameter-efficient fine-tuning methods. Experiments show that differentially private adaptations of these approaches outperform previous private algorithms along three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private ones. For example, on the MNLI dataset the method reaches 87.8% accuracy with RoBERTa-Large and 83.5% with RoBERTa-Base at a privacy budget of ϵ=6.7, compared with 90.2% for RoBERTa-Large without privacy constraints. Results are similar for natural language generation: privately fine-tuned on DART, GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL reach BLEU scores of 38.5, 42.0, 43.1, and 43.8 respectively (privacy budget ϵ=6.8, δ=1e-5), against a non-private baseline of 48.1. All experiments suggest that larger models are better suited to private fine-tuning: they are well known to achieve superior accuracy non-privately, and they also retain more of that accuracy when privacy is introduced.

We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of 87.8% using RoBERTa-Large and 83.5% using RoBERTa-Base with a privacy budget of ϵ=6.7. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of 90.2%. Our findings are similar for natural language generation tasks. Privately fine-tuned on DART, GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL achieve BLEU scores of 38.5, 42.0, 43.1, and 43.8 respectively (privacy budget of ϵ=6.8, δ=1e-5), whereas the non-private baseline is 48.1. All our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.
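The core mechanism behind differentially private training is DP-SGD: clip each example's gradient to a fixed norm, then add Gaussian noise before the update. The sketch below implements that loop for plain logistic regression in NumPy; it illustrates the mechanism, not the paper's framework, which applies it to small parameter-efficient modules (e.g. low-rank adapters) on top of a frozen language model. All hyperparameter names and values here are arbitrary.

```python
import numpy as np

def dp_sgd_logreg(X, y, epochs=5, lr=0.1, clip=1.0, noise_mult=1.0,
                  batch=64, seed=0):
    """Binary logistic regression trained with DP-SGD:
    per-example gradient clipping + Gaussian noise on the summed gradient."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch):
            b = order[start:start + batch]
            p = 1.0 / (1.0 + np.exp(-(X[b] @ w)))         # predicted probs
            g = (p - y[b])[:, None] * X[b]                # per-example grads
            norms = np.linalg.norm(g, axis=1, keepdims=True)
            g *= np.minimum(1.0, clip / np.maximum(norms, 1e-12))  # clip each
            noise = rng.normal(0.0, noise_mult * clip, size=d)     # N(0, σ²C²)
            w -= lr * (g.sum(axis=0) + noise) / len(b)
    return w
```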

https://weibo.com/1402400261/KDxJmuzmJ

3. [LG] Learning in High Dimension Always Amounts to Extrapolation

R Balestriero, J Pesenti, Y LeCun

[Facebook AI Research]

Learning in high dimension always amounts to extrapolation. The notions of interpolation and extrapolation are fundamental in fields ranging from deep learning to function approximation. Interpolation occurs when a sample x falls inside or on the boundary of the convex hull of a given dataset; extrapolation occurs when x falls outside that hull. One fundamental (mis)conception is that state-of-the-art algorithms work so well because of their ability to correctly interpolate training data. A second (mis)conception is that interpolation occurs throughout tasks and datasets; indeed, many intuitions and theories rely on this assumption. This paper argues against both points, empirically and theoretically, and demonstrates that on any high-dimensional (>100) dataset, interpolation almost surely never happens. These results challenge the validity of the current interpolation/extrapolation definitions as indicators of generalization performance.

The notions of interpolation and extrapolation are fundamental in various fields, from deep learning to function approximation. Interpolation occurs for a sample x whenever this sample falls inside or on the boundary of the given dataset's convex hull. Extrapolation occurs when x falls outside of that convex hull. One fundamental (mis)conception is that state-of-the-art algorithms work so well because of their ability to correctly interpolate training data. A second (mis)conception is that interpolation happens throughout tasks and datasets; in fact, many intuitions and theories rely on that assumption. We empirically and theoretically argue against those two points and demonstrate that on any high-dimensional (>100) dataset, interpolation almost surely never happens. Those results challenge the validity of our current interpolation/extrapolation definition as an indicator of generalization performance.
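The paper's definition of interpolation is easy to test directly: x interpolates a dataset if and only if it is a convex combination of the training points, which is a small linear feasibility problem. A sketch using SciPy's LP solver follows; the dataset sizes and dimensions below are arbitrary, chosen only to show the fraction of interpolating test points collapsing as dimension grows.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, points):
    """True iff x = points.T @ lam for some lam >= 0 with sum(lam) == 1."""
    n = len(points)
    A_eq = np.vstack([points.T, np.ones((1, n))])   # convex-combination constraints
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.success                              # feasible <=> inside hull

rng = np.random.default_rng(0)
for d in (2, 8, 32, 128):
    train = rng.normal(size=(1000, d))
    test = rng.normal(size=(200, d))
    inside = sum(in_convex_hull(t, train) for t in test)
    print(f"d={d:4d}: {inside}/200 test points interpolate")
```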

https://weibo.com/1402400261/KDxM6kGI8

4. [CV] Understanding Dimensional Collapse in Contrastive Self-supervised Learning

L Jing, P Vincent, Y LeCun, Y Tian

[Facebook AI Research]

Understanding dimensional collapse in contrastive self-supervised learning. Self-supervised visual representation learning aims to learn useful representations without relying on human annotations. Joint-embedding approaches are based on maximizing agreement between embedding vectors from different views of the same image, and various methods have been proposed to avoid the collapse problem, in which all embedding vectors degenerate to a trivial constant solution. Among these methods, contrastive learning prevents collapse via negative sample pairs. Non-contrastive methods have been shown to suffer from a lesser collapse problem of a different nature: dimensional collapse, in which the embedding vectors end up spanning a lower-dimensional subspace rather than the entire available embedding space. This paper shows that dimensional collapse also occurs in contrastive learning and sheds light on the dynamics that cause it. Inspired by this theory, the authors propose DirectCLR, a novel contrastive learning method that directly optimizes the representation space without relying on a trainable projector. Experiments show that DirectCLR outperforms SimCLR with a trainable linear projector on ImageNet.

Self-supervised visual representation learning aims to learn useful representations without relying on human annotations. Joint embedding approaches are based on maximizing the agreement between embedding vectors from different views of the same image. Various methods have been proposed to solve the collapsing problem where all embedding vectors collapse to a trivial constant solution. Among these methods, contrastive learning prevents collapse via negative sample pairs. It has been shown that non-contrastive methods suffer from a lesser collapse problem of a different nature: dimensional collapse, whereby the embedding vectors end up spanning a lower-dimensional subspace instead of the entire available embedding space. Here, we show that dimensional collapse also happens in contrastive learning. In this paper, we shed light on the dynamics at play in contrastive learning that lead to dimensional collapse. Inspired by our theory, we propose a novel contrastive learning method, called DirectCLR, which directly optimizes the representation space without relying on a trainable projector. Experiments show that DirectCLR outperforms SimCLR with a trainable linear projector on ImageNet.
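Dimensional collapse is straightforward to diagnose: inspect the singular-value spectrum of the embedding covariance matrix; a long tail of near-zero values means the embeddings occupy only a low-dimensional subspace. A NumPy sketch of that diagnostic follows, where a rank-deficient random projection stands in for a collapsed encoder.

```python
import numpy as np

def embedding_spectrum(Z):
    """Singular values of the covariance of embeddings Z (n x d).
    Many near-zero values indicate dimensional collapse."""
    Zc = Z - Z.mean(axis=0, keepdims=True)          # center the embeddings
    cov = Zc.T @ Zc / len(Zc)
    return np.linalg.svd(cov, compute_uv=False)

rng = np.random.default_rng(0)
healthy = rng.normal(size=(2048, 128))              # full-rank embeddings
W = rng.normal(size=(128, 16)) @ rng.normal(size=(16, 128))
collapsed = healthy @ W                             # rank <= 16 by construction
print(np.sum(embedding_spectrum(healthy) > 1e-6))   # ~128 effective dimensions
print(np.sum(embedding_spectrum(collapsed) > 1e-6)) # ~16 effective dimensions
```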

https://weibo.com/1402400261/KDxOzoycM

5. [LG] Cross-Domain Imitation Learning via Optimal Transport

A Fickinger, S Cohen, S Russell, B Amos

[Berkeley AI Research & University College London & Facebook AI]

Cross-domain imitation learning via optimal transport. Cross-domain imitation learning studies how to leverage expert demonstrations from one agent to train an imitation agent with a different embodiment or morphology. Comparing trajectories and stationary distributions between the expert and the imitator is challenging because they live in different systems that may not even share the same dimensionality. This paper proposes Gromov-Wasserstein Imitation Learning (GWIL), a method for cross-domain imitation that uses the Gromov-Wasserstein distance to align and compare states across the agents' different spaces. The theory formally characterizes the scenarios in which GWIL preserves optimality, revealing its possibilities and limitations. GWIL is shown to be effective in non-trivial continuous control domains, ranging from simple rigid transformations of the expert domain to arbitrary transformations of the state-action space.

Cross-domain imitation learning studies how to leverage expert demonstrations of one agent to train an imitation agent with a different embodiment or morphology. Comparing trajectories and stationary distributions between the expert and imitation agents is challenging because they live on different systems that may not even have the same dimensionality. We propose Gromov-Wasserstein Imitation Learning (GWIL), a method for cross-domain imitation that uses the Gromov-Wasserstein distance to align and compare states between the different spaces of the agents. Our theory formally characterizes the scenarios where GWIL preserves optimality, revealing its possibilities and limitations. We demonstrate the effectiveness of GWIL in non-trivial continuous control domains ranging from simple rigid transformation of the expert domain to arbitrary transformation of the state-action space.
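The Gromov-Wasserstein distance compares two metric spaces through their intra-space distance matrices, so no map between the spaces is needed, which is what lets GWIL align states of agents with different embodiments. Below is a minimal sketch using the POT optimal-transport library; the two point clouds are hypothetical stand-ins for expert and imitator state samples, and GWIL's actual use of this alignment inside an RL reward is omitted.

```python
import numpy as np
import ot                                    # POT: pip install pot
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
expert_states = rng.normal(size=(60, 4))     # hypothetical 4-D expert states
# hypothetical imitator: same structure, projected isometrically into 3-D
R = np.linalg.qr(rng.normal(size=(4, 4)))[0][:3].T
imitator_states = expert_states @ R          # 60 x 3, different dimensionality

# intra-domain distance matrices; GW never compares across the two spaces
C1 = cdist(expert_states, expert_states)
C2 = cdist(imitator_states, imitator_states)
p = np.full(len(C1), 1.0 / len(C1))          # uniform weights on expert states
q = np.full(len(C2), 1.0 / len(C2))          # uniform weights on imitator states

# coupling T[i, j]: how strongly expert state i aligns with imitator state j
T = ot.gromov.gromov_wasserstein(C1, C2, p, q, loss_fun="square_loss")
print(T.shape, T.sum())                      # (60, 60), sums to ~1.0
```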

https://weibo.com/1402400261/KDxRxgNTH

 

A few more papers worth noting:

[CV] TLDR: Twin Learning for Dimensionality Reduction


Y Kalantidis, C Lassance, J Almazan, D Larlus

[NAVER LABS Europe]

https://weibo.com/1402400261/KDxTUFraf

[LG] Medical Dead-ends and Learning to Identify High-risk States and Treatments


M Fatemi, T W. Killian, J Subramanian, M Ghassemi

[Microsoft Research & University of Toronto & Adobe Research & MIT]

https://weibo.com/1402400261/KDxWapp81

[LG] Discovering and Achieving Goals via World Models


R Mendonca, O Rybkin, K Daniilidis, D Hafner, D Pathak

[CMU & University of Pennsylvania & University of Toronto]

https://weibo.com/1402400261/KDxY71rzC

[LG] Improving Robustness using Generated Data


S Gowal, S Rebuffi, O Wiles, F Stimberg, D A Calian, T Mann

[DeepMind]

https://weibo.com/1402400261/KDy030KfR
