LG - Machine Learning | CV - Computer Vision | CL - Computation and Language | AS - Audio and Speech | RO - Robotics
Reposted from: 爱可可-爱生活
1. [LG] Self-supervised Learning is More Robust to Dataset Imbalance
H Liu, J Z. HaoChen, A Gaidon, T Ma
[Stanford University & Toyota Research Institute]
Self-supervised learning is more robust to dataset imbalance. Self-supervised learning (SSL) learns general visual representations at scale without labels, yet real-world unlabeled datasets are often long-tailed, and SSL's behavior there is poorly understood. Through extensive experiments, the authors find that off-the-shelf self-supervised representations are already more robust to class imbalance than supervised ones: across sample sizes, the gap between balanced and imbalanced pre-training is markedly smaller for SSL, both in-domain and out-of-domain. They hypothesize that SSL learns richer, label-irrelevant-but-transferable features from frequent data that help classify rare classes and downstream tasks, whereas supervised learning has no incentive to learn such features; the hypothesis is validated with semi-synthetic experiments and a theoretical analysis in a simplified setting. Guided by the theory, they devise a re-weighted regularization technique that consistently improves SSL representation quality on imbalanced datasets, closing the remaining small gap to balanced datasets with the same number of examples.
Self-supervised learning (SSL) is a scalable way to learn general visual representations since it learns without labels. However, large-scale unlabeled datasets in the wild often have long-tailed label distributions, where we know little about the behavior of SSL. In this work, we systematically investigate self-supervised learning under dataset imbalance. First, we find out via extensive experiments that off-the-shelf self-supervised representations are already more robust to class imbalance than supervised representations. The performance gap between balanced and imbalanced pre-training with SSL is significantly smaller than the gap with supervised learning, across sample sizes, for both in-domain and, especially, out-of-domain evaluation. Second, towards understanding the robustness of SSL, we hypothesize that SSL learns richer features from frequent data: it may learn label-irrelevant-but-transferable features that help classify the rare classes and downstream tasks. In contrast, supervised learning has no incentive to learn features irrelevant to the labels from frequent examples. We validate this hypothesis with semi-synthetic experiments and theoretical analyses on a simplified setting. Third, inspired by the theoretical insights, we devise a re-weighted regularization technique that consistently improves the SSL representation quality on imbalanced datasets with several evaluation criteria, closing the small gap between balanced and imbalanced datasets with the same number of examples.
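The re-weighting idea can be sketched roughly. The snippet below is a minimal illustration, not the paper's actual method: each example receives a regularization weight inversely proportional to an estimated local density in feature space, so examples from sparse (rare) regions contribute more to the regularizer. The `density_weights` helper, the Gaussian-kernel density estimate, and the `beta` exponent are all assumptions made for this sketch.

```python
import numpy as np

def density_weights(features, bandwidth=1.0, beta=1.0):
    """Per-example weights inversely proportional to an estimated local density.

    features: (n, d) array of example representations.
    Returns weights normalized to mean 1, so the overall regularization
    strength is unchanged while rare-region examples count for more.
    """
    # Pairwise squared distances, then a Gaussian-kernel density estimate.
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    density = np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)
    w = density ** (-beta)          # inverse-density weighting
    return w / w.sum() * len(w)     # normalize to mean 1

# Three examples in a dense cluster and one isolated ("rare") example.
feats = np.array([[0.0], [0.1], [0.05], [5.0]])
w = density_weights(feats)
# The isolated example receives the largest regularization weight.
```

These weights would then multiply a per-example regularization penalty inside the SSL objective; the kernel and bandwidth choices are illustrative only.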
https://weibo.com/1402400261/KCVTopgdp
2. [CL] Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema
Y Elazar, H Zhang, Y Goldberg, D Roth
[Bar Ilan University & HKUST & UPenn]
Artifact detection, training, and commonsense disentanglement in the Winograd Schema. The Winograd Schema (WS) was proposed as a test of models' commonsense abilities, and pre-trained language models have recently boosted performance on some WS benchmarks, but the source of the improvement remains unclear. This paper argues that apparent progress on WS need not reflect progress in commonsense reasoning. To support this claim, it first shows that the current WS evaluation method is sub-optimal and proposes a modification that evaluates on twin sentences; it also proposes two new baselines that reveal artifacts in WS benchmarks. A method for evaluating WS-like sentences in a zero-shot setting, isolating the commonsense ability acquired during pre-training, shows that popular language models perform at random under this stricter evaluation. The observed progress is attributed mostly to supervision during WS training, which is unlikely to cover all the required commonsense reasoning skills and knowledge.
The Winograd Schema (WS) has been proposed as a test for measuring commonsense capabilities of models. Recently, pre-trained language model-based approaches have boosted performance on some WS benchmarks but the source of improvement is still not clear. This paper suggests that the apparent progress on WS may not necessarily reflect progress in commonsense reasoning. To support this claim, we first show that the current evaluation method of WS is sub-optimal and propose a modification that uses twin sentences for evaluation. We also propose two new baselines that indicate the existence of artifacts in WS benchmarks. We then develop a method for evaluating WS-like sentences in a zero-shot setting to account for the commonsense reasoning abilities acquired during the pretraining and observe that popular language models perform randomly in this setting when using our more strict evaluation. We conclude that the observed progress is mostly due to the use of supervision in training WS models, which is not likely to successfully support all the required commonsense reasoning skills and knowledge.
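The twin-sentence evaluation can be sketched as follows. This is a hypothetical illustration (the `resolve`/`twin_correct` helpers and the score values are made up, not the paper's code): a model is credited only when it resolves both twins of a pair correctly, which prevents single-sentence artifacts, such as a blanket preference for one candidate, from inflating accuracy.

```python
def resolve(scores):
    """Pick the candidate referent with the higher (hypothetical) model score."""
    return max(scores, key=scores.get)

def twin_correct(twin_a, twin_b):
    """Strict twin evaluation: BOTH twins must be resolved correctly."""
    return (resolve(twin_a["scores"]) == twin_a["answer"]
            and resolve(twin_b["scores"]) == twin_b["answer"])

# "The trophy didn't fit in the suitcase because it was too big / too small."
# Hypothetical log-probability scores from a model that always prefers "trophy":
twin_a = {"scores": {"trophy": -1.2, "suitcase": -2.5}, "answer": "trophy"}
twin_b = {"scores": {"trophy": -1.4, "suitcase": -2.1}, "answer": "suitcase"}

twin_correct(twin_a, twin_b)  # False: the artifact-driven model fails the pair
```

Under per-sentence scoring this model would get 50% on the pair; twin scoring gives it zero credit.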
https://weibo.com/1402400261/KCVZ142uP
3. [LG] Temporal Abstraction in Reinforcement Learning with the Successor Representation
M C. Machado, A Barreto, D Precup
[University of Alberta & DeepMind]
Temporal abstraction in reinforcement learning with the successor representation. Reasoning at multiple levels of temporal abstraction is a key attribute of intelligence; in RL it is typically modeled with temporally extended courses of action called options, which let agents make predictions and operate at different levels of abstraction. Options-based methods, however, usually assume a reasonable option set is known in advance, and when it is not, there is no definitive answer for which options to consider. This paper argues that the successor representation (SR), which encodes states by the pattern of state visitation that follows them, is a natural substrate for discovering and using temporal abstractions. Taking a big-picture view of recent results, it shows how the SR can be used to discover options that facilitate temporally extended exploration or planning, and casts these results as instances of a general framework in which the agent's representation identifies useful options, which in turn further improve the representation — a virtuous, never-ending cycle in which both are constantly refined against each other. Beyond discovery, the SR also lets a set of options be augmented into a combinatorially large counterpart without additional learning, by combining previously learned options. The empirical evaluation focuses on options discovered for temporally extended exploration and on using the SR to combine them; the results illuminate important design decisions in defining options and demonstrate the synergy of SR-based methods such as eigenoptions and the option keyboard.
Reasoning at multiple levels of temporal abstraction is one of the key attributes of intelligence. In reinforcement learning, this is often modeled through temporally extended courses of actions called options. Options allow agents to make predictions and to operate at different levels of abstraction within an environment. Nevertheless, approaches based on the options framework often start with the assumption that a reasonable set of options is known beforehand. When this is not the case, there are no definitive answers for which options one should consider. In this paper, we argue that the successor representation, which encodes states based on the pattern of state visitation that follows them, can be seen as a natural substrate for the discovery and use of temporal abstractions. To support our claim, we take a big picture view of recent results, showing how the successor representation can be used to discover options that facilitate either temporally-extended exploration or planning. We cast these results as instantiations of a general framework for option discovery in which the agent’s representation is used to identify useful options, which are then used to further improve its representation. This results in a virtuous, never-ending, cycle in which both the representation and the options are constantly being refined based on each other. Beyond option discovery itself, we also discuss how the successor representation allows us to augment a set of options into a combinatorially large counterpart without additional learning. This is achieved through the combination of previously learned options. Our empirical evaluation focuses on options discovered for temporally-extended exploration and on the use of the successor representation to combine them. 
The results of our experiments shed light on important design decisions involved in the definition of options and demonstrate the synergy of different methods based on the successor representation, such as eigenoptions and the option keyboard.
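For a fixed policy with state-transition matrix P, the successor representation has the closed form Psi = (I - gamma*P)^(-1) = sum_t gamma^t P^t, i.e. the expected discounted future occupancy of each state. A minimal sketch on a toy two-state chain (the MDP itself is invented for illustration):

```python
import numpy as np

# Transition matrix of a fixed policy on a "sticky" two-state chain:
# each state stays put with probability 0.9.
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
gamma = 0.95

# Successor representation:
# Psi[s, s'] = E[ sum_t gamma^t * 1{S_t = s'} | S_0 = s ] = (I - gamma*P)^{-1}
Psi = np.linalg.inv(np.eye(2) - gamma * P)

# Starting from state 0, the discounted occupancy of state 0 exceeds that of
# state 1, reflecting the sticky dynamics the SR encodes.
```

In practice the SR is learned by temporal-difference updates rather than matrix inversion, but the closed form makes clear what quantity those updates converge to.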
https://weibo.com/1402400261/KCW3olPgZ
4. [CV] Ego4D: Around the World in 3,000 Hours of Egocentric Video
K Grauman, A Westbury...
[Facebook AI Research (FAIR) & University of Minnesota & University of Catania & Facebook Reality Labs & Georgia Tech...]
Ego4D: a massive-scale egocentric video dataset and benchmark suite comprising 3,025 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.), captured by 855 unique camera wearers across 74 worldwide locations and 9 countries. The collection protocol upholds rigorous privacy and ethics standards, with consenting participants and robust de-identification where relevant, and the dataset dramatically expands the volume of diverse egocentric footage publicly available to the research community. Portions of the video come with audio, 3D environment meshes, eye gaze, stereo, and/or synchronized video from multiple egocentric cameras at the same event. A host of new benchmark challenges center on understanding the first-person visual experience in the past (querying episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interaction), and future (forecasting activities), aiming to push the frontier of first-person perception.
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,025 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 855 unique camera wearers from 74 worldwide locations and 9 different countries. The approach to collection is designed to uphold rigorous privacy and ethics standards with consenting participants and robust de-identification procedures where relevant. Ego4D dramatically expands the volume of diverse egocentric video footage publicly available to the research community. Portions of the video are accompanied by audio, 3D meshes of the environment, eye gaze, stereo, and/or synchronized videos from multiple egocentric cameras at the same event. Furthermore, we present a host of new benchmark challenges centered around understanding the first-person visual experience in the past (querying an episodic memory), present (analyzing hand-object manipulation, audio-visual conversation, and social interactions), and future (forecasting activities). By publicly sharing this massive annotated dataset and benchmark suite, we aim to push the frontier of first-person perception.
5. [LG] Symbolic Brittleness in Sequence Models: on Systematic Generalization in Symbolic Mathematics
S Welleck, P West, J Cao, Y Choi
[University of Washington]
Symbolic brittleness in sequence models: on systematic generalization in symbolic mathematics. Neural sequence models trained with maximum likelihood estimation have driven breakthroughs on many tasks where success is measured by the gap between training and test performance, but their capacity for stronger forms of generalization remains unclear. The authors consider symbolic mathematical integration, which demands systematic generalization beyond the test set, and develop an evaluation methodology that exploits the problem domain's structure and access to a verifier. Despite strong in-distribution performance of sequence-to-sequence models, carefully constructed manual test suites and a genetic algorithm that automatically finds large collections of failures in a controllable manner expose challenges in robustness, compositionality, and out-of-distribution generalization. The investigation highlights how difficult it is to generalize well with the predominant modeling and learning approach, and the importance of evaluating beyond the test set across different aspects of generalization.
Neural sequence models trained with maximum likelihood estimation have led to breakthroughs in many tasks, where success is defined by the gap between training and test performance. However, their ability to achieve stronger forms of generalization remains unclear. We consider the problem of symbolic mathematical integration, as it requires generalizing systematically beyond the test set. We develop a methodology for evaluating generalization that takes advantage of the problem domain's structure and access to a verifier. Despite promising in-distribution performance of sequence-to-sequence models in this domain, we demonstrate challenges in achieving robustness, compositionality, and out-of-distribution generalization, through both carefully constructed manual test suites and a genetic algorithm that automatically finds large collections of failures in a controllable manner. Our investigation highlights the difficulty of generalizing well with the predominant modeling and learning approach, and the importance of evaluating beyond the test set, across different aspects of generalization.
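Access to a verifier is what makes integration convenient to probe: a predicted antiderivative F is correct iff F' equals the integrand. A minimal, stdlib-only sketch of such a verifier (the finite-difference check, tolerances, and sampling range are my assumptions; in practice a symbolic checker would be used):

```python
import math
import random

random.seed(0)  # deterministic sampling for reproducibility

def verify(integrand, candidate, lo=-1.0, hi=1.0, tol=1e-4, n=50):
    """Numerically check d/dx candidate(x) == integrand(x) at random points.

    Returns True iff the central finite difference of `candidate` matches
    `integrand` within `tol` at all n sampled points in [lo, hi].
    """
    h = 1e-5
    for _ in range(n):
        x = random.uniform(lo, hi)
        deriv = (candidate(x + h) - candidate(x - h)) / (2 * h)
        if abs(deriv - integrand(x)) > tol:
            return False
    return True

verify(math.cos, math.sin)   # True:  sin'(x) = cos(x)
verify(math.cos, math.cos)   # False: cos'(x) = -sin(x) != cos(x)
```

A verifier like this turns generalization testing into search: any generator of inputs (manual suites, or a genetic algorithm mutating expressions) can label model outputs right or wrong automatically.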
https://weibo.com/1402400261/KCWiBdtpi
Other papers worth noting:
6. [LG] Predictive models of RNA degradation through dual crowdsourcing
H K. Wayment-Steele, W Kladwang, A M. Watkins...
[Stanford University & Kaggle...]
https://weibo.com/1402400261/KCWodjUw8
7. [CL] P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
X Liu, K Ji, Y Fu, Z Du, Z Yang, J Tang
[Tsinghua University]
https://weibo.com/1402400261/KCWqrjBhX
8. [CL] Language Modeling using LMUs: 10x Better Data Efficiency or Improved Scaling Compared to Transformers
N Chilkuri, E Hunsberger, A Voelker, G Malik, C Eliasmith
[Applied Brain Research]
https://weibo.com/1402400261/KCWtQho3u
9. [CL] Self-conditioning pre-trained language models
X Suau, L Zappella, N Apostoloff
[Apple]
https://weibo.com/1402400261/KCWvW1S3r