爱可可 AI Frontier Picks (4.3)

LG - Machine Learning CV - Computer Vision CL - Computation and Language AS - Audio and Speech RO - Robotics

Reposted from 爱可可爱生活

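Summary: modifying pixel space to adapt pre-trained models; properties of neural network representations in reinforcement learning; unsupervised learning of temporal abstractions with slot-based Transformers; identifying invariant causal structure in exchangeable data; demonstration-bootstrapped autonomous practicing via multi-task reinforcement learning; emergent self-supervised dense correspondence in GANs; learning motion-dependent appearance from a single camera for high-fidelity rendering of dynamic humans; diagonal state spaces are as effective as structured state spaces; a unified Transformer tracker for object tracking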

1. [CV] Visual Prompting: Modifying Pixel Space to Adapt Pre-trained Models

H Bahng, A Jahanian, S Sankaranarayanan, P Isola

[MIT CSAIL]

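Visual prompting: modifying pixel space to adapt pre-trained models. Prompting has recently become a popular paradigm for adapting language models to downstream tasks. Instead of tuning model parameters or adding task-specific heads, it steers the model toward a new task by adding a text prompt to the input. This paper asks whether prompts can be built from pixels: can a pre-trained vision model be adapted to a new task solely by adding pixels to its input? The proposed method, visual prompting, learns a task-specific image perturbation so that a frozen pre-trained model performs the new task when prompted with it. Changing only a few pixels is enough to adapt the model to new tasks and datasets, with performance on par with linear probing, the current de facto lightweight adaptation method. The surprising effectiveness of visual prompting offers a new perspective on adapting pre-trained vision models and opens up the possibility of adapting a model through its input alone, which, unlike model parameters or outputs, is usually under the end user's control.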

Prompting has recently become a popular paradigm for adapting language models to downstream tasks. Rather than fine-tuning model parameters or adding task-specific heads, this approach steers a model to perform a new task simply by adding a text prompt to the model’s inputs. In this paper, we explore the question: can we create prompts with pixels instead? In other words, can pre-trained vision models be adapted to a new task solely by adding pixels to their inputs? We introduce visual prompting, which learns a task-specific image perturbation such that a frozen pre-trained model prompted with this perturbation performs a new task. We discover that changing only a few pixels is enough to adapt models to new tasks and datasets, and performs on par with linear probing, the current de facto approach to lightweight adaptation. The surprising effectiveness of visual prompting provides a new perspective on how to adapt pre-trained models in vision, and opens up the possibility of adapting models solely through their inputs, which, unlike model parameters or outputs, are typically under an end-user’s control. Code is available at https://hjbahng.github.io/visual_prompting/.

https://arxiv.org/abs/2203.17274
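
To make the idea of "adapting a frozen model through its input" concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the 4x4 "images", the linear-plus-sigmoid stand-in for a pre-trained model, the border-shaped mask, and all names (`border_mask`, `train_prompt`, `WEIGHTS`, ...) are assumptions made for illustration. The only thing it mirrors from the abstract is that the model weights stay fixed while a single additive perturbation is learned.

```python
import math

SIZE = 4  # toy image side length (assumption; real inputs are far larger)

def border_mask(size, width=1):
    """1.0 on a frame of `width` pixels around the image, 0.0 in the interior."""
    return [
        1.0 if min(r, c, size - 1 - r, size - 1 - c) < width else 0.0
        for r in range(size) for c in range(size)
    ]

def frozen_model(pixels, weights):
    """Stand-in for a frozen pre-trained model: fixed linear scorer + sigmoid."""
    z = sum(w * p for w, p in zip(weights, pixels))
    return 1.0 / (1.0 + math.exp(-z))

def apply_prompt(image, prompt, mask):
    """Add the learned perturbation only where the mask allows it."""
    return [x + m * d for x, d, m in zip(image, prompt, mask)]

def log_loss(images, labels, weights, prompt, mask):
    """Mean binary cross-entropy of the frozen model on prompted images."""
    total = 0.0
    for image, y in zip(images, labels):
        p = frozen_model(apply_prompt(image, prompt, mask), weights)
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(images)

def train_prompt(images, labels, weights, mask, steps=200, lr=0.5):
    """Gradient descent on the prompt only; `weights` are never updated."""
    prompt = [0.0] * len(mask)
    for _ in range(steps):
        grad = [0.0] * len(mask)
        for image, y in zip(images, labels):
            p = frozen_model(apply_prompt(image, prompt, mask), weights)
            # d(log-loss)/d(prompt_i) = (p - y) * w_i * mask_i
            for i, (w, m) in enumerate(zip(weights, mask)):
                grad[i] += (p - y) * w * m
        prompt = [d - lr * g / len(images) for d, g in zip(prompt, grad)]
    return prompt

# Frozen weights and a hypothetical "new task" whose labels the frozen model gets wrong.
WEIGHTS = [0.25] * (SIZE * SIZE)
images = [[v] * (SIZE * SIZE) for v in (-1.0, -0.5, 0.5, 1.0)]
labels = [0, 1, 1, 1]

mask = border_mask(SIZE)
zero_prompt = [0.0] * len(mask)
loss_before = log_loss(images, labels, WEIGHTS, zero_prompt, mask)
prompt = train_prompt(images, labels, WEIGHTS, mask)
loss_after = log_loss(images, labels, WEIGHTS, prompt, mask)
print(loss_before, loss_after)
```

With a linear stand-in the prompt can only shift the decision threshold, so this toy understates what a perturbation can do to a deep network; it is meant to show the training loop shape, not the method's capacity.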

2. [LG] Investigating the Properties of Neural Network Representations in Reinforcement Learning

H Wang, E Miahi, M White, M C. Machado, Z Abbas, R Kumaraswamy, V Liu, A White

[University of Alberta & DeepMind]

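Investigating the properties of neural network representations in reinforcement learning. This paper studies the properties of representations learned by deep reinforcement learning systems. Early work on representation learning for reinforcement learning mostly focused on designing fixed-basis architectures to achieve properties considered desirable, such as orthogonality and sparsity. Deep reinforcement learning, by contrast, rests on the idea that the agent designer should not encode representational properties; the data stream should determine them, and good representations emerge under an appropriate training scheme. The paper combines the two views and empirically studies which representation properties support transfer in reinforcement learning. The analysis yields new hypotheses about the effect of auxiliary tasks in end-to-end training of non-linear reinforcement learning methods. Six representation properties are introduced and measured across more than 25,000 agent-task settings, using DQN agents with convolutional networks in a pixel-based navigation environment. To better understand why some representations transfer better, the authors systematically vary task similarity, then measure representation properties and correlate them with transfer performance.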

In this paper we investigate the properties of representations learned by deep reinforcement learning systems. Much of the earlier work in representation learning for reinforcement learning focused on designing fixed-basis architectures to achieve properties thought to be desirable, such as orthogonality and sparsity. In contrast, the idea behind deep reinforcement learning methods is that the agent designer should not encode representational properties, but rather that the data stream should determine the properties of the representation: good representations emerge under appropriate training schemes. In this paper we bring these two perspectives together, empirically investigating the properties of representations that support transfer in reinforcement learning. This analysis allows us to provide novel hypotheses regarding the impact of auxiliary tasks in end-to-end training of non-linear reinforcement learning methods. We introduce and measure six representational properties over more than 25 thousand agent-task settings. We consider DQN agents with convolutional networks in a pixel-based navigation environment. We develop a method to better understand why some representations work better for transfer, through a systematic approach varying task similarity and measuring and correlating representation properties with transfer performance.

https://arxiv.org/abs/2203.15955
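
The abstract names orthogonality and sparsity as examples of representation properties but does not give formulas. As a rough illustration of how such properties could be measured on a batch of feature vectors, the sketch below uses two simple definitions that are assumptions of this note, not necessarily those of the paper: sparsity as the fraction of exactly-zero activations, and orthogonality as one minus the mean absolute cosine similarity between pairs of state representations.

```python
import math

def sparsity(features):
    """Fraction of activations that are exactly zero (e.g. after a ReLU)."""
    total = sum(len(f) for f in features)
    zeros = sum(1 for f in features for v in f if v == 0.0)
    return zeros / total

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def orthogonality(features):
    """1 minus the mean absolute cosine similarity over all pairs of states."""
    n = len(features)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    mean_abs_cos = sum(abs(cosine(features[i], features[j])) for i, j in pairs) / len(pairs)
    return 1.0 - mean_abs_cos

# Hypothetical last-layer activations for three states.
features = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0],
]
print(sparsity(features), orthogonality(features))
```

Scores like these, computed per trained agent, are the kind of quantity that could then be correlated with transfer performance across task settings.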

3. [LG] Unsupervised Learning of Temporal Abstractions with Slot-based Transformers

A Gopalakrishnan, K Irie, J Schmidhuber, S v Steenkiste

[IDSIA & Google Research]

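Unsupervised learning of temporal abstractions with slot-based Transformers. In complex reinforcement learning problems, discovering reusable sub-routines simplifies decision-making and planning. Previous approaches learn such temporal abstractions in a purely unsupervised way by observing state-action trajectories collected from executing a policy. A current limitation is that they process each trajectory entirely sequentially, so they cannot revise earlier decisions about sub-routine boundary points in light of new incoming information. This paper proposes SloTTAr, a fully parallel approach that combines sequence-processing Transformers with a Slot Attention module and adaptive computation to learn the number of such sub-routines without supervision. SloTTAr outperforms strong baselines at boundary point discovery, even on sequences with a variable number of sub-routines, while training up to 7x faster on existing benchmarks.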

The discovery of reusable sub-routines simplifies decision-making and planning in complex reinforcement learning problems. Previous approaches propose to learn such temporal abstractions in a purely unsupervised fashion through observing state-action trajectories gathered from executing a policy. However, a current limitation is that they process each trajectory in an entirely sequential manner, which prevents them from revising earlier decisions about sub-routine boundary points in light of new incoming information. In this work we propose SloTTAr, a fully parallel approach that integrates sequence processing Transformers with a Slot Attention module and adaptive computation for learning about the number of such sub-routines in an unsupervised fashion. We demonstrate how SloTTAr is capable of outperforming strong baselines in terms of boundary point discovery, even for sequences containing variable amounts of sub-routines, while being up to 7x faster to train on existing benchmarks.

https://arxiv.org/abs/2203.13573
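
One way to picture "boundary point discovery" with a slot-based model: if every time step of a trajectory carries a weight for each slot, a hard assignment per step gives segments, and boundaries are the steps where the assigned slot changes. The sketch below is only that post-processing step under this assumption; the function name, the weight layout, and the example numbers are hypothetical, and the abstract does not say SloTTAr reads out boundaries this way.

```python
def segment_boundaries(slot_weights):
    """slot_weights[t][k] is the weight of slot k at time step t.

    Returns (assignment, boundaries, num_active_slots):
    - assignment[t]: index of the highest-weight slot at step t
    - boundaries: time steps where the assigned slot differs from the previous step
    - num_active_slots: how many distinct slots were used at all
    """
    assignment = [max(range(len(w)), key=w.__getitem__) for w in slot_weights]
    boundaries = [
        t for t in range(1, len(assignment)) if assignment[t] != assignment[t - 1]
    ]
    return assignment, boundaries, len(set(assignment))

# Hypothetical weights for a 6-step trajectory and 3 available slots.
weights = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.3, 0.7, 0.0],
    [0.1, 0.9, 0.0],
    [0.2, 0.8, 0.0],
    [0.6, 0.4, 0.0],
]
assignment, boundaries, num_active = segment_boundaries(weights)
print(assignment, boundaries, num_active)
```

Because all steps are scored at once, the assignment at an early step can depend on later observations, which is the property the abstract contrasts with strictly sequential processing. The third slot staying unused illustrates how the number of sub-routines can be smaller than the number of slots provided.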

4. [LG] Causal de Finetti: On the Identification of Invariant Causal Structure in Exchangeable Data

S Guo, V Tóth, B Schölkopf, F Huszár

[University of Cambridge & Max Planck Institute for Intelligent Systems]

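Causal de Finetti: identifying invariant causal structure in exchangeable data. Learning invariant causal structure usually relies on conditional independence testing and on the assumption of independent and identically distributed data. Recent work has explored using data from different environments to infer invariant causal structure. These methods build on the independent causal mechanism (ICM) principle, which postulates that the mechanism generating the cause is independent of the mechanism generating the effect given the cause. Although the principle is widely used in machine learning and causal inference, a statistical formalization of what independent mechanisms mean has been missing. This paper presents Causal de Finetti, which offers a first statistical formalization of the ICM principle.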

Learning invariant causal structure often relies on conditional independence testing and the assumption of independent and identically distributed data. Recent work has explored inferring invariant causal structure using data coming from different environments. These approaches are based on the independent causal mechanism (ICM) principle, which postulates that the mechanism generating the cause is independent of the mechanism generating the effect given the cause. Despite its wide application in machine learning and causal inference, a statistical formalization of what independent mechanisms mean is lacking. Here we present Causal de Finetti, which offers a first statistical formalization of the ICM principle.

https://arxiv.org/abs/2203.15756
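
For readers unfamiliar with the name, the classical de Finetti theorem and the ICM factorization can be written side by side. The first display is the standard textbook statement for an infinite exchangeable sequence. The last display is only this note's reading of how the two ideas might combine for a cause-effect pair X → Y; it is not quoted from the abstract, and the symbols θ, ψ, μ, ν are notation assumed here.

```latex
% Classical de Finetti: an infinite exchangeable sequence X_1, X_2, \dots
% is a mixture of i.i.d. sequences, for every n:
P(x_1, \dots, x_n) = \int \prod_{i=1}^{n} p(x_i \mid \theta) \, d\mu(\theta)

% ICM principle for X \to Y: the joint factorizes into two mechanisms,
% P(x) for the cause and P(y \mid x) for the effect given the cause,
% and the two mechanisms are independent of each other.
P(x, y) = P(x) \, P(y \mid x)

% Sketch (assumed form): exchangeable pairs (X_i, Y_i) with separate,
% independent latent parameters \theta (cause mechanism) and \psi (effect mechanism).
P(x_1, y_1, \dots, x_n, y_n)
  = \iint \prod_{i=1}^{n} p(y_i \mid x_i, \psi) \, p(x_i \mid \theta) \, d\mu(\theta) \, d\nu(\psi)
```

In this reading, "independent mechanisms" becomes a statement about θ and ψ being drawn independently, which is what turns an informal principle into something testable on data from multiple environments.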

5. [RO] Demonstration-Bootstrapped Autonomous Practicing via Multi-Task Reinforcement Learning

A Gupta, C Lynch, B Kinman, G Peake, S Levine, K Hausman

[UC Berkeley & Google Inc]

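Demonstration-bootstrapped autonomous practicing via multi-task reinforcement learning. Reinforcement learning systems have the potential to improve continuously in unstructured environments using autonomously collected data. In practice, however, these systems need substantial instrumentation or human intervention to learn in the real world. This paper proposes a reinforcement learning system that uses multi-task reinforcement learning bootstrapped with prior data to enable continuous autonomous practicing, minimizing the number of resets required while still learning temporally extended behaviors. It shows how appropriately provided prior data can bootstrap both low-level multi-task policies and strategies for sequencing those tasks one after another, enabling learning with minimal resets. With this mechanism the robotic system practices with minimal human intervention at training time and solves long-horizon tasks at test time. The system is evaluated on a challenging kitchen manipulation task in simulation and in the real world, demonstrating that it can practice autonomously to solve temporally extended problems.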

Reinforcement learning systems have the potential to enable continuous improvement in unstructured environments, leveraging data collected autonomously. However, in practice these systems require significant amounts of instrumentation or human intervention to learn in the real world. In this work, we propose a system for reinforcement learning that leverages multi-task reinforcement learning bootstrapped with prior data to enable continuous autonomous practicing, minimizing the number of resets needed while being able to learn temporally extended behaviors. We show how appropriately provided prior data can help bootstrap both low-level multi-task policies and strategies for sequencing these tasks one after another to enable learning with minimal resets. This mechanism enables our robotic system to practice with minimal human intervention at training time, while being able to solve long horizon tasks at test time. We show the efficacy of the proposed system on a challenging kitchen manipulation task both in simulation and the real world, demonstrating the ability to practice autonomously in order to solve temporally extended problems.

https://arxiv.org/abs/2203.15755
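
The reset-minimizing intuition can be sketched with a toy task graph: if each task ends in a state from which some other task can start, the robot can keep practicing without a human putting the scene back. Everything in the sketch below is hypothetical (the task names, the single-string state, the "practice the least-practiced task" rule, and the assumption that every attempt succeeds); it illustrates sequencing only and leaves out the learned policies and the prior data described in the abstract.

```python
# Hypothetical kitchen tasks: name -> (state required to start, state reached on success).
TASKS = {
    "open_cabinet": ("all_closed", "cabinet_open"),
    "close_cabinet": ("cabinet_open", "all_closed"),
    "open_microwave": ("all_closed", "microwave_open"),
    "close_microwave": ("microwave_open", "all_closed"),
}

def practice(num_steps, initial_state="all_closed"):
    """Chain tasks so that each one starts where the previous one ended.

    Returns (counts, resets): how often each task was practiced, and how many
    times no task was applicable and a manual reset would have been needed.
    """
    state = initial_state
    counts = {name: 0 for name in TASKS}
    resets = 0
    for _ in range(num_steps):
        candidates = [name for name, (start, _) in TASKS.items() if start == state]
        if not candidates:
            resets += 1            # nothing can start here: fall back to a reset
            state = initial_state
            continue
        # Sequencing rule (assumption): pick the applicable task practiced least so far.
        task = min(candidates, key=counts.__getitem__)
        counts[task] += 1
        state = TASKS[task][1]     # simplification: the attempt always succeeds
    return counts, resets

counts, resets = practice(8)
print(counts, resets)
```

In a real system attempts fail and the resulting state is uncertain, so the sequencing strategy has to be learned or planned rather than hard-coded as it is here.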