LG - Machine Learning   CV - Computer Vision   CL - Computation and Language   AS - Audio and Speech   RO - Robotics




1、[LG] In-context Reinforcement Learning with Algorithm Distillation

M Laskin, L Wang, J Oh, E Parisotto, S Spencer, R Steigerwald, D Strouse, S Hansen, A Filos, E Brooks, M Gazeau, H Sahni, S Singh, V Mnih

We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.
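A minimal sketch of how the across-episode prediction problem could be set up, assuming a learning history is stored as one flat list of steps in the order the source RL algorithm produced them, so that the context can span episode boundaries. The names `Step`, `make_ad_examples`, and `context_len` are hypothetical and not taken from the paper; the causal transformer itself is omitted.

```python
from typing import List, NamedTuple, Tuple


class Step(NamedTuple):
    """One environment step from the source algorithm's training run (hypothetical layout)."""
    obs: int
    action: int
    reward: float


def make_ad_examples(
    history: List[Step], context_len: int
) -> List[Tuple[List[Step], int, int]]:
    """Turn one learning history into (context, current_obs, target_action) examples.

    The context holds up to `context_len` steps that precede the current one.
    Because the history is not cut at episode boundaries, the context can
    contain earlier, worse episodes as well as later, better ones.
    """
    examples = []
    for t, step in enumerate(history):
        start = max(0, t - context_len)
        context = history[start:t]  # strictly earlier steps only (causal)
        examples.append((context, step.obs, step.action))
    return examples
```

A sequence model trained on such examples would predict `target_action` from `context` and `current_obs`. The context length would need to be long enough to cover several episodes for any improvement of the source policy to be visible in it.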



2、[AS] High Fidelity Neural Audio Compression

A Défossez, J Copet, G Synnaeve, Y Adi
[Meta AI]

We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks. It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion. We simplify and speed up training by using a single multiscale spectrogram adversary that efficiently reduces artifacts and produces high-quality samples. We introduce a novel loss balancer mechanism to stabilize training: the weight of a loss now defines the fraction of the overall gradient it should represent, thus decoupling the choice of this hyper-parameter from the typical scale of the loss. Finally, we study how lightweight Transformer models can be used to further compress the obtained representation by up to 40%, while staying faster than real time. We provide a detailed description of the key design choices of the proposed model, including the training objective, architectural changes, and a study of various perceptual loss functions. We present an extensive subjective evaluation (MUSHRA tests) together with an ablation study for a range of bandwidths and audio domains, including speech, noisy-reverberant speech, and music. Our approach is superior to the baseline methods across all evaluated settings, considering both 24 kHz monophonic and 48 kHz stereophonic audio. Code and models are available at this http URL.
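The loss balancer idea can be illustrated with a small sketch: each loss's gradient is normalized and then scaled so that its share of the combined gradient equals its weight divided by the sum of all weights. This is a simplified illustration under stated assumptions, not the authors' implementation: it uses the instantaneous gradient norm where a real system might use a running average, and the names `balance_gradients` and `total_norm` are hypothetical.

```python
import numpy as np


def balance_gradients(grads, weights, total_norm=1.0, eps=1e-12):
    """Combine per-loss gradients so each contributes a fixed fraction.

    grads:   dict mapping loss name -> gradient of that loss w.r.t. the
             model output (np.ndarray, all of the same shape)
    weights: dict mapping loss name -> non-negative weight
    Returns the sum of rescaled gradients. Each rescaled gradient has norm
    total_norm * weight / sum(weights), whatever the raw scale of its loss.
    """
    weight_sum = sum(weights.values())
    combined = None
    for name, g in grads.items():
        fraction = weights[name] / weight_sum
        # Normalize away the loss's own scale, then apply its target share.
        scaled = total_norm * fraction * g / (np.linalg.norm(g) + eps)
        combined = scaled if combined is None else combined + scaled
    return combined
```

For example, with one gradient of norm 1000 and another of norm 0.001 and weights 1 and 3, the rescaled gradients have norms 0.25 and 0.75, so the choice of weights no longer depends on the raw magnitude of each loss.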



3、[RO] PlanT: Explainable Planning Transformers via Object-Level Representations

K Renz, K Chitta, O Mercea, A. S Koepke, Z Akata, A Geiger
[University of Tübingen]

Planning an optimal route in a complex environment requires efficient reasoning about the surrounding scene. While human drivers prioritize important objects and ignore details not relevant to the decision, learning-based planners typically extract features from dense, high-dimensional grid representations containing all vehicle and road context information. In this paper, we propose PlanT, a novel approach for planning in the context of self-driving that uses a standard transformer architecture. PlanT is based on imitation learning with a compact object-level input representation. On the Longest6 benchmark for CARLA, PlanT outperforms all prior methods (matching the driving score of the expert) while being 5.3x faster than equivalent pixel-based planning baselines during inference. Combining PlanT with an off-the-shelf perception module provides a sensor-based driving system that is more than 10 points better in terms of driving score than the existing state of the art. Furthermore, we propose an evaluation protocol to quantify the ability of planners to identify relevant objects, providing insights regarding their decision-making. Our results indicate that PlanT can focus on the most relevant object in the scene, even when this object is geometrically distant.



4、[RO] DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality

A Handa, A Allshire, V Makoviychuk, A Petrenko, R Singh...
[NVIDIA & University of Toronto & University of Southern California]
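DeXtreme: sim-to-real transfer of agile in-hand manipulation. Recent work has shown that deep reinforcement learning (RL) algorithms can learn complex robotic behaviours in simulation, including multi-fingered manipulation. However, such models are difficult to transfer to the real world because of the gap between simulation and reality. This paper presents techniques for training (a) a policy that performs robust dexterous manipulation on an anthropomorphic robot hand and (b) a robust pose estimator suited to providing reliable real-time information about the state of the manipulated object. The policy is trained to adapt to a wide range of conditions in simulation. The vision-based policies significantly outperform the best vision policies in the literature on the same reorientation task and are competitive with policies that receive privileged state information from motion capture systems. The work reaffirms the feasibility of sim-to-real dexterous manipulation across different hardware and simulator setups, in this case with the Allegro Hand and Isaac Gym GPU-based simulation. It also opens up the possibility for researchers to achieve such results with commonly available, affordable robot hands and cameras.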

Recent work has demonstrated the ability of deep reinforcement learning (RL) algorithms to learn complex robotic behaviours in simulation, including in the domain of multi-fingered manipulation. However, such models can be challenging to transfer to the real world due to the gap between simulation and reality. In this paper, we present our techniques to train a) a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand and b) a robust pose estimator suitable for providing reliable real-time information on the state of the object being manipulated. Our policies are trained to adapt to a wide range of conditions in simulation. Consequently, our vision-based policies significantly outperform the best vision policies in the literature on the same reorientation task and are competitive with policies that are given privileged state information via motion capture systems. Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation in diverse kinds of hardware and simulator setups, and in our case, with the Allegro Hand and Isaac Gym GPU-based simulation. Furthermore, it opens up possibilities for researchers to achieve such results with commonly-available, affordable robot hands and cameras. Videos of the resulting policy and supplementary information, including experiments and demos, can be found at this https URL


5、[CV] On the Versatile Uses of Partial Distance Correlation in Deep Learning

X Zhen, Z Meng, R Chakraborty, V Singh
[University of Wisconsin-Madison & Butlr]

Comparing the functional behavior of neural network models, whether it is a single network over time or two (or more) networks during or post-training, is an essential step in understanding what they are learning (and what they are not), and for identifying strategies for regularization or efficiency improvements. Despite recent progress, e.g., comparing vision transformers to CNNs, systematic comparison of function, especially across different networks, remains difficult and is often carried out layer by layer. Approaches such as canonical correlation analysis (CCA) are applicable in principle, but have been sparingly used so far. In this paper, we revisit a (less widely known) measure from statistics, called distance correlation (and its partial variant), designed to evaluate correlation between feature spaces of different dimensions. We describe the steps necessary to carry out its deployment for large-scale models. This opens the door to a surprising array of applications, ranging from conditioning one deep model w.r.t. another and learning disentangled representations to optimizing diverse models that would directly be more robust to adversarial attacks. Our experiments suggest a versatile regularizer (or constraint) with many advantages, which avoids some of the common difficulties one faces in such analyses. Code is at this https URL.
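To make the central quantity concrete, here is a minimal NumPy sketch of the standard (biased) sample distance correlation between two feature matrices with the same number of rows but possibly different feature dimensions. It is the textbook estimator, not the paper's large-scale procedure, and it does not implement the partial variant. The function names are hypothetical.

```python
import numpy as np


def _centered_distances(x):
    """Pairwise Euclidean distances between rows of x, double-centered."""
    diff = x[:, None, :] - x[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    return (
        d
        - d.mean(axis=0, keepdims=True)
        - d.mean(axis=1, keepdims=True)
        + d.mean()
    )


def distance_correlation(x, y):
    """Biased sample distance correlation between x (n, p) and y (n, q)."""
    a = _centered_distances(np.asarray(x, dtype=float))
    b = _centered_distances(np.asarray(y, dtype=float))
    dcov2 = (a * b).mean()   # squared sample distance covariance
    dvar_x = (a * a).mean()  # squared sample distance variance of x
    dvar_y = (b * b).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0           # a constant input has no dependence to measure
    return float(np.sqrt(max(dcov2, 0.0) / denom))
```

Because only pairwise distances within each feature space enter the computation, `x` and `y` may have different widths, for example activations from two networks evaluated on the same batch. The pairwise difference tensor needs memory proportional to n² times the feature dimension, so this form is suitable only for small batches.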





[LG] Learning General World Models in a Handful of Reward-Free Deployments

Y Xu, J Parker-Holder, A Pacchiano, P J. Ball, O Rybkin...
[UCL & University of Oxford & Microsoft Research & UPenn]

[CL] Finding Dataset Shortcuts with Grammar Induction

D Friedman, A Wettig, D Chen
[Princeton University]

[CL] Subspace-based Set Operations on a Pre-trained Word Embedding Space

Y Ishibashi, S Yokoi, K Sudoh, S Nakamura
[NAIST & Tohoku University]

[CV] NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields

A Rosinol, J J. Leonard, L Carlone



