LG - Machine Learning · CV - Computer Vision · CL - Computation and Language · AS - Audio and Speech · RO - Robotics

 

1、[LG] Weisfeiler and Leman go Machine Learning: The Story so far

C Morris, Y Lipman, H Maron, B Rieck, N M. Kriege, M Grohe, M Fey, K Borgwardt

[McGill University & Weizmann Institute of Science & NVIDIA Research & University of Vienna...]

A survey of machine learning based on the Weisfeiler-Leman algorithm. In recent years, algorithms and neural architectures based on the Weisfeiler-Leman algorithm, a well-known heuristic for the graph isomorphism problem, have emerged as powerful tools for machine learning on graphs and relational data. This paper gives a comprehensive overview of the algorithm's use in machine learning, focusing on the supervised setting: it covers the theoretical background, shows how the algorithm supports supervised graph and node representation learning, discusses recent extensions, and outlines its connection to (permutation-)equivariant neural architectures. Current applications and future directions are surveyed to stimulate further research.

In recent years, algorithms and neural architectures based on the Weisfeiler-Leman algorithm, a well-known heuristic for the graph isomorphism problem, emerged as a powerful tool for machine learning with graphs and relational data. Here, we give a comprehensive overview of the algorithm's use in a machine learning setting, focusing on the supervised regime. We discuss the theoretical background, show how to use it for supervised graph- and node representation learning, discuss recent extensions, and outline the algorithm's connection to (permutation-)equivariant neural architectures. Moreover, we give an overview of current applications and future directions to stimulate further research.
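The heuristic at the heart of this survey, 1-dimensional Weisfeiler-Leman color refinement, fits in a few lines. Below is an illustrative Python sketch (not code from the paper): nodes start with a uniform color, then each round a node's new color hashes its own color together with the multiset of its neighbors' colors, until the coloring stabilizes.

```python
from collections import Counter

def wl_refinement(adj, num_iters=10):
    """1-dimensional Weisfeiler-Leman color refinement.

    adj: adjacency list, e.g. {0: [1], 1: [0, 2], 2: [1]}.
    Returns the histogram (multiset) of stable node colors.
    """
    # Start with a uniform color for every node.
    colors = {v: 0 for v in adj}
    for _ in range(num_iters):
        # New color = (own color, sorted multiset of neighbor colors).
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # Compress signatures to small integer color ids.
        palette = {s: i for i, s in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: palette[signatures[v]] for v in adj}
        if new_colors == colors:
            break  # stable coloring reached
        colors = new_colors
    return Counter(colors.values())

# Two graphs can be isomorphic only if their color histograms match.
path3 = {0: [1], 1: [0, 2], 2: [1]}        # path on 3 nodes
tri3 = {0: [1, 2], 1: [0, 2], 2: [0, 1]}   # triangle
print(wl_refinement(path3) == wl_refinement(tri3))  # -> False
```

Note the well-known limitation the survey builds on: 1-WL cannot distinguish, for example, a 6-cycle from two disjoint triangles, since both are 2-regular and keep a uniform coloring.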

 

 

2、[LG] RvS: What is Essential for Offline RL via Supervised Learning?

S Emmons, B Eysenbach, I Kostrikov, S Levine

[UC Berkeley & CMU]

RvS: what is essential for offline RL via supervised learning? Recent work has shown that supervised learning alone, without temporal-difference (TD) learning, can be remarkably effective for offline reinforcement learning. When does this hold, and which algorithmic components are necessary? Through extensive experiments, this paper boils supervised learning for offline RL down to its essential elements. In every environment suite considered, simply maximizing likelihood with a two-layer feedforward MLP is competitive with state-of-the-art results from substantially more complex methods based on TD learning or sequence modeling with Transformers. Carefully choosing model capacity (e.g., via regularization or architecture) and choosing which information to condition on (e.g., goals or rewards) are critical for performance. These insights serve as a field guide for practitioners doing Reinforcement Learning via Supervised learning (RvS learning). The paper also probes the limits of existing RvS methods, which are comparatively weak on random data, and poses several open problems.

Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL. When does this hold true, and which algorithmic components are necessary? Through extensive experiments, we boil supervised learning for offline RL down to its essential elements. In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward MLP is competitive with state-of-the-art results of substantially more complex methods based on TD learning or sequence modeling with Transformers. Carefully choosing model capacity (e.g., via regularization or architecture) and choosing which information to condition on (e.g., goals or rewards) are critical for performance. These insights serve as a field guide for practitioners doing Reinforcement Learning via Supervised Learning (which we coin RvS learning). They also probe the limits of existing RvS methods, which are comparatively weak on random data, and suggest a number of open problems.
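The paper's recipe — maximize likelihood with a two-layer feedforward MLP, conditioned on a goal — can be illustrated on synthetic data. The sketch below is a minimal numpy version under assumed toy data (not the paper's benchmarks); with a fixed-variance Gaussian policy, maximizing likelihood reduces to minimizing squared error on the actions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy goal-conditioned dataset (synthetic, for illustration only):
# state s, goal g, and an "expert" action a = g - s pointing at the goal.
S = rng.normal(size=(512, 2))
G = rng.normal(size=(512, 2))
A = G - S
X = np.concatenate([S, G], axis=1)  # condition the policy on (state, goal)

# Two-layer feedforward MLP policy. With a fixed-variance Gaussian
# output head, maximizing likelihood is equivalent to minimizing MSE.
W1 = rng.normal(scale=0.5, size=(4, 32)); b1 = np.zeros(32)
W2 = rng.normal(scale=0.5, size=(32, 2)); b2 = np.zeros(2)

def forward(X):
    H = np.maximum(X @ W1 + b1, 0.0)  # ReLU hidden layer
    return H, H @ W2 + b2

_, pred0 = forward(X)
mse0 = float(((pred0 - A) ** 2).mean())  # loss before training

lr = 0.05
for _ in range(2000):
    H, pred = forward(X)
    err = pred - A                    # gradient of MSE w.r.t. predictions
    gW2 = H.T @ err / len(X); gb2 = err.mean(axis=0)
    dH = (err @ W2.T) * (H > 0)       # backprop through the ReLU
    gW1 = X.T @ dH / len(X); gb1 = dH.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(X)
mse = float(((pred - A) ** 2).mean())  # loss after training
```

The conditioning choice the paper highlights shows up here as what gets concatenated into `X`: goals, reward-to-go, or nothing at all.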

 

 

3、[LG] Autonomous Reinforcement Learning: Formalism and Benchmarking

A Sharma, K Xu, N Sardana, A Gupta, K Hausman, S Levine, C Finn

[Stanford University & UC Berkeley & MIT & Google Brain]

Autonomous reinforcement learning: formalism and benchmarking. Reinforcement learning (RL) provides a natural framework for learning through trial and error, appealing both for its simplicity and effectiveness and for its resemblance to how humans and animals acquire skills through experience. However, real-world embodied learning, such as that performed by humans and animals, takes place in a continual, non-episodic world, whereas common RL benchmark tasks are episodic, with the environment resetting between trials to give the agent multiple attempts. This discrepancy poses a major challenge when taking RL algorithms developed for episodic simulated environments and running them on real-world platforms such as robots. This paper addresses the discrepancy by laying out a framework for Autonomous Reinforcement Learning (ARL): reinforcement learning in which the agent not only learns from its own experience but must also contend with the lack of human supervision to reset between trials. Around this framework the paper introduces EARL, a simulated benchmark containing a diverse set of challenging tasks that reflect the obstacles to learning when only minimal reliance on extrinsic intervention can be assumed. As interventions are minimized, standard episodic RL methods and existing approaches fall short, underscoring the need for new RL algorithms with a greater focus on autonomy.

Reinforcement learning (RL) provides a naturalistic framing for learning through trial and error, which is appealing both because of its simplicity and effectiveness and because of its resemblance to how humans and animals acquire skills through experience. However, real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world, whereas common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts. This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms, such as robots. In this paper, we aim to address this discrepancy by laying out a framework for Autonomous Reinforcement Learning (ARL): reinforcement learning where the agent not only learns through its own experience, but also contends with lack of human supervision to reset between trials. We introduce a simulated benchmark EARL around this framework, containing a set of diverse and challenging simulated tasks reflective of the hurdles introduced to learning when only a minimal reliance on extrinsic intervention can be assumed. We show that standard approaches to episodic RL and existing approaches struggle as interventions are minimized, underscoring the need for developing new algorithms for reinforcement learning with a greater focus on autonomy.
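The episodic/non-episodic distinction the abstract draws can be made concrete with a toy environment that counts extrinsic resets. This is a hypothetical illustration, not one of the actual EARL tasks: the state persists across "trials" unless an explicit, counted intervention resets it.

```python
import random

class NonEpisodicEnv:
    """Toy 1-D chain: the agent must reach position `goal`.

    There are no automatic resets; state persists across trials
    unless an explicit extrinsic intervention resets it."""
    def __init__(self, goal=5):
        self.goal = goal
        self.pos = 0
        self.resets = 0  # number of extrinsic interventions used
    def step(self, action):          # action in {-1, +1}
        self.pos += action
        return self.pos, float(self.pos == self.goal)
    def reset(self):                 # extrinsic intervention -> count it
        self.resets += 1
        self.pos = 0
        return self.pos

random.seed(0)
env = NonEpisodicEnv()
successes = 0
for t in range(1000):
    pos, reward = env.step(random.choice([-1, 1]))
    successes += int(reward)
    # An episodic-style training loop would call env.reset() here after
    # each success or timeout; an autonomous (ARL) agent must instead
    # learn to return to useful states on its own.
print(f"resets used: {env.resets}, goal visits: {successes}")
```

Minimizing `env.resets` while still making learning progress is exactly the axis along which the EARL benchmark stresses algorithms.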

 

 

4、[CV] NICE-SLAM: Neural Implicit Scalable Encoding for SLAM

Z Zhu, S Peng, V Larsson, W Xu, H Bao, Z Cui, M R. Oswald, M Pollefeys

[ETH Zurich & Zhejiang University]

NICE-SLAM: neural implicit scalable encoding for SLAM. Neural implicit representations have recently shown encouraging results in various domains, including promising progress in simultaneous localization and mapping (SLAM). However, existing methods produce over-smoothed scene reconstructions and have difficulty scaling to large scenes, mainly because of their simple fully-connected network architectures, which do not incorporate local information from the observations. This paper presents NICE-SLAM, a dense SLAM system that incorporates multi-level local information by introducing a hierarchical scene representation. Optimizing this representation with pre-trained geometric priors enables detailed reconstruction of large indoor scenes. Compared with recent neural implicit SLAM systems, the proposed approach is more scalable, efficient, and robust. Experiments on five challenging datasets show that NICE-SLAM is competitive in both mapping and tracking quality.

Neural implicit representations have recently shown encouraging results in various domains, including promising progress in simultaneous localization and mapping (SLAM). Nevertheless, existing methods produce oversmoothed scene reconstructions and have difficulty scaling up to large scenes. These limitations are mainly due to their simple fully-connected network architecture that does not incorporate local information in the observations. In this paper, we present NICE-SLAM, a dense SLAM system that incorporates multi-level local information by introducing a hierarchical scene representation. Optimizing this representation with pre-trained geometric priors enables detailed reconstruction on large indoor scenes. Compared to recent neural implicit SLAM systems, our approach is more scalable, efficient, and robust. Experiments on five challenging datasets demonstrate competitive results of NICE-SLAM in both mapping and tracking quality. Project page: https://pengsongyou.github.io/nice-slam.
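The "hierarchical scene representation" can be caricatured in 2-D: a query point's feature is interpolated from grids at several resolutions (coarse grids capture layout, fine grids add detail) and concatenated before decoding. Below is a toy numpy sketch of that lookup; NICE-SLAM itself uses 3-D grids with learned decoders, so every shape here is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def interp2d(grid, x, y):
    """Bilinearly interpolate an (H, W, C) feature grid at
    normalized coordinates x, y in [0, 1]."""
    H, W, _ = grid.shape
    fx, fy = x * (W - 1), y * (H - 1)
    x0, y0 = int(fx), int(fy)
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = fx - x0, fy - y0
    top = (1 - wx) * grid[y0, x0] + wx * grid[y0, x1]
    bot = (1 - wx) * grid[y1, x0] + wx * grid[y1, x1]
    return (1 - wy) * top + wy * bot

# Hierarchical representation (2-D toy analogue): one coarse and one
# fine feature grid; a query point concatenates features from both
# levels before they are fed to a decoder network.
coarse = rng.normal(size=(4, 4, 8))    # low resolution, scene layout
fine = rng.normal(size=(32, 32, 8))    # high resolution, local detail
feat = np.concatenate([interp2d(coarse, 0.3, 0.7),
                       interp2d(fine, 0.3, 0.7)])
```

Because each lookup only touches the four grid cells around the query, local detail can be optimized without disturbing the rest of the scene, which is what lets such representations scale past a single fully-connected MLP.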

 

5、[CV] Cost Aggregation Is All You Need for Few-Shot Segmentation

S Hong, S Cho, J Nam, S Kim

[Korea University & Yonsei University]

Cost aggregation is all you need for few-shot segmentation. This paper proposes a novel cost aggregation network, Volumetric Aggregation with Transformers (VAT), which tackles the few-shot segmentation task using both convolutions and Transformers to efficiently handle the high-dimensional correlation maps between query and support. The encoder consists of a volume embedding module, which both compresses the correlation maps to a more tractable size and injects convolutional inductive bias, and a volumetric Transformer module for cost aggregation. The encoder has a pyramidal structure that lets coarser-level aggregation guide the finer levels and enforces the learning of complementary matching scores. The output is fed, together with projected feature maps that guide the segmentation process, into an affinity-aware decoder. Combining these components, experiments demonstrate the effectiveness of the method, which sets a new state of the art on all standard few-shot segmentation benchmarks. The method even achieves state-of-the-art performance on standard semantic correspondence benchmarks, although it was not specifically designed for that task.

We introduce a novel cost aggregation network, dubbed Volumetric Aggregation with Transformers (VAT), to tackle the few-shot segmentation task by using both convolutions and transformers to efficiently handle high-dimensional correlation maps between query and support. Specifically, we propose an encoder consisting of a volume embedding module, which not only transforms the correlation maps into a more tractable size but also injects some convolutional inductive bias, and a volumetric transformer module for the cost aggregation. Our encoder has a pyramidal structure that lets the coarser-level aggregation guide the finer level and enforces the learning of complementary matching scores. We then feed the output into our affinity-aware decoder along with the projected feature maps for guiding the segmentation process. Combining these components, we conduct experiments to demonstrate the effectiveness of the proposed method, and our method sets a new state-of-the-art for all the standard benchmarks in the few-shot segmentation task. Furthermore, we find that the proposed method attains state-of-the-art performance even for the standard benchmarks in the semantic correspondence task, although not specifically designed for this task. We also provide an extensive ablation study to validate our architectural choices. The trained weights and code are available at: https://seokju-cho.github.io/VAT/.
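The "high-dimensional correlation maps" VAT aggregates are 4-D cost volumes: a cosine similarity between every query location and every support location. An illustrative numpy sketch (not the paper's code) of building one:

```python
import numpy as np

rng = np.random.default_rng(0)

def correlation_volume(fq, fs):
    """4-D correlation (cost) volume between query and support features.

    fq, fs: (H, W, C) feature maps. Features are L2-normalized along C,
    so each entry is a cosine similarity in [-1, 1].
    Returns an (Hq, Wq, Hs, Ws) volume."""
    fq = fq / np.linalg.norm(fq, axis=-1, keepdims=True)
    fs = fs / np.linalg.norm(fs, axis=-1, keepdims=True)
    return np.einsum('ijc,klc->ijkl', fq, fs)

# Example with random 8x8 feature maps of 16 channels each.
corr = correlation_volume(rng.normal(size=(8, 8, 16)),
                          rng.normal(size=(8, 8, 16)))
```

Even at modest resolution this volume has H²·W² entries, which is why the abstract stresses compressing it (the volume embedding module) before Transformer-based aggregation.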

 

 

Other noteworthy papers:

 

[CV] Learning and Crafting for the Wide Multiple Baseline Stereo


D Mishkin

[Czech Technical University]

 

[CV] 3D-aware Image Synthesis via Learning Structural and Textural Representations


Y Xu, S Peng, C Yang, Y Shen, B Zhou

[The Chinese University of Hong Kong & Zhejiang University & Bytedance Inc]

 

[CV] Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects


A Noguchi, U Iqbal, J Tremblay, T Harada, O Gallo

[NVIDIA & The University of Tokyo]

 

[LG] Rethinking Importance Weighting for Transfer Learning


N Lu, T Zhang, T Fang, T Teshima, M Sugiyama

[The University of Tokyo]

 
