LG - Machine Learning   CV - Computer Vision   CL - Computation and Language   AS - Audio and Speech   RO - Robotics

Reposted from 爱可可爱生活

Summary: implicit bias in overparameterized bilevel optimization; vision-based manipulators need to also see from their hands; learning to downsample for segmentation of ultra-high-resolution images; condensing CNNs with partial differential equations; how to leverage unlabeled data in offline reinforcement learning; comparison of object detection algorithms for street-level objects; differentiable contact-rich grasp synthesis for multi-fingered hands; a survey of shortcut learning of large language models in natural language understanding; preventing data leakage in split learning for collaborative multi-modal brain tumor segmentation

 

1. [LG] On Implicit Bias in Overparameterized Bilevel Optimization

P Vicol, J Lorraine, F Pedregosa, D Duvenaud…

[University of Toronto & Google Research]

Many problems in machine learning involve bilevel optimization (BLO), including hyperparameter optimization, meta-learning, and dataset distillation. Bilevel problems involve inner and outer parameters, each optimized for its own objective. Often, at least one of the two levels is underspecified, leaving multiple ways to choose among equivalent optima. Inspired by recent studies of the implicit bias induced by optimization algorithms in single-level optimization, this paper investigates the implicit bias of different gradient-based algorithms for jointly optimizing the inner and outer parameters. It delineates two standard BLO methods, cold-start and warm-start BLO, and shows that the converged solution or long-run behavior depends to a large degree on these and other algorithmic choices, such as the hypergradient approximation. Solutions from warm-start BLO can encode a surprising amount of information about the outer objective, even when the outer optimization variables are low-dimensional. Implicit bias should play as central a role in the study of bilevel optimization as it has attained in the study of single-level neural network optimization.

Many problems in machine learning involve bilevel optimization (BLO), including hyperparameter optimization, meta-learning, and dataset distillation. Bilevel problems involve inner and outer parameters, each optimized for its own objective. Often, at least one of the two levels is underspecified and there are multiple ways to choose among equivalent optima. Inspired by recent studies of the implicit bias induced by optimization algorithms in single-level optimization, we investigate the implicit bias of different gradient-based algorithms for jointly optimizing the inner and outer parameters. We delineate two standard BLO methods—cold-start and warm-start BLO—and show that the converged solution or long-run behavior depends to a large degree on these and other algorithmic choices, such as the hypergradient approximation. We also show that the solutions from warm-start BLO can encode a surprising amount of information about the outer objective, even when the outer optimization variables are low-dimensional. We believe that implicit bias deserves as central a role in the study of bilevel optimization as it has attained in the study of single-level neural net optimization.

https://proceedings.mlr.press/v162/vicol22a.html
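To make the cold-start vs. warm-start distinction concrete, here is a minimal sketch on a toy bilevel problem (the outer variable is a log weight decay), with the hypergradient approximated by differentiating through K unrolled inner SGD steps. The toy data, function names, and hyperparameters are all illustrative assumptions, not the paper's code.

```python
# Cold-start vs. warm-start bilevel optimization on a toy problem (illustrative sketch).
import torch

def inner_loss(w, lam):
    # Inner objective: least squares plus an outer-controlled weight decay exp(lam).
    X = torch.tensor([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    y = torch.tensor([1.0, 2.0, 3.0])
    return ((X @ w - y) ** 2).mean() + torch.exp(lam) * (w ** 2).sum()

def outer_loss(w):
    # Outer (validation) objective, a function of the inner solution only.
    Xv = torch.tensor([[2.0, 1.0], [4.0, 3.0]])
    yv = torch.tensor([1.5, 2.5])
    return ((Xv @ w - yv) ** 2).mean()

def unrolled_inner(w0, lam, K=5, lr=0.01):
    # K differentiable inner SGD steps; the truncated hypergradient flows back to lam.
    w = w0
    for _ in range(K):
        g, = torch.autograd.grad(inner_loss(w, lam), w, create_graph=True)
        w = w - lr * g
    return w

def run(warm_start, outer_steps=50):
    lam = torch.zeros(1, requires_grad=True)
    opt = torch.optim.SGD([lam], lr=0.1)
    w = torch.zeros(2)
    for _ in range(outer_steps):
        # Cold start re-solves the inner problem from scratch at every outer step;
        # warm start continues from the previous inner solution.
        w0 = (w.detach() if warm_start else torch.zeros(2)).requires_grad_()
        w = unrolled_inner(w0, lam)
        opt.zero_grad()
        outer_loss(w).backward()
        opt.step()
    return lam.item()

print("cold-start log weight decay:", run(warm_start=False))
print("warm-start log weight decay:", run(warm_start=True))
```

Even on a toy problem the two variants generally settle on different outer values, which is the kind of algorithm-induced implicit bias the paper studies.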

 

2. [RO] Vision-Based Manipulators Need to Also See from Their Hands

K Hsu, M J Kim, R Rafailov, J Wu, C Finn

[Stanford University]

This paper studies how the choice of visual perspective affects learning and generalization in physical manipulation from raw sensor observations. Compared with the more commonly used global third-person perspective, a hand-centric (eye-in-hand) perspective affords reduced observability, yet it consistently improves training efficiency and out-of-distribution generalization. These benefits hold across a variety of learning algorithms, experimental settings, and distribution shifts, and for both simulated and real robot apparatuses. However, this holds only when hand-centric observability is sufficient; otherwise, including a third-person perspective is necessary for learning, but it also harms out-of-distribution generalization. To mitigate this, the paper proposes regularizing the third-person information stream via a variational information bottleneck. On six representative manipulation tasks with varying hand-centric observability, adapted from the Meta-World benchmark, this yields a state-of-the-art reinforcement learning agent that operates from both perspectives and improves its out-of-distribution generalization on every task. While some practitioners have long put cameras in the hands of robots, this work systematically analyzes the benefits of doing so and provides simple, broadly applicable insights for improving end-to-end learned vision-based robotic manipulation.

We study how the choice of visual perspective affects learning and generalization in the context of physical manipulation from raw sensor observations. Compared with the more commonly used global third-person perspective, a hand-centric (eye-in-hand) perspective affords reduced observability, but we find that it consistently improves training efficiency and out-of-distribution generalization. These benefits hold across a variety of learning algorithms, experimental settings, and distribution shifts, and for both simulated and real robot apparatuses. However, this is only the case when hand-centric observability is sufficient; otherwise, including a third-person perspective is necessary for learning, but also harms out-of-distribution generalization. To mitigate this, we propose to regularize the third-person information stream via a variational information bottleneck. On six representative manipulation tasks with varying hand-centric observability adapted from the Meta-World benchmark, this results in a state-of-the-art reinforcement learning agent operating from both perspectives improving its out-of-distribution generalization on every task. While some practitioners have long put cameras in the hands of robots, our work systematically analyzes the benefits of doing so and provides simple and broadly applicable insights for improving end-to-end learned vision-based robotic manipulation.

https://arxiv.org/abs/2203.12677
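The variational information bottleneck over the third-person stream can be sketched in a few lines. This is a minimal illustration under my own assumptions (tiny CNNs, the TwoViewEncoder name, the beta weight), not the authors' architecture: only the third-person features pass through a stochastic bottleneck, whose KL term limits how much information from that view the policy can consume.

```python
# Sketch: variational information bottleneck (VIB) on the third-person view only.
import torch
import torch.nn as nn

class TwoViewEncoder(nn.Module):
    def __init__(self, feat_dim=64, z_dim=32):
        super().__init__()
        def make_cnn():
            return nn.Sequential(
                nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                nn.Flatten(), nn.LazyLinear(feat_dim))
        self.hand = make_cnn()    # eye-in-hand stream: deterministic features
        self.third = make_cnn()   # third-person stream: bottlenecked below
        self.mu = nn.Linear(feat_dim, z_dim)
        self.log_var = nn.Linear(feat_dim, z_dim)

    def forward(self, hand_img, third_img):
        h = self.hand(hand_img)
        t = self.third(third_img)
        mu, log_var = self.mu(t), self.log_var(t)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterization
        # KL(q(z|x) || N(0, I)): penalizing it limits third-person information flow.
        kl = 0.5 * (mu ** 2 + log_var.exp() - 1.0 - log_var).sum(-1).mean()
        return torch.cat([h, z], dim=-1), kl

enc = TwoViewEncoder()
features, kl = enc(torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64))
beta = 1e-3          # bottleneck strength (illustrative value)
aux_loss = beta * kl  # added to the usual RL losses on a policy head over `features`
```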

 

3. [CV] Learning to Downsample for Segmentation of Ultra-High Resolution Images

C Jin, R Tanno, T Mertzanidou, E Panagiotaki, D C Alexander

[University College London & Microsoft Research]

Many computer vision systems require low-cost deep-learning-based segmentation algorithms, either because input images are enormous or because the computational budget is limited. A common solution is to uniformly downsample the input images to meet memory constraints, assuming all pixels are equally informative. This paper demonstrates that this assumption can harm segmentation performance, because segmentation difficulty varies spatially. It addresses the problem by introducing a learnable downsampling module that can be optimized end-to-end together with a given segmentation model. Training such a module is formulated as optimizing the sampling density distribution over the input image given its low-resolution view. To prevent degenerate solutions (e.g. over-sampling trivial regions such as the background), a regularization term encourages the sampling locations to concentrate around object boundaries. The downsampling module learns to sample more densely at difficult locations, thereby improving segmentation performance. Experiments on benchmarks of high-resolution street-view, aerial, and medical images show substantial improvements in the efficiency-accuracy trade-off compared with uniform downsampling and two recent advanced downsampling techniques.

Many computer vision systems require low-cost segmentation algorithms based on deep learning, either because of the enormous size of input images or limited computational budget. Common solutions uniformly downsample the input images to meet memory constraints, assuming all pixels are equally informative. In this work, we demonstrate that this assumption can harm the segmentation performance because the segmentation difficulty varies spatially. We combat this problem by introducing a learnable downsampling module, which can be optimised together with the given segmentation model in an end-to-end fashion. We formulate the problem of training such downsampling module as optimisation of sampling density distributions over the input images given their low-resolution views. To defend against degenerate solutions (e.g. over-sampling trivial regions like the backgrounds), we propose a regularisation term that encourages the sampling locations to concentrate around the object boundaries. We find the downsampling module learns to sample more densely at difficult locations, thereby improving the segmentation performance. Our experiments on benchmarks of high-resolution street view, aerial and medical images demonstrate substantial improvements in terms of efficiency-and-accuracy trade-off compared to both uniform downsampling and two recent advanced downsampling techniques.

https://arxiv.org/abs/2109.11071
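One way to picture the module: a small network predicts a saliency map from the uniform low-resolution view, and sample coordinates are spaced inversely to saliency, so difficult regions receive more pixels. The sketch below makes two simplifying assumptions of mine rather than following the paper exactly: the sampling is separable per axis, and the boundary regularizer is reduced to matching saliency against a given edge map.

```python
# Sketch: learnable non-uniform downsampling driven by a predicted saliency map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableDownsampler(nn.Module):
    def __init__(self, out_size=64):
        super().__init__()
        self.out = out_size
        self.net = nn.Sequential(             # tiny saliency net on the low-res view
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 3, padding=1))

    def forward(self, img):
        lowres = F.interpolate(img, size=(self.out, self.out),
                               mode='bilinear', align_corners=False)
        sal = torch.sigmoid(self.net(lowres)).squeeze(1)   # (B, H, W), in (0, 1)
        # Separable sampling: step sizes shrink where saliency is high, so more
        # of the output pixels land on difficult regions.
        step_y = 1.0 / (sal.mean(-1) + 0.1)                # per-row step size
        step_x = 1.0 / (sal.mean(-2) + 0.1)                # per-column step size
        ys = torch.cumsum(step_y / step_y.sum(-1, keepdim=True), -1) * 2 - 1
        xs = torch.cumsum(step_x / step_x.sum(-1, keepdim=True), -1) * 2 - 1
        gx, gy = torch.broadcast_tensors(xs[:, None, :], ys[:, :, None])
        grid = torch.stack([gx, gy], dim=-1)               # (B, H, W, 2) in [-1, 1]
        return F.grid_sample(img, grid, align_corners=False), sal

ds = LearnableDownsampler()
small, sal = ds(torch.randn(2, 3, 512, 512))   # stand-in for an ultra-high-res image
edges = torch.rand(2, 64, 64)                  # stand-in for an object-boundary map
boundary_reg = F.mse_loss(sal, edges)          # crude stand-in for the paper's regularizer
```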

 

4. [CV] Condensing CNNs With Partial Differential Equations

A Kag, V Saligrama

[Boston University]

Convolutional neural networks (CNNs) rely on architectural depth to obtain complex features, which results in computationally expensive models for low-resource IoT devices. Convolutional operators are local, with a restricted receptive field that grows only with depth. This paper explores partial differential equations (PDEs), which offer a global receptive field without the added overhead of maintaining large-kernel convolutional filters. It proposes a new feature layer, the Global layer, that enforces PDE constraints on the feature maps, producing rich features. The constraints are solved by embedding iterative schemes in the network. The proposed layer can be embedded in any deep CNN to transform it into a shallower network, yielding compact, computationally efficient architectures that achieve performance similar to the original network. Experimental evaluation shows that architectures with Global layers require 2–5× less computation and storage without any significant loss in performance.

Convolutional neural networks (CNNs) rely on the depth of the architecture to obtain complex features. It results in computationally expensive models for low-resource IoT devices. Convolutional operators are local and restricted in the receptive field, which increases with depth. We explore partial differential equations (PDEs) that offer a global receptive field without the added overhead of maintaining large kernel convolutional filters. We propose a new feature layer, called the Global layer, that enforces PDE constraints on the feature maps, resulting in rich features. These constraints are solved by embedding iterative schemes in the network. The proposed layer can be embedded in any deep CNN to transform it into a shallower network. Thus, resulting in compact and computationally efficient architectures achieving similar performance as the original network. Our experimental evaluation demonstrates that architectures with global layers require 2–5× less computational and storage budget without any significant loss in performance.

https://openaccess.thecvf.com/content/CVPR2022/html/Kag_Condensing_CNNs_With_Partial_Differential_Equations_CVPR_2022_paper.html
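A rough sketch of the idea, under simplifying assumptions of my own (an explicit-Euler diffusion scheme with learned per-channel diffusivity and forcing; the paper's actual PDE and discretization may differ): a few iterations inside one layer propagate information across the entire feature map, so a Global layer can stand in for a stack of local convolutions.

```python
# Sketch: a "Global layer" that evolves feature maps with a diffusion-type PDE.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLayer(nn.Module):
    def __init__(self, channels, n_iters=10, dt=0.1):
        super().__init__()
        self.n_iters, self.dt = n_iters, dt
        # Learned per-channel diffusivity and forcing, predicted from the input.
        self.diffusivity = nn.Conv2d(channels, channels, 1)
        self.forcing = nn.Conv2d(channels, channels, 1)
        lap = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
        self.register_buffer('lap', lap.expand(channels, 1, 3, 3).clone())

    def forward(self, x):
        a = torch.sigmoid(self.diffusivity(x))  # bounded in (0, 1) for stability
        f = self.forcing(x)
        u = x
        for _ in range(self.n_iters):           # explicit-Euler iterations of the PDE
            lap_u = F.conv2d(F.pad(u, (1, 1, 1, 1), mode='replicate'),
                             self.lap, groups=u.shape[1])
            u = u + self.dt * (a * lap_u + f)
        return u                                # same shape, far wider receptive field

layer = GlobalLayer(32)
out = layer(torch.randn(2, 32, 16, 16))
```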

 

5. [LG] How to Leverage Unlabeled Data in Offline Reinforcement Learning

T Yu, A Kumar, Y Chebotar, K Hausman, C Finn, S Levine

[Stanford University & Google Research & UC Berkeley]

Offline reinforcement learning (RL) can learn control policies from static datasets but, like standard RL methods, it requires reward annotations for every transition. In many cases, labeling large datasets with rewards may be costly, especially if the rewards must be provided by human annotators, while collecting diverse unlabeled data can be comparatively inexpensive. How can such unlabeled data best be leveraged in offline RL? A natural solution is to learn a reward function from the labeled data and use it to label the unlabeled data. This paper finds that, perhaps surprisingly, a much simpler approach that simply assigns zero reward to the unlabeled data leads to effective data sharing both in theory and in practice, without learning any reward model at all. While this approach may seem strange (and incorrect) at first, extensive theoretical and empirical analysis shows how it trades off reward bias, sample complexity, and distributional shift, often leading to good results. The paper characterizes the conditions under which this simple strategy is effective, and further shows that extending it with a simple reweighting scheme can further alleviate the bias introduced by incorrect reward labels. Empirical evaluation confirms these findings in simulated robotic locomotion, navigation, and manipulation settings.

Offline reinforcement learning (RL) can learn control policies from static datasets but, like standard RL methods, it requires reward annotations for every transition. In many cases, labeling large datasets with rewards may be costly, especially if those rewards must be provided by human labelers, while collecting diverse unlabeled data might be comparatively inexpensive. How can we best leverage such unlabeled data in offline RL? One natural solution is to learn a reward function from the labeled data and use it to label the unlabeled data. In this paper, we find that, perhaps surprisingly, a much simpler method that simply applies zero rewards to unlabeled data leads to effective data sharing both in theory and in practice, without learning any reward model at all. While this approach might seem strange (and incorrect) at first, we provide extensive theoretical and empirical analysis that illustrates how it trades off reward bias, sample complexity and distributional shift, often leading to good results. We characterize conditions under which this simple strategy is effective, and further show that extending it with a simple reweighting approach can further alleviate the bias introduced by using incorrect reward labels. Our empirical evaluation confirms these findings in simulated robotic locomotion, navigation, and manipulation settings.

https://arxiv.org/abs/2202.01741
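The core recipe fits in a few lines. A minimal sketch assuming flat NumPy arrays of transitions; merge_datasets and the field names are illustrative, and the unlabeled_weight argument stands in for the reweighting extension discussed above.

```python
# Sketch: sharing unlabeled transitions in offline RL by assigning them zero reward.
import numpy as np

def merge_datasets(labeled, unlabeled, unlabeled_weight=1.0):
    """Each dataset maps 's', 'a', 's2' (and 'r' for labeled data) to arrays."""
    n_l, n_u = len(labeled['r']), len(unlabeled['s'])
    return {
        's':  np.concatenate([labeled['s'],  unlabeled['s']]),
        'a':  np.concatenate([labeled['a'],  unlabeled['a']]),
        's2': np.concatenate([labeled['s2'], unlabeled['s2']]),
        # Key idea: unlabeled transitions simply get reward 0 -- no reward model.
        'r':  np.concatenate([labeled['r'],  np.zeros(n_u)]),
        # Per-sample weights allow a reweighting scheme to soften the reward bias.
        'w':  np.concatenate([np.ones(n_l), np.full(n_u, unlabeled_weight)]),
    }

labeled = {k: np.random.randn(100, 4) for k in ('s', 's2')}
labeled.update(a=np.random.randn(100, 2), r=np.random.randn(100))
unlabeled = {k: np.random.randn(500, 4) for k in ('s', 's2')}
unlabeled.update(a=np.random.randn(500, 2))
buffer = merge_datasets(labeled, unlabeled, unlabeled_weight=0.5)
```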

 

A few more papers worth noting:

 

[CV] Comparison of Object Detection Algorithms for Street-level Objects

M G Naftali, J S Sulistyawan, K Julian

[Bina Nusantara University]

https://arxiv.org/abs/2208.11315

 

[RO] Grasp'D: Differentiable Contact-rich Grasp Synthesis for Multi-fingered Hands

D Turpin, L Wang, E Heiden, Y Chen, M Macklin, S Tsogkas, S Dickinson, A Garg

[University of Toronto & Nvidia & Samsung]

https://arxiv.org/abs/2208.12250

 

[CL] Shortcut Learning of Large Language Models in Natural Language Understanding: A Survey

M Du, F He, N Zou, D Tao, X Hu

[Texas A&M University & JD Explore Academy & Rice University]

https://arxiv.org/abs/2208.11857

 

[CV] Split-U-Net: Preventing Data Leakage in Split Learning for Collaborative Multi-Modal Brain Tumor Segmentation

H R. Roth, A Hatamizadeh, Z Xu, C Zhao, W Li, A Myronenko, D Xu

[NVIDIA]

https://arxiv.org/abs/2208.10553