LG - Machine Learning | CV - Computer Vision | CL - Computation and Language | AS - Audio and Speech | RO - Robotics
Reposted from 爱可可爱生活
Summary: the CLRS algorithmic reasoning benchmark; a Transformer with uncertainty estimation; deconstructing NLG evaluation; shape and material from unconstrained real-world arbitrary image collections; going deeper into permutation-sensitive graph neural networks; optimizing relevance maps of Vision Transformers improves robustness; learning to reason with neural networks; can foundation models help us achieve perfect secrecy; a theoretical analysis of graph (over)smoothing
1. [LG] The CLRS Algorithmic Reasoning Benchmark
P Veličković, A P Badia, D Budden, R Pascanu, A Banino, M Dashevskiy, R Hadsell, C Blundell
[DeepMind]
Learning representations of algorithms is an emerging area of machine learning, seeking to bridge concepts from neural networks with classical algorithms. Several important works have investigated whether neural networks can effectively reason like algorithms, typically by learning to execute them. The common trend in the area, however, is to generate targeted kinds of algorithmic data to evaluate specific hypotheses, making results hard to transfer across publications, and increasing the barrier to entry. To consolidate progress and work towards unified evaluation, we propose the CLRS Algorithmic Reasoning Benchmark, covering classical algorithms from the Introduction to Algorithms textbook. Our benchmark spans a variety of algorithmic reasoning procedures, including sorting, searching, dynamic programming, graph algorithms, string algorithms and geometric algorithms. We perform extensive experiments to demonstrate how several popular algorithmic reasoning baselines perform on these tasks, and consequently, highlight links to several open challenges. Our library is readily available at https://github.com/deepmind/clrs.
https://arxiv.org/abs/2205.15659
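The linked library exposes each algorithm as a dataset of trajectories (inputs, per-step hints, and ground-truth outputs). A minimal usage sketch in Python, assuming the `clrs.create_dataset` entry point shown in the repository README (names and signature should be treated as illustrative, not verified):

```python
# Sketch of loading one CLRS-30 task; the create_dataset signature
# is assumed from the repository README.
import clrs

# Build the training split for one classical algorithm (here BFS);
# each sample is a trajectory with inputs, per-step hints exposing
# the algorithm's internal state, and ground-truth outputs.
train_ds, num_samples, spec = clrs.create_dataset(
    folder='/tmp/CLRS30', algorithm='bfs',
    split='train', batch_size=32)

for i, feedback in enumerate(train_ds.as_numpy_iterator()):
    # feedback.features carries inputs and hints; feedback.outputs
    # holds the targets a reasoning model is trained to predict.
    if i == 0:
        print(num_samples, spec)
    break
```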
2. [CL] BayesFormer: Transformer with Uncertainty Estimation
K A Sankararaman, S Wang, H Fang
[Meta AI]
Transformer has become ubiquitous due to its dominant performance in various NLP and image processing tasks. However, there is a lack of understanding of how to generate mathematically grounded uncertainty estimates for transformer architectures. Models equipped with such uncertainty estimates can typically improve predictive performance, make networks robust, avoid over-fitting, and be used as acquisition functions in active learning. In this paper, we introduce BayesFormer, a Transformer model with dropouts designed by Bayesian theory. We propose a new theoretical framework to extend approximate variational inference-based dropout to Transformer-based architectures. Through extensive experiments, we validate the proposed architecture in four paradigms and show improvements across the board: language modeling and classification, long-sequence understanding, machine translation, and acquisition functions for active learning.
https://arxiv.org/abs/2206.00826
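The paper extends approximate variational inference-based dropout to Transformers. As a rough illustration of the underlying idea (plain Monte Carlo dropout, not BayesFormer's actual dropout design), one can keep dropout stochastic at inference time and average several forward passes to obtain a predictive mean together with an uncertainty estimate:

```python
# Minimal Monte Carlo dropout sketch in PyTorch, using a generic
# Transformer encoder rather than the BayesFormer architecture.
import torch
import torch.nn as nn

model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, dropout=0.1),
    num_layers=2)

def mc_dropout_predict(model, x, n_samples=20):
    model.train()  # keep dropout active so each forward pass is stochastic
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    # The mean approximates the predictive expectation; the standard
    # deviation serves as a simple per-output uncertainty estimate.
    return samples.mean(dim=0), samples.std(dim=0)

x = torch.randn(10, 8, 64)  # (seq_len, batch, d_model)
mean, uncertainty = mc_dropout_predict(model, x)
```

Such predictive variance is also what makes the model usable as an acquisition function in active learning: unlabeled examples with the highest uncertainty are queried first.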
3. [CL] Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
K Zhou, S L Blodgett, A Trischler, H D III, K Suleman, A Olteanu
[Stanford University & Microsoft Research]
There are many ways to express similar things in text, which makes evaluating natural language generation (NLG) systems difficult. Compounding this difficulty is the need to assess varying quality criteria depending on the deployment setting. While the landscape of NLG evaluation has been well-mapped, practitioners' goals, assumptions, and constraints (which inform decisions about what, when, and how to evaluate) are often partially or implicitly stated, or not stated at all. Combining a formative semi-structured interview study of NLG practitioners (N=18) with a survey study of a broader sample of practitioners (N=61), we surface goals, community practices, assumptions, and constraints that shape NLG evaluations, examining their implications and how they embody ethical considerations.
https://arxiv.org/abs/2205.06828
4. [CV] SAMURAI: Shape And Material from Unconstrained Real-world Arbitrary Image collections
M Boss, A Engelhardt, A Kar, Y Li, D Sun, J T. Barron, H P. A. Lensch, V Jampani
[University of Tübingen & Google Research]
Inverse rendering of an object under entirely unknown capture conditions is a fundamental challenge in computer vision and graphics. Neural approaches such as NeRF have achieved photorealistic results on novel view synthesis, but they require known camera poses. Solving this problem with unknown camera poses is highly challenging as it requires joint optimization over shape, radiance, and pose. This problem is exacerbated when the input images are captured in the wild with varying backgrounds and illuminations. Standard pose estimation techniques fail in such image collections in the wild due to very few estimated correspondences across images. Furthermore, NeRF cannot relight a scene under any illumination, as it operates on radiance (the product of reflectance and illumination). We propose a joint optimization framework to estimate the shape, BRDF, and per-image camera pose and illumination. Our method works on in-the-wild online image collections of an object and produces relightable 3D assets for several use-cases such as AR/VR. To our knowledge, our method is the first to tackle this severely unconstrained task with minimal user interaction. Project page: https://markboss.me/publication/2022-samurai/
https://arxiv.org/abs/2205.15768
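To convey the optimization structure the abstract describes, here is a deliberately toy, runnable PyTorch sketch in which per-image camera poses and illumination codes are free parameters optimized jointly with a neural field against a photometric loss. All shapes, the conditioning scheme, and the stand-in "renderer" are hypothetical placeholders, not SAMURAI's model:

```python
# Toy joint-optimization sketch: pose + illumination + field are all
# optimized together, mirroring the structure (not the details) of
# the method described above.
import torch

n_images, n_pix = 8, 128
images = torch.rand(n_images, n_pix, 3)                 # observed RGB (fake data)

poses = torch.nn.Parameter(torch.zeros(n_images, 6))    # per-image pose (se(3) vector)
illums = torch.nn.Parameter(torch.zeros(n_images, 16))  # per-image illumination code
field = torch.nn.Sequential(                            # stand-in shape/BRDF field
    torch.nn.Linear(3 + 6 + 16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3))

opt = torch.optim.Adam([poses, illums, *field.parameters()], lr=1e-3)

pts = torch.rand(n_pix, 3)                              # stand-in ray sample points
for step in range(100):
    losses = []
    for i in range(n_images):
        # Condition the field on this image's pose and illumination.
        cond = torch.cat([poses[i], illums[i]]).expand(n_pix, -1)
        rgb = torch.sigmoid(field(torch.cat([pts, cond], dim=-1)))
        losses.append(((rgb - images[i]) ** 2).mean())  # photometric loss
    loss = torch.stack(losses).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```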
5. [LG] Going Deeper into Permutation-Sensitive Graph Neural Networks
Z Huang, Y Wang, C Li, H He
[Chinese Academy of Sciences & Tsinghua University & Microsoft Research Asia]
The invariance to permutations of the adjacency matrix, i.e., graph isomorphism, is an overarching requirement for Graph Neural Networks (GNNs). Conventionally, this prerequisite can be satisfied by invariant operations over node permutations when aggregating messages. However, such an invariant manner may ignore the relationships among neighboring nodes, thereby hindering the expressivity of GNNs. In this work, we devise an efficient permutation-sensitive aggregation mechanism via permutation groups, capturing pairwise correlations between neighboring nodes. We prove that our approach is strictly more powerful than the 2-dimensional Weisfeiler-Lehman (2-WL) graph isomorphism test and not less powerful than the 3-WL test. Moreover, we prove that our approach achieves linear sampling complexity. Comprehensive experiments on multiple synthetic and real-world datasets demonstrate the superiority of our model.
https://arxiv.org/abs/2205.14368
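As a toy illustration of what permutation-sensitive aggregation means in contrast to an invariant sum, the hypothetical sketch below (much simpler than the paper's permutation-group construction) processes a node's neighbors in a sampled cyclic order, so pairwise correlations between consecutive neighbors enter the message:

```python
# Toy permutation-sensitive neighbor aggregation in PyTorch; the
# paper's actual group-based sampling scheme differs.
import torch

def cyclic_pairwise_aggregate(h_neighbors, mlp):
    """h_neighbors: (k, d) feature matrix of one node's k neighbors."""
    k = h_neighbors.size(0)
    h = h_neighbors[torch.randperm(k)]      # sample one neighbor ordering
    h_next = h.roll(-1, dims=0)             # cyclic successor of each neighbor
    pair = torch.cat([h, h_next], dim=-1)   # consecutive neighbor pairs
    return mlp(pair).sum(dim=0)             # pairwise messages, then pool

d = 16
mlp = torch.nn.Sequential(torch.nn.Linear(2 * d, d), torch.nn.ReLU())
out = cyclic_pairwise_aggregate(torch.randn(5, d), mlp)
```

Because the output depends on the sampled ordering, the aggregation is permutation-sensitive; the paper's contribution is choosing the permutations from a group so that expressivity provably exceeds 2-WL while keeping sampling cost linear.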
Several other papers worth noting:
[CV] Optimizing Relevance Maps of Vision Transformers Improves Robustness
H Chefer, I Schwartz, L Wolf
[Tel-Aviv University]
https://arxiv.org/abs/2206.01161
[LG] Learning to Reason with Neural Networks: Generalization, Unseen Data and Boolean Measures
E Abbe, S Bengio, E Cornacchia, J Kleinberg, A Lotfi, M Raghu, C Zhang
[EPFL & Apple & Cornell University & Google Research]
https://arxiv.org/abs/2205.13647
[LG] Can Foundation Models Help Us Achieve Perfect Secrecy?
S Arora, C Ré
[Stanford University]
https://arxiv.org/abs/2205.13722
[LG] Not too little, not too much: a theoretical analysis of graph (over)smoothing
N Keriven
[CNRS]
https://arxiv.org/abs/2205.12156