LG - Machine Learning | CV - Computer Vision | CL - Computation and Language | AS - Audio and Speech | RO - Robotics
Reposted from 爱可可爱生活
1. [LG] Deep End-to-end Causal Inference
T Geffner, J Antoran, A Foster, W Gong, C Ma, E Kiciman, A Sharma, A Lamb, M Kukla, N Pawlowski, M Allamanis, C Zhang
[Microsoft Research & University of Massachusetts Amherst & University of Cambridge]
Deep end-to-end causal inference. Causal inference is essential for data-driven decision making across domains such as business engagement, medical treatment, or policy making. However, research on causal discovery and on causal inference has evolved separately, and combining the two is not trivial. This paper proposes Deep End-to-end Causal Inference (DECI), a single flow-based method that takes in observational data and can perform both causal discovery and inference, including conditional average treatment effect (CATE) estimation. The paper provides a theoretical guarantee that DECI recovers the ground-truth causal graph under mild assumptions. In addition, the method handles heterogeneous, real-world, mixed-type data with missing values, allowing for both continuous and discrete treatment decisions. The design principle also generalizes beyond DECI, yielding a general end-to-end causal inference (ECI) recipe with which different ECI frameworks can be built from existing methods. Across more than a thousand experiments on synthetic datasets and other causal machine learning benchmarks, DECI outperforms relevant baselines in both causal discovery and (C)ATE estimation.
Causal inference is essential for data-driven decision making across domains such as business engagement, medical treatment or policy making. However, research on causal discovery and inference has evolved separately, and the combination of the two domains is not trivial. In this work, we develop Deep End-to-end Causal Inference (DECI), a single flow-based method that takes in observational data and can perform both causal discovery and inference, including conditional average treatment effect (CATE) estimation. We provide a theoretical guarantee that DECI can recover the ground truth causal graph under mild assumptions. In addition, our method can handle heterogeneous, real-world, mixed-type data with missing values, allowing for both continuous and discrete treatment decisions. Moreover, the design principle of our method can generalize beyond DECI, providing a general End-to-end Causal Inference (ECI) recipe, which enables different ECI frameworks to be built using existing methods. Our results show the superior performance of DECI when compared to relevant baselines for both causal discovery and (C)ATE estimation in over a thousand experiments on both synthetic datasets and other causal machine learning benchmark datasets.
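A concrete ingredient behind graph learning of this kind: continuous causal-discovery objectives typically score a learned weighted adjacency matrix with a NOTEARS-style acyclicity penalty, which is zero exactly when the graph is a DAG. The PyTorch snippet below is a minimal sketch of that penalty only, assuming a dense weighted adjacency matrix A as a hypothetical input; it is not DECI's full prior or training objective.

```python
import torch

def dag_penalty(A: torch.Tensor) -> torch.Tensor:
    """NOTEARS-style acyclicity penalty h(A) = tr(exp(A ∘ A)) - d.

    h(A) is zero iff the weighted adjacency matrix A describes a DAG,
    so adding lambda * h(A) to a likelihood-based objective pushes the
    learned graph towards acyclicity. A is assumed to be a dense (d, d)
    matrix of edge weights (hypothetical input for this sketch).
    """
    d = A.shape[0]
    return torch.trace(torch.matrix_exp(A * A)) - d
```

In an end-to-end pipeline, such a term would be combined, with an appropriate multiplier, with the data-fit term over the observational data and the downstream treatment-effect estimation.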
2. [CV] Learning with Neighbor Consistency for Noisy Labels
A Iscen, J Valmadre, A Arnab, C Schmid
[Google Research & University of Adelaide]
Learning with neighbor consistency for noisy labels, an effective strategy for deep learning under label noise. Recent advances in deep learning rely on large labeled datasets to train high-capacity models. However, collecting large datasets in a time- and cost-efficient manner often introduces label noise. This paper proposes a method for learning from noisy labels that exploits similarities between training examples in feature space, encouraging each example's prediction to be similar to those of its nearest neighbors. Compared with training algorithms that use multiple models or separate stages, the method takes the form of a simple additional regularization term: only one extra loss is added to the objective optimized by stochastic gradient descent, and it can be interpreted as an inductive version of the classical transductive label-propagation algorithm. The method is thoroughly evaluated on datasets with synthetic noise (CIFAR-10, CIFAR-100) and realistic noise (mini-WebVision, Clothing1M, miniImageNet-Red), achieving competitive or state-of-the-art accuracy on all of them.
Recent advances in deep learning have relied on large, labelled datasets to train high-capacity models. However, collecting large datasets in a time- and cost-efficient manner often results in label noise. We present a method for learning from noisy labels that leverages similarities between training examples in feature space, encouraging the prediction of each example to be similar to its nearest neighbours. Compared to training algorithms that use multiple models or distinct stages, our approach takes the form of a simple, additional regularization term. It can be interpreted as an inductive version of the classical, transductive label propagation algorithm. We thoroughly evaluate our method on datasets evaluating both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, Clothing1M, miniImageNet-Red) noise, and achieve competitive or state-of-the-art accuracies across all of them.
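The regularization term described above is straightforward to sketch: for each training example, build a soft target from the predictions of its nearest neighbours in feature space and penalize divergence from that target. The PyTorch snippet below is a hypothetical, minimal version of such a neighbour-consistency loss; the paper's exact similarity weighting, neighbourhood size, and loss form may differ.

```python
import torch
import torch.nn.functional as F

def neighbor_consistency_loss(features, logits, k=10, temperature=1.0):
    """Encourage each example's prediction to match a similarity-weighted
    average of its k nearest neighbours' predictions (sketch only, not the
    paper's exact formulation)."""
    log_probs = F.log_softmax(logits, dim=1)            # (B, C) log-predictions
    probs = log_probs.exp()                             # (B, C) predictions
    feats = F.normalize(features, dim=1)                # (B, D) unit-norm features
    sim = feats @ feats.t()                             # (B, B) cosine similarities
    sim.fill_diagonal_(float('-inf'))                   # exclude self-matches
    topk_sim, topk_idx = sim.topk(k, dim=1)             # k nearest neighbours
    weights = F.softmax(topk_sim / temperature, dim=1)          # (B, k) neighbour weights
    target = (weights.unsqueeze(-1) * probs[topk_idx]).sum(1)   # (B, C) neighbour consensus
    target = target.detach()
    # KL(target || prediction), averaged over the batch
    return (target * (target.clamp_min(1e-8).log() - log_probs)).sum(1).mean()
```

During training this term would simply be added, with a weighting coefficient, to the usual cross-entropy loss on the (noisy) labels.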
3. [LG] Graph-Coupled Oscillator Networks
T. K Rusch, B P. Chamberlain, J Rowbottom, S Mishra, M M. Bronstein
[ETH Zurich & Twitter Inc. & University of Oxford]
Graph-coupled oscillator networks. This paper proposes Graph-Coupled Oscillator Networks (GraphCON), a new framework for deep learning on graphs, based on discretizations of a second-order system of ordinary differential equations (ODEs) that models a network of nonlinear forced and damped oscillators coupled via the adjacency structure of the underlying graph. The flexibility of the framework allows any basic GNN layer (e.g., convolutional or attentional) to serve as the coupling function, from which a multi-layer deep neural network is built up via the dynamics of the proposed ODEs. The paper relates the oversmoothing problem commonly encountered in GNNs to the stability of steady states of the underlying ODEs, and shows that steady states with zero Dirichlet energy are unstable for the proposed ODEs. The framework thus mitigates oversmoothing and delivers performance competitive with the state of the art on a variety of graph-based learning tasks.
We propose Graph-Coupled Oscillator Networks (GraphCON), a novel framework for deep learning on graphs. It is based on discretizations of a second-order system of ordinary differential equations (ODEs), which model a network of nonlinear forced and damped oscillators, coupled via the adjacency structure of the underlying graph. The flexibility of our framework permits any basic GNN layer (e.g. convolutional or attentional) as the coupling function, from which a multi-layer deep neural network is built up via the dynamics of the proposed ODEs. We relate the oversmoothing problem, commonly encountered in GNNs, to the stability of steady states of the underlying ODE and show that zero-Dirichlet energy steady states are not stable for our proposed ODEs. This demonstrates that the proposed framework mitigates the oversmoothing problem. Finally, we show that our approach offers competitive performance with respect to the state-of-the-art on a variety of graph-based learning tasks.
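A rough picture of the dynamics: each node carries a feature vector Y and a "velocity" Z, and both are advanced by a discretized second-order ODE in which a GNN layer supplies the coupling/forcing term while two scalars control the restoring force and the damping. The snippet below is a hypothetical single update step in that spirit, with `coupling_fn` standing in for any message-passing layer over the graph; it conveys the structure rather than the paper's exact discretization scheme or constants.

```python
import torch

def graphcon_step(Y, Z, coupling_fn, dt=1.0, gamma=1.0, alpha=1.0):
    """One oscillator-style update for node features Y and velocities Z.

    coupling_fn is any GNN layer acting over the graph structure, e.g.
    coupling_fn = lambda Y: adj_norm @ (Y @ W) for a simple graph
    convolution (adj_norm and W are hypothetical here).
    gamma scales the restoring force, alpha the damping.
    """
    Z = Z + dt * (torch.relu(coupling_fn(Y)) - gamma * Y - alpha * Z)
    Y = Y + dt * Z
    return Y, Z
```

Stacking many such steps plays the role of depth, and the oscillatory dynamics are what the paper argues keep node features from collapsing to a common value (oversmoothing).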
4. [CV] Neural Dual Contouring
Z Chen, A Tagliasacchi, T Funkhouser, H Zhang
[Simon Fraser University & Google Research]
Neural dual contouring. This paper proposes Neural Dual Contouring (NDC), a new data-driven approach to mesh reconstruction based on dual contouring (DC). Like traditional DC, it produces one vertex per grid cell and one quad for each grid edge crossing, a natural and efficient structure for reproducing sharp features. However, instead of computing vertex locations and edge crossings with hand-crafted functions that depend directly on hard-to-obtain surface gradients, NDC uses a neural network to predict them. As a result, NDC can be trained to produce meshes from signed or unsigned distance fields, binary voxel grids, or point clouds (with or without normals), and it can produce open surfaces when the input represents a sheet or partial surface. In experiments on five prominent datasets, NDC trained on one of the datasets generalizes well to the others. Moreover, compared with previous learned methods (e.g., Neural Marching Cubes, convolutional occupancy networks) and traditional methods (e.g., Poisson reconstruction), NDC provides better surface reconstruction accuracy, feature preservation, output complexity, triangle quality, and inference time.
We introduce neural dual contouring (NDC), a new data-driven approach to mesh reconstruction based on dual contouring (DC). Like traditional DC, it produces exactly one vertex per grid cell and one quad for each grid edge intersection, a natural and efficient structure for reproducing sharp features. However, rather than computing vertex locations and edge crossings with hand-crafted functions that depend directly on difficult-to-obtain surface gradients, NDC uses a neural network to predict them. As a result, NDC can be trained to produce meshes from signed or unsigned distance fields, binary voxel grids, or point clouds (with or without normals); and it can produce open surfaces in cases where the input represents a sheet or partial surface. During experiments with five prominent datasets, we find that NDC, when trained on one of the datasets, generalizes well to the others. Furthermore, NDC provides better surface reconstruction accuracy, feature preservation, output complexity, triangle quality, and inference time in comparison to previous learned (e.g., neural marching cubes, convolutional occupancy networks) and traditional (e.g., Poisson) methods.
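In terms of outputs, the network only needs to predict, per grid cell, one vertex position and, per grid edge, whether the surface crosses it; quads are then assembled from these predictions as in classical dual contouring. The toy PyTorch head below illustrates that output structure under hypothetical layer sizes and a generic per-cell feature input; NDC's actual backbone, inputs, and heads differ.

```python
import torch
import torch.nn as nn

class DualContouringHead(nn.Module):
    """Toy prediction head in the spirit of NDC: from a feature vector per
    grid cell, predict a vertex location inside the cell and crossing flags
    for the three grid edges associated with that cell (sketch only)."""

    def __init__(self, feat_dim=64):
        super().__init__()
        self.vertex_head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 3), nn.Sigmoid())    # vertex offset within the cell, in [0, 1]^3
        self.edge_head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 3))                  # logits: is each of the 3 edges crossed?

    def forward(self, cell_features):          # (num_cells, feat_dim)
        vertices = self.vertex_head(cell_features)
        edge_crossings = torch.sigmoid(self.edge_head(cell_features))
        return vertices, edge_crossings
```

A mesher would then emit one quad for every edge predicted as crossed, connecting the predicted vertices of the four cells sharing that edge.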
5. [CV] Webly Supervised Concept Expansion for General Purpose Vision Models
A Kamath, C Clark, T Gupta, E Kolve, D Hoiem, A Kembhavi
[Allen Institute for AI & University of Illinois at Urbana-Champaign]
Webly supervised concept expansion for general-purpose vision models. General purpose vision (GPV) systems are models designed to solve a wide array of visual tasks without architectural changes. Today, GPVs mostly learn both skills and concepts from large, fully supervised datasets. Scaling GPVs to tens of thousands of concepts by acquiring data to learn every concept for every skill quickly becomes prohibitive. This paper presents an effective and inexpensive alternative: learn skills from fully supervised datasets, learn concepts from web image search results, and leverage a key characteristic of GPVs, namely the ability to transfer visual knowledge across skills. Using a dataset of more than one million images spanning over 10k visual concepts, the paper demonstrates webly supervised concept expansion for two existing GPVs (GPV-1 and VL-T5) on three benchmarks: five COCO-based datasets (80 primary concepts), a newly curated series of five datasets based on the OpenImages and VisualGenome repositories (~500 concepts), and a web-derived dataset (10k+ concepts). The paper also proposes a new architecture, GPV-2, which supports a variety of tasks, from vision tasks such as classification and localization, to vision+language tasks such as QA and captioning, to more niche ones such as human-object interaction recognition. GPV-2 benefits greatly from web data, outperforms GPV-1 and VL-T5 across these benchmarks, and performs well at zero-shot action and attribute recognition.
General purpose vision (GPV) systems [25] are models that are designed to solve a wide array of visual tasks without requiring architectural changes. Today, GPVs primarily learn both skills and concepts from large fully supervised datasets. Scaling GPVs to tens of thousands of concepts by acquiring data to learn each concept for every skill quickly becomes prohibitive. This work presents an effective and inexpensive alternative: learn skills from fully supervised datasets, learn concepts from web image search results, and leverage a key characteristic of GPVs – the ability to transfer visual knowledge across skills. We use a dataset of 1M+ images spanning 10k+ visual concepts to demonstrate webly-supervised concept expansion for two existing GPVs (GPV-1 [25] and VL-T5 [14]) on 3 benchmarks: 5 COCO based datasets (80 primary concepts), a newly curated series of 5 datasets based on the OpenImages and VisualGenome repositories (∼500 concepts) and the Web-derived dataset (10k+ concepts). We also propose a new architecture, GPV-2 that supports a variety of tasks – from vision tasks like classification and localization to vision+language tasks like QA and captioning to more niche ones like human-object interaction recognition. GPV-2 benefits hugely from web data, outperforms GPV-1 and VL-T5 across these benchmarks, and does well in a 0-shot setting at action and attribute recognition.
Other notable papers:
[LG] Measuring Disparate Outcomes of Content Recommendation Algorithms with Distributional Inequality Metrics
T Lazovich, L Belli, A Gonzales, A Bower, U Tantipongpipat, K Lum, F Huszar, R Chowdhury
[Twitter & University of Cambridge]
[LG] CoST: Contrastive Learning of Disentangled Seasonal-Trend Representations for Time Series Forecasting
G Woo, C Liu, D Sahoo, A Kumar, S Hoi
[Salesforce Research Asia & Singapore Management University]
[LG] AntBO: Towards Real-World Automated Antibody Design with Combinatorial Bayesian Optimisation
A Khan, A I. Cowen-Rivers, D Deik, A Grosnit, K Dreczkowski, P A. Robert, V Greiff, R Tutunov, D Bou-Ammar, J Wang, H Bou-Ammar
[University of Edinburgh & Huawei Noah's Ark Lab & University of Oslo...]
[LG] Choosing an Appropriate Platform and Workflow for Processing Camera Trap Data using Artificial Intelligence
J Vélez, P J. Castiblanco-Camacho, M A. Tabak, C Chalmers, P Fergus, J Fieberg
[University of Minnesota & Universidad de los Andes & ULC & Liverpool John Moores University]