LG - Machine Learning  CV - Computer Vision  CL - Computation and Language  AS - Audio and Speech  RO - Robotics
Reposted from 爱可可爱生活
1. [CL] TAPEX: Table Pre-training via Learning a Neural SQL Executor
Q Liu, B Chen, J Guo, M Ziyadi, Z Lin, W Chen, J Lou
[Beihang University & Xi’an Jiaotong University & Microsoft Research Asia]
Recent progress in language model pre-training has achieved great success by leveraging large-scale unstructured textual data. However, applying pre-training to structured tabular data remains a challenge due to the lack of large-scale, high-quality tabular data. This paper proposes TAPEX, which shows that table pre-training can be achieved by learning a neural SQL executor over a synthetic corpus obtained by automatically synthesizing executable SQL queries and their execution results. TAPEX addresses the data-scarcity problem by guiding the language model to mimic a SQL executor on a diverse, large-scale, and high-quality synthetic corpus. TAPEX is evaluated on four benchmark datasets. The experimental results show that TAPEX outperforms previous table pre-training approaches by a large margin and achieves new state-of-the-art results on all of them, including raising the weakly-supervised WikiSQL denotation accuracy to 89.5% (+2.3%), the WikiTableQuestions denotation accuracy to 57.5% (+4.8%), the SQA denotation accuracy to 74.5% (+3.5%), and the TabFact accuracy to 84.2% (+3.2%).
Recent progress in language model pre-training has achieved great success via leveraging large-scale unstructured textual data. However, it is still a challenge to apply pre-training on structured tabular data due to the absence of large-scale high-quality tabular data. In this paper, we propose TAPEX to show that table pre-training can be achieved by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries and their execution outputs. TAPEX addresses the data scarcity challenge via guiding the language model to mimic a SQL executor on the diverse, large-scale and high-quality synthetic corpus. We evaluate TAPEX on four benchmark datasets. Experimental results demonstrate that TAPEX outperforms previous table pre-training approaches by a large margin and achieves new state-of-the-art results on all of them. This includes improvements on the weakly-supervised WikiSQL denotation accuracy to 89.5% (+2.3%), the WikiTableQuestions denotation accuracy to 57.5% (+4.8%), the SQA denotation accuracy to 74.5% (+3.5%), and the TabFact accuracy to 84.2% (+3.2%). To our knowledge, this is the first work to exploit table pre-training via synthetic executable programs and to achieve new state-of-the-art results on various downstream tasks.
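To make the pre-training recipe more concrete, here is a minimal sketch (not the authors' code) of how one synthetic training example could be built: a sampled SQL query is executed over a table with SQLite, the table and query are flattened into a single input string, and the execution result becomes the seq2seq target. The table contents, query, and serialization format are illustrative assumptions.

```python
import sqlite3

# Toy table and a sampled executable SQL query (illustrative values only).
rows = [("Germany", 4), ("Brazil", 5), ("Italy", 4)]
query = "SELECT country FROM t WHERE titles > 4"

# Execute the query with SQLite to obtain the target output.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (country TEXT, titles INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", rows)
answer = [r[0] for r in con.execute(query).fetchall()]

# Flatten the table and prepend the SQL query, loosely in the spirit of
# TAPEX's encoder input; the exact serialization here is an assumption.
table_str = "col : country | titles " + " ".join(
    f"row {i + 1} : {country} | {titles}" for i, (country, titles) in enumerate(rows)
)
src = f"{query} {table_str}"   # model input
tgt = ", ".join(answer)        # model target: the execution result
print(src)
print(tgt)                     # -> Brazil
```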
https://weibo.com/1402400261/L3FjrxYcl
2. [LG] Neural Deep Equilibrium Solvers
A deep equilibrium (DEQ) model abandons conventional depth by solving for the fixed point of a single nonlinear layer f_θ. This structure decouples the internal structure of the layer (which controls representational capacity) from how the fixed point is actually computed (which affects inference-time efficiency), the latter usually handled by classic techniques such as Broyden's method or Anderson acceleration. This paper shows that this decoupling can be exploited to substantially enhance the fixed-point computation with a custom neural solver. The solver uses a parameterized network both to guess an initial value for the optimization and to perform the iterative updates, in a method that generalizes a learnable form of Anderson acceleration and can be trained end-to-end in an unsupervised manner. Such a solution is particularly well suited to the implicit-model setting, because inference in these models requires repeatedly solving for a fixed point of the same nonlinear layer for different inputs, a task at which the network excels. Experiments show that these neural equilibrium solvers are fast to train (adding only 0.9-1.1% to the original DEQ's training time) and require few additional parameters (1-3% of the original model size), yet deliver a 2× speedup in DEQ network inference without any loss of accuracy across numerous domains and tasks.
A deep equilibrium (DEQ) model abandons traditional depth by solving for the fixed point of a single nonlinear layer fθ. This structure enables decoupling the internal structure of the layer (which controls representational capacity) from how the fixed point is actually computed (which impacts inference-time efficiency), which is usually via classic techniques such as Broyden’s method or Anderson acceleration. In this paper, we show that one can exploit such decoupling and substantially enhance this fixed point computation using a custom neural solver. Specifically, our solver uses a parameterized network to both guess an initial value of the optimization and perform iterative updates, in a method that generalizes a learnable form of Anderson acceleration and can be trained end-to-end in an unsupervised manner. Such a solution is particularly well suited to the implicit model setting, because inference in these models requires repeatedly solving for a fixed point of the same nonlinear layer for different inputs, a task at which our network excels. Our experiments show that these neural equilibrium solvers are fast to train (only taking an extra 0.9-1.1% over the original DEQ’s training time), require few additional parameters (1-3% of the original model size), yet lead to a 2× speedup in DEQ network inference without any degradation in accuracy across numerous domains and tasks.
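As a rough illustration of the idea (not the paper's solver), the sketch below pairs a toy DEQ layer f(z, x) with two small hypothetical networks: one guesses the initial iterate from the input, the other produces each update from the current iterate and f's output; the fixed-point residual at the end is the kind of unsupervised signal such a solver can be trained on. The paper's actual solver generalizes a learnable form of Anderson acceleration rather than using these ad-hoc networks.

```python
import torch
import torch.nn as nn

d = 64
W_z, W_x = nn.Linear(d, d), nn.Linear(d, d)

def f(z, x):
    # Toy DEQ layer; its fixed point z* = f(z*, x) is the model's output.
    return torch.tanh(W_z(z) + W_x(x))

# Hypothetical solver components: a net that guesses the initial iterate and
# a net that maps (z, f(z, x)) to the next iterate.
init_net = nn.Linear(d, d)
update_net = nn.Linear(2 * d, d)

def neural_solve(x, n_iters=8):
    z = init_net(x)                                      # learned initial guess
    for _ in range(n_iters):
        z = update_net(torch.cat([z, f(z, x)], dim=-1))  # learned update step
    return z

x = torch.randn(4, d)
z_star = neural_solve(x)
residual = (f(z_star, x) - z_star).norm()  # unsupervised objective to minimize
print(residual.item())
```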
https://weibo.com/1402400261/L3Fn6qYkd
3. [CV] Extracting Triangular 3D Models, Materials, and Lighting From Images
J Munkberg, J Hasselgren, T Shen...
[NVIDIA]
This paper presents an efficient method for jointly optimizing topology, materials, and lighting from multi-view image observations. Unlike recent multi-view reconstruction approaches, which typically produce entangled 3D representations encoded in neural networks, the proposed method outputs triangle meshes with spatially-varying materials and environment lighting that can be deployed unmodified in any traditional graphics engine. It draws on recent work in differentiable rendering, uses coordinate-based networks to compactly represent volumetric textures, and employs differentiable marching tetrahedra to enable gradient-based optimization directly on the surface mesh. A differentiable formulation of the split sum approximation of environment lighting is introduced to efficiently recover all-frequency lighting. Experiments show the extracted models being used for advanced scene editing, material decomposition, and high-quality view interpolation, all running at interactive rates in triangle-based renderers (rasterizers and path tracers).
We present an efficient method for joint optimization of topology, materials and lighting from multi-view image observations. Unlike recent multi-view reconstruction approaches, which typically produce entangled 3D representations encoded in neural networks, we output triangle meshes with spatially-varying materials and environment lighting that can be deployed in any traditional graphics engine unmodified. We leverage recent work in differentiable rendering, coordinate-based networks to compactly represent volumetric texturing, alongside differentiable marching tetrahedrons to enable gradient-based optimization directly on the surface mesh. Finally, we introduce a differentiable formulation of the split sum approximation of environment lighting to efficiently recover all-frequency lighting. Experiments show our extracted models used in advanced scene editing, material decomposition, and high quality view interpolation, all running at interactive rates in triangle-based renderers (rasterizers and path tracers).
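The general pattern the paper builds on, gradient-based optimization of scene parameters through a differentiable rendering step, can be sketched as follows. This toy example fits only a per-pixel albedo and a single light direction with a trivial Lambertian shader against stand-in images, whereas the paper jointly optimizes mesh topology via differentiable marching tetrahedra, spatially-varying materials, and split-sum environment lighting.

```python
import torch
import torch.nn.functional as F

# Stand-ins for captured data: per-pixel surface normals and observed colors.
normals = F.normalize(torch.randn(128, 128, 3), dim=-1)
target = torch.rand(128, 128, 3)

# Parameters to recover: a per-pixel albedo "material" and one light direction.
albedo = torch.full((128, 128, 3), 0.5, requires_grad=True)
light_dir = torch.tensor([0.0, 0.0, 1.0], requires_grad=True)

opt = torch.optim.Adam([albedo, light_dir], lr=1e-2)
for step in range(200):
    l = F.normalize(light_dir, dim=0)
    shading = (normals @ l).clamp(min=0.0).unsqueeze(-1)  # Lambertian term
    rendered = albedo.clamp(0, 1) * shading               # differentiable "render"
    loss = ((rendered - target) ** 2).mean()              # image-space loss
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```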
https://weibo.com/1402400261/L3Fu2afPM
4. [CV] Discrete Representations Strengthen Vision Transformer Robustness
C Mao, L Jiang, M Dehghani, C Vondrick, R Sukthankar, I Essa
[Google Research & Columbia University]
Vision Transformer (ViT) is emerging as the state-of-the-art architecture for image recognition. While recent studies suggest that ViTs are more robust than their convolutional counterparts, the experiments here find that ViTs rely too heavily on local features and fail to make adequate use of global context (such as shape and structure). As a result, ViTs fail to generalize to out-of-distribution, real-world data. To address this deficiency, a simple and effective modification is made to ViT's input layer by adding discrete tokens produced by a vector-quantized encoder. Unlike standard continuous pixel tokens, discrete tokens are invariant under small perturbations and individually contain less information, which encourages ViTs to learn global information that is invariant. Experimental results show that adding discrete representations to four architecture variants strengthens ViT robustness by up to 12% across seven ImageNet robustness benchmarks while maintaining performance on ImageNet.
Vision Transformer (ViT) is emerging as the state-of-the-art architecture for image recognition. While recent studies suggest that ViTs are more robust than their convolutional counterparts, our experiments find that ViTs are overly reliant on local features (e.g., nuisances and texture) and fail to make adequate use of global context (e.g., shape and structure). As a result, ViTs fail to generalize to out-of-distribution, real-world data. To address this deficiency, we present a simple and effective architecture modification to ViT’s input layer by adding discrete tokens produced by a vector-quantized encoder. Different from the standard continuous pixel tokens, discrete tokens are invariant under small perturbations and contain less information individually, which promote ViTs to learn global information that is invariant. Experimental results demonstrate that adding discrete representation on four architecture variants strengthens ViT robustness by up to 12% across seven ImageNet robustness benchmarks while maintaining the performance on ImageNet.
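A minimal sketch of the kind of input-layer change described above (not the authors' implementation): each patch embedding is quantized to its nearest entry in a learned codebook, and the resulting discrete-token embeddings are fed to the Transformer together with the continuous patch tokens. The codebook size, dimensions, and concatenation scheme below are assumptions for illustration; the paper obtains discrete tokens from a vector-quantized encoder over the image itself.

```python
import torch
import torch.nn as nn

B, N, D, K = 8, 196, 768, 1024        # batch, patches, embedding dim, codebook size
patches = torch.randn(B, N, D)        # continuous patch embeddings

codebook = nn.Embedding(K, D)         # learned vector-quantizer codebook
flat = patches.reshape(-1, D)                     # (B*N, D)
dist = torch.cdist(flat, codebook.weight)         # distances to all code vectors
indices = dist.argmin(dim=-1).reshape(B, N)       # (B, N) discrete token ids
discrete_tokens = codebook(indices)               # embeddings of the discrete ids

# One simple way to combine them: concatenate along the sequence axis so the
# encoder sees both continuous and perturbation-invariant discrete tokens.
vit_input = torch.cat([patches, discrete_tokens], dim=1)  # (B, 2N, D)
print(vit_input.shape)
```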
https://weibo.com/1402400261/L3FxIfc6T
5. [LG] ExoMiner: A Highly Accurate and Explainable Deep Learning Classifier to Mine Exoplanets
H Valizadegan, M Martinho, L S. Wilkens...
[Universities Space Research Association (USRA) & NASA Ames Research Center (NASA ARC) & Delft University of Technology & The SETI Institute & Cleveland State University]
The Kepler and TESS missions have generated over 100,000 potential transit signals that must be processed to create a catalog of planet candidates. Over the past few years there has been growing interest in using machine learning to analyze these data in search of new exoplanets. Unlike existing machine learning work, ExoMiner, the deep learning classifier proposed here, mimics how domain experts examine diagnostic tests to vet a transit signal. ExoMiner is a highly accurate, explainable, and robust classifier that 1) allows 301 new exoplanets to be validated from the MAST Kepler Archive and 2) is general enough to be applied across missions such as the ongoing TESS mission. An extensive experimental study verifies that ExoMiner is more reliable and accurate than existing transit-signal classifiers in terms of different classification and ranking metrics. ExoMiner's modular design also favors its explainability; a simple explainability framework is introduced that provides experts with feedback on why ExoMiner classifies a transit signal into a particular class label.
The Kepler and TESS missions have generated over 100,000 potential transit signals that must be processed in order to create a catalog of planet candidates. During the last few years, there has been a growing interest in using machine learning to analyze these data in search of new exoplanets. Different from the existing machine learning works, ExoMiner, the proposed deep learning classifier in this work, mimics how domain experts examine diagnostic tests to vet a transit signal. ExoMiner is a highly accurate, explainable, and robust classifier that 1) allows us to validate 301 new exoplanets from the MAST Kepler Archive and 2) is general enough to be applied across missions such as the on-going TESS mission. We perform an extensive experimental study to verify that ExoMiner is more reliable and accurate than the existing transit signal classifiers in terms of different classification and ranking metrics. For example, for a fixed precision value of 99%, ExoMiner retrieves 93.6% of all exoplanets in the test set (i.e., recall=0.936) while this rate is 76.3% for the best existing classifier. Furthermore, the modular design of ExoMiner favors its explainability. We introduce a simple explainability framework that provides experts with feedback on why ExoMiner classifies a transit signal into a specific class label (e.g., planet candidate or not planet candidate).
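The "recall at a fixed 99% precision" figure quoted in the abstract can be computed from classifier scores as sketched below; the labels and scores here are synthetic stand-ins, not ExoMiner outputs.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Synthetic labels (1 = confirmed planet) and classifier scores for illustration.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=5000)
scores = np.clip(labels * 0.7 + rng.normal(0.3, 0.2, size=5000), 0, 1)

# Sweep thresholds and report the best recall achievable at precision >= 0.99.
precision, recall, _ = precision_recall_curve(labels, scores)
recall_at_99 = recall[precision >= 0.99].max()
print(f"recall at 99% precision: {recall_at_99:.3f}")
```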
https://weibo.com/1402400261/L3FECroNh
A few other papers worth noting:
[LG] Iterative Refinement Graph Neural Network for Antibody Sequence-Structure Co-design
W Jin, J Wohlwend, R Barzilay, T Jaakkola
[MIT]
https://weibo.com/1402400261/L3FHXyDi5
[LG] When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?
Z Song, S Mei, Y Bai
[Peking University & UC Berkeley & Salesforce Research]
https://weibo.com/1402400261/L3FLT6bAA
[LG] What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
Z Li, T Wang, S Arora
[Princeton University & Yale University]
https://weibo.com/1402400261/L3FNMBRVS
[LG] Neural optimal feedback control with local learning rules
J Friedrich, S Golkar, S Farashahi, A Genkin, A M. Sengupta, D B. Chklovskii
[Flatiron Institute]
https://weibo.com/1402400261/L3FPICgLJ