LG - Machine Learning  CV - Computer Vision  CL - Computation and Language  AS - Audio and Speech  RO - Robotics
Reposted from 爱可可爱生活
1、[LG] On Neural Differential Equations
P Kidger
[University of Oxford]
A survey of neural differential equations. The conjoining of dynamical systems and deep learning has become a topic of great interest. Neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin: traditional parameterised differential equations are a special case, and many popular neural network architectures, such as residual networks and recurrent networks, are their discretisations. NDEs are well suited to generative problems, dynamical systems, and time series (particularly in physics, finance, and so on), and are therefore of interest to both modern machine learning and traditional mathematical modelling. They offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides. This doctoral thesis provides an in-depth survey of the field. Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or to sample from complex high-dimensional distributions). Further topics include: numerical methods for NDEs (e.g. reversible differential equation solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation).
The conjoining of dynamical systems and deep learning has become a topic of great interest. In particular, neural differential equations (NDEs) demonstrate that neural networks and differential equations are two sides of the same coin. Traditional parameterised differential equations are a special case. Many popular neural network architectures, such as residual networks and recurrent networks, are discretisations.
NDEs are suitable for tackling generative problems, dynamical systems, and time series (particularly in physics, finance, ...) and are thus of interest to both modern machine learning and traditional mathematical modelling. NDEs offer high-capacity function approximation, strong priors on model space, the ability to handle irregular data, memory efficiency, and a wealth of available theory on both sides.
This doctoral thesis provides an in-depth survey of the field.
Topics include: neural ordinary differential equations (e.g. for hybrid neural/mechanistic modelling of physical systems); neural controlled differential equations (e.g. for learning functions of irregular time series); and neural stochastic differential equations (e.g. to produce generative models capable of representing complex stochastic dynamics, or sampling from complex high-dimensional distributions).
Further topics include: numerical methods for NDEs (e.g. reversible differential equations solvers, backpropagation through differential equations, Brownian reconstruction); symbolic regression for dynamical systems (e.g. via regularised evolution); and deep implicit models (e.g. deep equilibrium models, differentiable optimisation).
We anticipate this thesis will be of interest to anyone interested in the marriage of deep learning with dynamical systems, and hope it will provide a useful reference for the current state of the art.
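To make the "two sides of the same coin" point concrete, below is a minimal neural ODE sketch in PyTorch, using the third-party torchdiffeq package (not part of the thesis); the vector field, integration interval, and sizes are illustrative assumptions, not the thesis's code.

import torch
import torch.nn as nn
from torchdiffeq import odeint  # third-party package: pip install torchdiffeq

class VectorField(nn.Module):
    # Learned right-hand side f_theta of dy/dt = f_theta(t, y).
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))
    def forward(self, t, y):
        return self.net(y)

class NeuralODE(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.func = VectorField(dim)
    def forward(self, y0):
        # Integrating from t=0 to t=1 plays the role of a stack of residual
        # blocks; an explicit Euler discretisation of this ODE recovers a ResNet.
        t = torch.tensor([0.0, 1.0])
        return odeint(self.func, y0, t)[-1]

model = NeuralODE(dim=2)
y0 = torch.randn(8, 2)   # a batch of initial conditions
print(model(y0).shape)   # torch.Size([8, 2])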
2、[CV] Corrupted Image Modeling for Self-Supervised Visual Pre-Training
Y Fang, L Dong, H Bao, X Wang, F Wei
[Huazhong University of Science and Technology & Microsoft Research]
Corrupted image modeling for self-supervised visual pre-training. This paper proposes Corrupted Image Modeling (CIM) for self-supervised visual pre-training. Instead of artificial mask tokens, CIM corrupts the input image with an auxiliary generator built around a small trainable BEiT: some patches are randomly selected and replaced with plausible alternatives sampled from the BEiT output distribution. Given the corrupted image, an enhancer network learns either to recover all the original image pixels or to predict, for each visual token, whether it was replaced by a generator sample. The generator and the enhancer are trained simultaneously and updated synergistically. After pre-training, the enhancer serves as a high-capacity visual encoder for downstream tasks. CIM is a general and flexible visual pre-training framework suitable for various network architectures, and demonstrates for the first time that both ViT and CNN can learn rich visual representations within a single unified, non-Siamese framework. The method achieves compelling results on vision benchmarks such as ImageNet classification and ADE20K semantic segmentation: with 300 epochs of CIM pre-training, a vanilla ViT-Base/16 and a ResNet-50 reach 83.3 and 80.6 Top-1 fine-tuning accuracy on ImageNet-1K image classification, respectively.
We introduce Corrupted Image Modeling (CIM) for self-supervised visual pre-training. CIM uses an auxiliary generator with a small trainable BEiT to corrupt the input image instead of using artificial mask tokens, where some patches are randomly selected and replaced with plausible alternatives sampled from the BEiT output distribution. Given this corrupted image, an enhancer network learns to either recover all the original image pixels, or predict whether each visual token is replaced by a generator sample or not. The generator and the enhancer are simultaneously trained and synergistically updated. After pre-training, the enhancer can be used as a high-capacity visual encoder for downstream tasks. CIM is a general and flexible visual pre-training framework that is suitable for various network architectures. For the first time, CIM demonstrates that both ViT and CNN can learn rich visual representations using a unified, non-Siamese framework. Experimental results show that our approach achieves compelling results in vision benchmarks, such as ImageNet classification and ADE20K semantic segmentation. For example, 300-epoch CIM pretrained vanilla ViT-Base/16 and ResNet-50 obtain 83.3 and 80.6 Top-1 fine-tuning accuracy on ImageNet-1K image classification respectively.
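For intuition, here is a toy sketch of CIM's corrupt-then-enhance loop in its replaced-token-detection variant. The generator and enhancer below are hypothetical stand-ins (plain linear layers, not the paper's BEiT generator and ViT/CNN enhancer), and the corruption ratio is arbitrary.

import torch
import torch.nn as nn

num_patches, dim = 16, 32
generator = nn.Linear(dim, dim)   # stand-in for the small trainable BEiT generator
enhancer = nn.Linear(dim, 1)      # stand-in head: predicts "was this patch replaced?"

patches = torch.randn(4, num_patches, dim)   # a batch of patch embeddings
mask = torch.rand(4, num_patches) < 0.5      # patches selected for corruption

# Corrupt: replace the selected patches with generator samples rather than
# artificial mask tokens. Sampling is non-differentiable, hence detach();
# in the paper the generator is nevertheless trained jointly on its own objective.
samples = generator(patches).detach()
corrupted = torch.where(mask.unsqueeze(-1), samples, patches)

# Enhance (replaced-token-detection variant): classify each position as
# original vs. replaced, trained with binary cross-entropy.
logits = enhancer(corrupted).squeeze(-1)
loss = nn.functional.binary_cross_entropy_with_logits(logits, mask.float())
loss.backward()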
3、[LG] Anticorrelated Noise Injection for Improved Generalization
A Orvieto, H Kersting, F Proske, F Bach, A Lucchi
[ETH Zurich & INRIA & University of Oslo & University of Basel]
Improving generalization by injecting anticorrelated noise. Injecting artificial noise into gradient descent (GD) is commonly used to improve the performance of machine learning models; usually, uncorrelated noise is used in such perturbed gradient descent (PGD) methods. Whether this is optimal, or whether other types of noise could provide better generalization, has so far been unknown. This paper zooms in on the question of correlating the perturbations of consecutive PGD steps. Across a variety of objective functions, GD with anticorrelated perturbations ("Anti-PGD") is found to generalize significantly better than both GD and standard (uncorrelated) PGD. To support these experimental findings, a theoretical analysis is derived showing that Anti-PGD moves to wider minima, converging to the flat parts of a valley, whereas GD and standard PGD remain stuck in suboptimal regions (the sharp parts of the valley) or even diverge. In realistic experiments on real data (e.g. CIFAR-10), Anti-PGD is likewise observed to converge to flat minima that generalize well (compared with GD and standard PGD). This new connection between anticorrelated noise and generalization opens up new ways to exploit noise for training machine learning models.
Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models. Usually, uncorrelated noise is used in such perturbed gradient descent (PGD) methods. It is, however, not known if this is optimal or whether other types of noise could provide better generalization performance. In this paper, we zoom in on the problem of correlating the perturbations of consecutive PGD steps. We consider a variety of objective functions for which we find that GD with anticorrelated perturbations (“Anti-PGD”) generalizes significantly better than GD and standard (uncorrelated) PGD. To support these experimental findings, we also derive a theoretical analysis that demonstrates that Anti-PGD moves to wider minima, while GD and PGD remain stuck in suboptimal regions or even diverge. This new connection between anticorrelated noise and generalization opens the field to novel ways to exploit noise for training machine learning models.
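The mechanism is simple to state: instead of adding fresh i.i.d. noise at every step, Anti-PGD perturbs with the increments of an i.i.d. noise sequence, so consecutive perturbations are anticorrelated. A minimal NumPy sketch of our reading of the update, with illustrative step size and noise scale (not the authors' code):

import numpy as np

def anti_pgd(grad, theta0, lr=0.1, sigma=0.01, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    xi_prev = sigma * rng.standard_normal(theta.shape)
    for _ in range(steps):
        xi = sigma * rng.standard_normal(theta.shape)
        # The perturbation is the noise increment xi - xi_prev, not the
        # noise itself, so consecutive perturbations are anticorrelated.
        theta = theta - lr * grad(theta) + (xi - xi_prev)
        xi_prev = xi
    return theta

# Usage on the toy quadratic L(theta) = 0.5 * ||theta||^2, where grad(theta) = theta:
print(anti_pgd(lambda th: th, theta0=np.ones(2)))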
4、[LG] Message Passing Neural PDE Solvers
J Brandstetter, D Worrall, M Welling
[University of Amsterdam & Qualcomm AI Research]
Message passing neural PDE solvers. The numerical solution of partial differential equations (PDEs) is difficult, and has driven a century of research so far. Recently there has been a push to build neural-numerical hybrid solvers, in line with the modern trend towards fully end-to-end learned systems. Most works so far can only generalize over a subset of the properties a generic solver must face, including: resolution, topology, geometry, boundary conditions, domain discretization regularity, dimensionality, etc. This paper proposes a solver satisfying these properties, in which all components are based on neural message passing, replacing all heuristically designed components in the computation graph with backprop-optimized neural function approximators. The neural message passing solver representationally contains classical methods such as finite differences, finite volumes, and WENO schemes. To encourage stability when training autoregressive models, the paper puts forward a method based on the principle of zero-stability, posing stability as a domain adaptation problem. The method is validated on various fluid-like flow problems, demonstrating fast, stable, and accurate performance across different domain topologies, discretizations, etc. in 1D and 2D, and outperforming state-of-the-art numerical solvers in the low-resolution regime in terms of both speed and accuracy. The MP-PDE solver can not only be used to predict PDE solutions, but can also be reinterpreted to optimize the integration mesh and PDE parameters. One limitation of the model is its need for high-quality ground-truth data for training.
https://github.com/tum-pbs/PhiFlow
The numerical solution of partial differential equations (PDEs) is difficult, having led to a century of research so far. Recently, there have been pushes to build neural–numerical hybrid solvers, piggy-backing on the modern trend towards fully end-to-end learned systems. Most works so far can only generalize over a subset of the properties that a generic solver would face, including: resolution, topology, geometry, boundary conditions, domain discretization regularity, dimensionality, etc. In this work, we build a solver, satisfying these properties, where all the components are based on neural message passing, replacing all heuristically designed components in the computation graph with backprop-optimized neural function approximators. We show that neural message passing solvers representationally contain some classical methods, such as finite differences, finite volumes, and WENO schemes. In order to encourage stability in training autoregressive models, we put forward a method that is based on the principle of zero-stability, posing stability as a domain adaptation problem. We validate our method on various fluid-like flow problems, demonstrating fast, stable, and accurate performance across different domain topologies, discretizations, etc. in 1D and 2D. Our model outperforms state-of-the-art numerical solvers in the low resolution regime in terms of speed and accuracy.
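As a sketch of the basic building block, here is one message passing update over a 1D grid graph in PyTorch; this toy layer only illustrates the general pattern (messages from neighbouring cells, summed, then a node update) and is an assumption for illustration, not the paper's architecture.

import torch
import torch.nn as nn

class MPLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # The message MLP sees sender state, receiver state, and relative position.
        self.msg = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.ReLU())
        self.upd = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
    def forward(self, h, edges, rel_pos):
        # h: (N, dim) node states; edges: (E, 2) of (sender, receiver); rel_pos: (E, 1)
        src, dst = edges[:, 0], edges[:, 1]
        m = self.msg(torch.cat([h[src], h[dst], rel_pos], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, dst, m)   # sum incoming messages
        return self.upd(torch.cat([h, agg], dim=-1))

# Usage: 8 grid cells with nearest-neighbour edges in both directions.
n, dim = 8, 16
idx = torch.arange(n - 1)
edges = torch.cat([torch.stack([idx, idx + 1], 1), torch.stack([idx + 1, idx], 1)])
rel_pos = (edges[:, 1] - edges[:, 0]).float().unsqueeze(-1)
print(MPLayer(dim)(torch.randn(n, dim), edges, rel_pos).shape)   # torch.Size([8, 16])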
5、[CV] Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
P Wang, A Yang, R Men, J Lin, S Bai, Z Li, J Ma, C Zhou, J Zhou, H Yang
[DAMO Academy]
A simple sequence-to-sequence learning framework unifying architectures, tasks, and modalities. This work pursues a unified paradigm for multimodal pre-training that breaks free of complex task- and modality-specific customization. It proposes OFA, a unified multimodal pre-trained model that unifies modalities (i.e. cross-modality, vision, language) and tasks (e.g. image generation, image captioning, image classification, text generation, etc.) in a simple sequence-to-sequence learning framework based on an encoder-decoder architecture, performing pre-training and fine-tuning with task instructions and introducing no extra task-specific layers for fine-tuning. Experimental results show that OFA achieves new state-of-the-art results on a series of multimodal tasks. Extensive analyses demonstrate that OFA reaches performance comparable to uni-modal pre-trained models (e.g. BERT, MAE, MoCo v3, SimCLR v2, etc.) on uni-modal tasks, including NLU, NLG, and image classification, and that it transfers effectively to unseen tasks and domains.
In this work, we pursue a unified paradigm for multimodal pretraining to break the scaffolds of complex task/modality-specific customization. We propose OFA, a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.) to a simple sequence-to-sequence learning framework based on the encoder-decoder architecture. OFA performs pretraining and finetuning with task instructions and introduces no extra task-specific layers for finetuning. Experimental results show that OFA achieves new state-of-the-art results on a series of multimodal tasks, including image captioning (COCO test CIDEr: 149.6), text-to-image generation (COCO test FID: 10.5), VQA (test-std acc.: 80.02), SNLI-VE (test acc.: 90.20), and referring expression comprehension (RefCOCO / RefCOCO+ / RefCOCOg test acc.: 92.93 / 90.10 / 85.20). Through extensive analyses, we demonstrate that OFA reaches comparable performance with uni-modal pretrained models (e.g., BERT, MAE, MoCo v3, SimCLR v2, etc.) in uni-modal tasks, including NLU, NLG, and image classification, and it effectively transfers to unseen tasks and domains. Code shall be released soon at https://github.com/OFA-Sys/OFA.
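The unification rests on expressing every task as an instruction-plus-input sequence for one encoder-decoder, with all outputs decoded as tokens. A conceptual sketch follows; the instruction strings, task set, and helper below are illustrative assumptions, not OFA's exact prompts or preprocessing.

# Every task is posed as an instruction-plus-input sequence for a single
# encoder-decoder; outputs (text, labels, image tokens) are all decoded as tokens.
tasks = {
    "captioning":  ("what does the image describe?", "<image>"),
    "vqa":         ("what color is the car?", "<image>"),
    "text_infill": ('what is the complete text of "a <mask> day"?', ""),
    "image_gen":   ("what is the image of a red bus?", ""),
}

def build_input(instruction, extra):
    # Hypothetical formatting helper; OFA's real preprocessing differs in detail.
    return f"{instruction} {extra}".strip()

for name, (instruction, extra) in tasks.items():
    print(f"{name:12s} -> {build_input(instruction, extra)}")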
A few other papers worth noting:
[RO] A Robot Web for Distributed Many-Device Localisation
R Murai, J Ortiz, S Saeedi, P H.J. Kelly, A J. Davison
[Imperial College London & Ryerson University]
[CV] LEDNet: Joint Low-light Enhancement and Deblurring in the Dark
S Zhou, C Li, C C Loy
[Nanyang Technological University]
[CV] Context Autoencoder for Self-Supervised Representation Learning
X Chen, M Ding, X Wang, Y Xin, S Mo, Y Wang, S Han, P Luo, G Zeng, J Wang
[Peking University & University of Hong Kong & Baidu]
[CL] Conversational Agents: Theory and Applications
M Wahde, M Virgolin
[Chalmers University of Technology]