LG - Machine Learning   CV - Computer Vision   CL - Computation and Language   AS - Audio and Speech   RO - Robotics

Reposted from 爱可可爱生活

Summary: the Neural Covariance SDE; universal speech enhancement with score-based diffusion; neural diffusion processes; on the effectiveness of fine-tuning versus meta-reinforcement learning; fast unsupervised brain anomaly detection and segmentation with diffusion models; sharpness-aware training at no extra computational cost; controllable text generation with reinforced unlearning; dataset condensation via efficient synthetic-data parameterization; a table generation framework for encoder-decoder models

 

1、[LG] The Neural Covariance SDE: Shaped Infinite Depth-and-Width Networks at Initialization  

M B Li, M Nica, D M Roy

[University of Toronto and Vector Institute & University of Guelph and Vector Institute]

The Neural Covariance SDE: shaped infinite depth-and-width networks at initialization. Given the random covariance matrix defined by the penultimate layer, the logit outputs of a feedforward neural network at initialization are conditionally Gaussian. This paper studies the distribution of that random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory at large depth: infinite-width analyses ignore the microscopic layer-to-layer fluctuations, yet these fluctuations accumulate over many layers. To overcome this shortcoming, the paper studies the random covariance matrix in the shaped infinite-depth-and-width limit. It identifies the precise scaling of the activation function needed to arrive at a non-trivial limit, and shows that the random covariance matrix is governed by a stochastic differential equation (SDE), called the Neural Covariance SDE. Simulations show that this SDE closely matches the distribution of the random covariance matrix of finite networks. In addition, an if-and-only-if condition for exploding and vanishing norms of large shaped networks is recovered in terms of the activation function.

The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown that shaping the activation function as network depth grows large is necessary for this covariance matrix to be non-degenerate. However, the current infinite-width-style understanding of this shaping method is unsatisfactory for large depth: infinite-width analyses ignore the microscopic fluctuations from layer to layer, but these fluctuations accumulate over many layers. To overcome this shortcoming, we study the random covariance matrix in the shaped infinite-depth-and-width limit. We identify the precise scaling of the activation function necessary to arrive at a non-trivial limit, and show that the random covariance matrix is governed by a stochastic differential equation (SDE) that we call the Neural Covariance SDE. Using simulations, we show that the SDE closely matches the distribution of the random covariance matrix of finite networks. Additionally, we recover an if-and-only-if condition for exploding and vanishing norms of large shaped networks based on the activation function.

https://arxiv.org/abs/2206.02768
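
As a rough illustration of the object the paper studies, the following minimal sketch simulates a finite MLP at initialization and tracks the empirical penultimate-layer covariance of two inputs across depth. The shaped activation used here, identity plus a tanh term damped by 1/sqrt(depth), is an assumed schematic stand-in for the precise scaling the paper derives, not the authors' construction.

    import numpy as np

    def shaped_mlp_covariance(x1, x2, width=512, depth=256, a=1.0, seed=0):
        # Track the 2x2 empirical covariance of two inputs through a deep,
        # wide MLP at initialization. The activation
        #   phi(h) = h + (a / sqrt(depth)) * tanh(h)
        # is an assumed schematic form of depth-dependent shaping.
        rng = np.random.default_rng(seed)
        H = np.stack([x1, x2])                          # shape (2, d_in)
        covs = []
        for _ in range(depth):
            fan_in = H.shape[1]
            W = rng.normal(0.0, 1.0, (fan_in, width)) / np.sqrt(fan_in)
            H = H @ W
            H = H + (a / np.sqrt(depth)) * np.tanh(H)   # shaped activation
            covs.append(H @ H.T / width)                # 2x2 covariance
        return covs

    rng = np.random.default_rng(1)
    x1, x2 = rng.normal(size=64), rng.normal(size=64)
    covs = shaped_mlp_covariance(x1, x2)
    print(np.round(covs[0], 3), np.round(covs[-1], 3))

Rerunning with different seeds exposes the layer-to-layer fluctuations that infinite-width analyses discard; their accumulation over depth is what the SDE description is meant to capture.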

 

2、[AS] Universal Speech Enhancement with Score-based Diffusion

J Serrà, S Pascual, J Pons, R O Araz, D Scaini

[Dolby Laboratories]

Universal speech enhancement with score-based diffusion. Removing background noise from speech audio has been the subject of considerable research, especially in recent years with the rise of virtual communication and amateur recording. Yet background noise is not the only unpleasant disturbance that can impair intelligibility: reverb, clipping, codec artifacts, problematic equalization, limited bandwidth, and inconsistent loudness are equally disturbing and ubiquitous. This paper proposes treating speech enhancement as a holistic endeavor and presents a universal speech enhancement system that tackles 55 different distortions at the same time. The approach consists of a generative model employing score-based diffusion, together with a multi-resolution conditioning network that performs enhancement with mixture density networks. In a subjective test with expert listeners, the method significantly outperforms the state of the art. It also achieves competitive objective scores with just 4-8 diffusion steps, despite not adopting any particular strategy for fast sampling.

Removing background noise from speech audio has been the subject of considerable research and effort, especially in recent years due to the rise of virtual communication and amateur sound recording. Yet background noise is not the only unpleasant disturbance that can prevent intelligibility: reverb, clipping, codec artifacts, problematic equalization, limited bandwidth, or inconsistent loudness are equally disturbing and ubiquitous. In this work, we propose to consider the task of speech enhancement as a holistic endeavor, and present a universal speech enhancement system that tackles 55 different distortions at the same time. Our approach consists of a generative model that employs score-based diffusion, together with a multi-resolution conditioning network that performs enhancement with mixture density networks. We show that this approach significantly outperforms the state of the art in a subjective test performed by expert listeners. We also show that it achieves competitive objective scores with just 4–8 diffusion steps, despite not considering any particular strategy for fast sampling. We hope that both our methodology and technical contributions encourage researchers and practitioners to adopt a universal approach to speech enhancement, possibly framing it as a generative task.

https://arxiv.org/abs/2206.03065
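
To make the few-step sampling claim concrete, here is a minimal, hypothetical conditional sampling loop. score_model and cond_net are placeholder interfaces (the paper's model and its multi-resolution conditioning network are not exposed under these names), and the annealed Langevin update is a generic score-based sampler, not the authors' exact scheme.

    import math
    import torch

    @torch.no_grad()
    def enhance(score_model, cond_net, degraded, n_steps=8,
                sigma_max=1.0, sigma_min=0.01):
        # score_model(x, sigma, cond) is assumed to estimate the score
        # grad_x log p(x | cond); cond_net summarizes the degraded
        # recording. Both are hypothetical placeholders.
        cond = cond_net(degraded)
        x = sigma_max * torch.randn_like(degraded)      # start from noise
        sigmas = torch.logspace(math.log10(sigma_max),  # geometric noise
                                math.log10(sigma_min),  # schedule
                                n_steps)
        for i, sigma in enumerate(sigmas):
            step = 0.5 * sigma ** 2                     # Langevin step size
            x = x + step * score_model(x, sigma, cond)
            if i < n_steps - 1:                         # no noise at the end
                x = x + torch.sqrt(step) * torch.randn_like(x)
        return x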

 

3、[LG] Neural Diffusion Processes  

V Dutordoir, A Saul, Z Ghahramani, F Simpson

[University of Cambridge & Secondmind]

Neural diffusion processes. Gaussian processes provide an elegant framework for specifying prior and posterior distributions over functions. However, they are computationally expensive and limited by the expressivity of their covariance function. This paper proposes Neural Diffusion Processes (NDPs), a novel approach based on diffusion models that learns to sample from distributions over functions. Using a novel attention module, properties of stochastic processes, such as exchangeability, can be incorporated directly into the NDP architecture. Empirically, NDPs capture functional distributions that are close to the true Bayesian posterior of a Gaussian process. This enables a variety of downstream tasks, including hyperparameter marginalization and Bayesian optimization.

Gaussian processes provide an elegant framework for specifying prior and posterior distributions over functions. They are, however, also computationally expensive, and limited by the expressivity of their covariance function. We propose Neural Diffusion Processes (NDPs), a novel approach based upon diffusion models, that learns to sample from distributions over functions. Using a novel attention block we are able to incorporate properties of stochastic processes, such as exchangeability, directly into the NDP’s architecture. We empirically show that NDPs are able to capture functional distributions that are close to the true Bayesian posterior of a Gaussian process. This enables a variety of downstream tasks, including hyperparameter marginalisation and Bayesian optimisation.

https://arxiv.org/abs/2206.03992
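
The exchangeability property mentioned above can be obtained from attention with no positional encodings, which is permutation-equivariant over the input set. The block below is a generic sketch in that spirit, not the paper's exact attention module.

    import torch
    import torch.nn as nn

    class ExchangeablePointBlock(nn.Module):
        # Self-attention over (x, y) tokens without positional encodings:
        # permuting the input points permutes the outputs identically,
        # mirroring the exchangeability of a stochastic process.
        def __init__(self, dim=64, heads=4):
            super().__init__()
            self.embed = nn.Linear(2, dim)   # embed each (x, y) pair
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.out = nn.Linear(dim, 1)     # per-point prediction

        def forward(self, x, y):
            # x, y: (batch, n_points)
            tokens = self.embed(torch.stack([x, y], dim=-1))
            h, _ = self.attn(tokens, tokens, tokens)
            return self.out(h).squeeze(-1)   # (batch, n_points)

In a diffusion model over functions, a stack of such blocks could predict the per-point noise added to the y values at each diffusion step.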

 

4、[LG] On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning  

Z Mandi, P Abbeel, S James

[UC Berkeley]

On the effectiveness of fine-tuning versus meta-reinforcement learning. Intelligent agents should be able to leverage knowledge from previously learned tasks to learn new ones quickly and efficiently. Meta-learning approaches have emerged as a popular solution to this goal. However, meta-reinforcement learning (meta-RL) algorithms have so far been restricted to simple environments with narrow task distributions. Moreover, in supervised and self-supervised learning, the paradigm of pretraining followed by fine-tuning to adapt to new tasks has become a simple yet effective solution. This calls into question the benefits of meta-learning in reinforcement learning, which typically come at the cost of high complexity. This paper investigates meta-learning approaches on a variety of vision-based benchmarks, including Procgen, RLBench, and Atari, where evaluation is performed on entirely novel tasks. The findings show that when meta-learning methods are evaluated on different tasks (rather than different variations of the same task), multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation. This is encouraging for future research, since multi-task pretraining tends to be simpler and computationally cheaper than meta-RL. Based on these findings, the paper advocates evaluating future meta-RL methods on more challenging tasks and including multi-task pretraining with fine-tuning as a simple but strong baseline.

Intelligent agents should have the ability to leverage knowledge from previously learned tasks in order to learn new ones quickly and efficiently. Meta-learning approaches have emerged as a popular solution to achieve this. However, meta-reinforcement learning (meta-RL) algorithms have thus far been restricted to simple environments with narrow task distributions. Moreover, the paradigm of pretraining followed by fine-tuning to adapt to new tasks has emerged as a simple yet effective solution in supervised and self-supervised learning. This calls into question the benefits of meta-learning approaches in reinforcement learning, which typically come at the cost of high complexity. We therefore investigate meta-RL approaches in a variety of vision-based benchmarks, including Procgen, RLBench, and Atari, where evaluations are made on completely novel tasks. Our findings show that when meta-learning approaches are evaluated on different tasks (rather than different variations of the same task), multi-task pretraining with fine-tuning on new tasks performs equally as well, or better, than meta-pretraining with meta test-time adaptation. This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL. From these findings, we advocate for evaluating future meta-RL methods on more challenging tasks and including multi-task pretraining with fine-tuning as a simple, yet strong baseline.

https://arxiv.org/abs/2206.03271
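
The baseline the paper argues for is deliberately plain; schematically it amounts to two ordinary RL loops. make_env, collect_rollout, and ppo_update below are hypothetical helpers standing in for any standard RL stack, not a specific library's API.

    import random

    def multitask_pretrain(policy, train_tasks, n_updates):
        # Ordinary multi-task RL: sample a pretraining task, act, update.
        for _ in range(n_updates):
            env = make_env(random.choice(train_tasks))  # hypothetical helper
            ppo_update(policy, collect_rollout(policy, env))
        return policy

    def finetune(policy, novel_task, n_updates):
        # No meta-objective and no test-time adaptation machinery:
        # just continue standard RL training on the unseen task.
        env = make_env(novel_task)
        for _ in range(n_updates):
            ppo_update(policy, collect_rollout(policy, env))
        return policy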

 

5、[CV] Fast Unsupervised Brain Anomaly Detection and Segmentation with Diffusion Models  

W H L Pinaya, M S Graham, R Gray, P F D Costa, P Tudosiu, P Wright...

[King’s College London & University College London]

Fast unsupervised brain anomaly detection and segmentation with diffusion models. Deep generative models have emerged as promising tools for detecting arbitrary anomalies in data, dispensing with the need for manual labelling. Recently, autoregressive transformers have achieved state-of-the-art performance for anomaly detection in medical imaging. Nonetheless, these models still have intrinsic weaknesses, such as requiring images to be modelled as 1D sequences, error accumulation during sampling, and the significant inference times associated with transformers. Denoising diffusion probabilistic models are a class of non-autoregressive generative models recently shown to produce excellent samples in computer vision (surpassing generative adversarial networks) and to achieve log-likelihoods competitive with transformers while offering fast inference. Diffusion models can be applied to the latent representations learnt by autoencoders, making them easily scalable and good candidates for high-dimensional data such as medical images. This paper proposes a diffusion-model-based method to detect and segment anomalies in brain imaging. By training the models on healthy data and then exploring their diffusion and reverse steps across the Markov chain, anomalous areas can be identified in the latent space and hence in the pixel space. Compared with autoregressive approaches, the proposed diffusion models achieve competitive performance across a series of experiments on 2D CT and MRI data with synthetic and real pathological lesions, with much reduced inference times, making their clinical use viable.

Deep generative models have emerged as promising tools for detecting arbitrary anomalies in data, dispensing with the necessity for manual labelling. Recently, autoregressive transformers have achieved state-of-the-art performance for anomaly detection in medical imaging. Nonetheless, these models still have some intrinsic weaknesses, such as requiring images to be modelled as 1D sequences, the accumulation of errors during the sampling process, and the significant inference times associated with transformers. Denoising diffusion probabilistic models are a class of non-autoregressive generative models recently shown to produce excellent samples in computer vision (surpassing Generative Adversarial Networks), and to achieve log-likelihoods that are competitive with transformers while having fast inference times. Diffusion models can be applied to the latent representations learnt by autoencoders, making them easily scalable and great candidates for application to high dimensional data, such as medical images. Here, we propose a method based on diffusion models to detect and segment anomalies in brain imaging. By training the models on healthy data and then exploring their diffusion and reverse steps across the Markov chain, we can identify anomalous areas in the latent space and hence identify anomalies in the pixel space. Our diffusion models achieve competitive performance compared with autoregressive approaches across a series of experiments with 2D CT and MRI data involving synthetic and real pathological lesions with much reduced inference times, making their usage clinically viable.

https://arxiv.org/abs/2206.03461
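
The detection recipe can be read as: partially noise the latent of a test image, denoise it with a model that has only seen healthy anatomy, and look at what changed. The sketch below assumes placeholder autoencoder and diffusion interfaces (encode/decode, q_sample/denoise_from); it illustrates the idea, not the paper's implementation.

    import torch

    @torch.no_grad()
    def anomaly_map(autoencoder, diffusion, image, t_noise=250):
        # The diffusion model is assumed to be trained on healthy scans
        # only, so its reverse steps "repair" anomalous structure.
        z = autoencoder.encode(image)                   # image -> latent
        z_t = diffusion.q_sample(z, t=t_noise)          # partial noising
        z_rec = diffusion.denoise_from(z_t, t=t_noise)  # reverse steps
        recon = autoencoder.decode(z_rec)               # healthy-looking image
        residual = (image - recon).abs()                # large where anomalous
        return residual / residual.max()                # normalized map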

 

A few more papers worth noting:

 

[LG] Sharpness-Aware Training for Free

J Du, D Zhou, J Feng, V Y F Tan, J T Zhou

[A*STAR & National University of Singapore & ByteDance]

https://arxiv.org/abs/2205.14083

 

[CL] Quark: Controllable Text Generation with Reinforced Unlearning  

X Lu, S Welleck, L Jiang, J Hessel, L Qin, P West, P Ammanabrolu...

[Allen Institute for Artificial Intelligence & University of Washington] (2022)

https://arxiv.org/abs/2205.13636

 

[LG] Dataset Condensation via Efficient Synthetic-Data Parameterization

J Kim, J Kim, S J Oh, S Yun, H Song, J Jeong, J Ha, H O Song

[Seoul National University & NAVER AI Lab & NAVER Clova] (2022)

https://arxiv.org/abs/2205.14959

 

[CL] STable: Table Generation Framework for Encoder-Decoder Models

M Pietruszka, M Turski, Ł Borchmann, T Dwojak, G Pałka, K Szyndler, D Jurkiewicz, Ł Garncarek

[Applica.ai] (2022)

https://arxiv.org/abs/2206.04045

 

 

If any images included in this content raise copyright concerns, please contact us promptly so that they can be removed.