LG - Machine Learning, CV - Computer Vision, CL - Computation and Language, AS - Audio and Speech, RO - Robotics

Reposted from 爱可可爱生活

1、[LG] Applications and Techniques for Fast Machine Learning in Science

A M Deiana, N Tran, J Agar, M Blott, G D Guglielmo, J Duarte...

[Southern Methodist University & Fermi National Accelerator Laboratory...]

In this community review report, we discuss applications and techniques for fast machine learning (ML) in science: the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.

https://weibo.com/1402400261/KEBEH5bKJ

2、[AS] Wav2CLIP: Learning Robust Audio Representations From CLIP

H Wu, P Seetharaman, K Kumar, J P Bello

[New York University & Descript, Inc]

We propose Wav2CLIP, a robust audio representation learning method built by distilling from Contrastive Language-Image Pre-training (CLIP). We systematically evaluate Wav2CLIP on a variety of audio tasks, including classification, retrieval, and generation, and show that Wav2CLIP can outperform several publicly available pretrained audio representation algorithms. Wav2CLIP projects audio into a shared embedding space with images and text, which enables multimodal applications such as zero-shot classification and cross-modal retrieval. Furthermore, Wav2CLIP needs just ~10% of the data to achieve competitive performance on downstream tasks compared with fully supervised models, and is more efficient to pretrain than competing methods because it does not require learning a visual model in concert with an auditory model. Finally, we demonstrate image generation from Wav2CLIP as a qualitative assessment of the shared embedding space. Our code and model weights are open-sourced and made available for further applications.

https://weibo.com/1402400261/KEBJolWw2
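
The recipe above (freeze CLIP, fit an audio encoder into its embedding space, then reuse CLIP-style text embeddings for zero-shot tasks) can be sketched with toy linear stand-ins. Everything here — the shapes, the least-squares "encoder", and the random "CLIP" and "text" embeddings — is an illustrative assumption, not Wav2CLIP's actual training setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: frozen CLIP image embeddings for video frames (teacher
# targets) and raw audio features from the matching audio tracks.
n, d_audio, d_clip = 200, 32, 16
clip_embeds = rng.normal(size=(n, d_clip))
audio_feats = rng.normal(size=(n, d_audio))

# Distillation reduced to its simplest form: fit a linear audio encoder W
# so that audio_feats @ W lands near the frozen CLIP embeddings
# (a least-squares stand-in for the contrastive distillation loss).
W, *_ = np.linalg.lstsq(audio_feats, clip_embeds, rcond=None)
audio_embeds = audio_feats @ W

# Zero-shot classification: compare audio embeddings against (hypothetical)
# CLIP text embeddings of class prompts via cosine similarity.
def cosine(a, b):
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

text_embeds = rng.normal(size=(3, d_clip))   # e.g. "dog", "siren", "music"
pred_class = cosine(audio_embeds, text_embeds).argmax(axis=1)
print(pred_class.shape)  # one predicted class index per audio clip
```

The key design point survives even in this toy form: only the audio side is trained, so the shared space (and any text embeddings already living in it) comes for free from the frozen teacher.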

3、[CV] Image-Based CLIP-Guided Essence Transfer

H Chefer, S Benaim, R Paiss, L Wolf

[Tel Aviv University]

The conceptual blending of two signals is a semantic task that may underlie both creativity and intelligence. We propose to perform such blending in a way that incorporates two latent spaces: that of the generator network and that of the semantic network. For the first network, we employ the powerful StyleGAN generator, and for the second, the powerful image-language matching network of CLIP. The new method creates a blending operator that is optimized to be simultaneously additive in both latent spaces. Our results demonstrate that this leads to blending that is much more natural than what can be obtained in each space separately. Our code is available at: https://github.com/hila-chefer/TargetCLIP

https://weibo.com/1402400261/KEBManPL5
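
A minimal sketch of the dual-additivity idea: assume, purely for illustration, that both the generator and the semantic encoder are linear maps (the real method optimizes through StyleGAN and CLIP, which are nonlinear). Then one can solve for a single blending direction b that produces the same semantic shift for every source, i.e. is additive in both latent spaces at once:

```python
import numpy as np

rng = np.random.default_rng(1)

# Linear stand-ins (assumptions): A plays the role of the StyleGAN
# generator, B the CLIP image encoder. Both maps stay frozen.
d_w, d_img, d_sem = 8, 20, 6
A = rng.normal(size=(d_img, d_w))      # "generator": latent w -> image
B = rng.normal(size=(d_sem, d_img))    # "semantic encoder": image -> CLIP space

# Desired semantic shift v: the "essence" of the target in CLIP space.
v = rng.normal(size=d_sem)

# Solve for one blending direction b such that adding b in generator
# latent space adds v in semantic space for ANY source:
#   B @ A @ (w + b) - B @ A @ w = (B @ A) @ b ≈ v.
BA = B @ A
b, *_ = np.linalg.lstsq(BA, v, rcond=None)

# Additivity check: the same b shifts every source's semantic embedding
# by the same amount, namely v.
sources = rng.normal(size=(5, d_w))
shifts = (sources + b) @ BA.T - sources @ BA.T
```

In the linear toy the source-independence is exact; the paper's contribution is making this approximately hold through two real, nonlinear networks.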

4、[LG] Neural Tangent Kernel Eigenvalues Accurately Predict Generalization

J B Simon, M Dickens, M R DeWeese

[UC Berkeley]

Finding a quantitative theory of neural network generalization has long been a central goal of deep learning research. We extend recent results to demonstrate that, by examining the eigensystem of a neural network's "neural tangent kernel", one can predict its generalization performance when learning arbitrary functions. Our theory accurately predicts not only the test mean squared error but all first- and second-order statistics of the network's learned function. Furthermore, using a measure quantifying the "learnability" of a given target function, we prove a new "no-free-lunch" theorem characterizing a fundamental tradeoff in the inductive bias of wide neural networks: improving a network's generalization for a given target function must worsen its generalization for orthogonal functions. We further demonstrate the utility of our theory by analytically predicting two surprising phenomena (worse-than-chance generalization on hard-to-learn functions and nonmonotonic error curves in the small-data regime), which we subsequently observe in experiments. Though our theory is derived for infinite-width architectures, we find it agrees with networks as narrow as width 20, suggesting it is predictive of generalization in practical neural networks.

https://weibo.com/1402400261/KEBQupBDL

5、[AS] Unsupervised Source Separation By Steering Pretrained Music Models

E Manilow, P O'Reilly, P Seetharaman, B Pardo

[Northwestern University & Descript, Inc]

We showcase an unsupervised method that repurposes deep models trained for music generation and music tagging for audio source separation, without any retraining. An audio generation model is conditioned on an input mixture, producing a latent encoding of the audio used for generation. This generated audio is fed to a pretrained music tagger that creates source labels. The cross-entropy loss between the tag distribution for the generated audio and a predefined distribution for an isolated source is used to guide gradient ascent in the (unchanging) latent space of the generative model. The system does not update the weights of the generative model or the tagger, and relies only on moving through the generative model's latent space to produce separated sources. We use OpenAI's JUKEBOX as the pretrained generative model and couple it with four kinds of pretrained music taggers (two architectures and two tagging datasets). Experimental results on two source separation datasets show this approach can produce separation estimates for a wider variety of sources than any tested supervised or unsupervised system. This work points to the vast and heretofore untapped potential of large pretrained music models for audio-to-audio tasks like source separation.

https://weibo.com/1402400261/KEBUBFA0i
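
The steering loop is just gradient-based optimization of the latent code under a frozen generator and tagger (the paper phrases it as gradient ascent in latent space; equivalently, gradient descent on the cross-entropy loss). A toy NumPy sketch with linear stand-ins — the matrices A and B below are assumptions replacing JUKEBOX and a real pretrained tagger — shows the mechanics:

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Frozen linear stand-ins: A maps a latent code to "audio", B maps audio
# to tag logits. Neither is ever updated -- only the latent code z is.
d_z, d_audio, n_tags = 16, 64, 5
A = rng.normal(size=(d_audio, d_z))
B = rng.normal(size=(n_tags, d_audio)) / np.sqrt(d_audio)

# Predefined tag distribution for an isolated source: all mass on tag 2.
target = np.zeros(n_tags)
target[2] = 1.0

z = 0.01 * rng.normal(size=d_z)           # latent code being steered
losses = []
for _ in range(500):
    p = softmax(B @ A @ z)                # tagger's distribution for G(z)
    losses.append(-np.log(p[2] + 1e-12))  # cross-entropy to the target
    grad = A.T @ B.T @ (p - target)       # analytic softmax-CE gradient w.r.t. z
    z -= 0.01 * grad                      # move only through latent space
separated = A @ z                         # "audio" now tagged as source 2
print(losses[0], losses[-1])
```

With JUKEBOX the same loop needs automatic differentiation through the decoder and tagger instead of this hand-derived gradient, but the division of labor is identical: frozen weights, movable latent.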

A few more papers worth noting:

[LG] Scaling Up Machine Learning For Quantum Field Theory with Equivariant Continuous Flows

P de Haan, C Rainone, M Cheng, R Bondesan

[Qualcomm AI Research & University of Amsterdam]

https://weibo.com/1402400261/KEBYy1Yqo

[LG] Parameter Prediction for Unseen Deep Architectures

B Knyazev, M Drozdzal, G W Taylor, A Romero-Soriano

[University of Guelph & Facebook AI Research]

https://weibo.com/1402400261/KEC1guUwT

[LG] What Would Jiminy Cricket Do? Towards Agents That Behave Morally

D Hendrycks, M Mazeika, A Zou, S Patel, C Zhu, J Navarro, D Song, B Li, J Steinhardt

[UC Berkeley & UIUC]

https://weibo.com/1402400261/KEC40pqzY

[CL] Situated Dialogue Learning through Procedural Environment Generation

P Ammanabrolu, R Jia, M O Riedl

[Georgia Institute of Technology]

https://weibo.com/1402400261/KEC5W77Zz

If any images included in this content involve copyright issues, please contact us promptly so that they can be removed.