LG - Machine Learning  CV - Computer Vision  CL - Computation and Language  AS - Audio and Speech  RO - Robotics

 

1、[CV] Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields

J T. Barron, B Mildenhall, D Verbin, P P. Srinivasan, P Hedman

[Google Research]

Although neural radiance fields (NeRF) have demonstrated impressive view synthesis results on objects and small bounded regions of space, they struggle with "unbounded" scenes, where the camera may point in any direction and content may exist at any distance. In this setting, existing NeRF-like models often produce blurry or low-resolution renderings (owing to the imbalance in detail and scale between nearby and distant objects), are slow to train, and can exhibit artifacts due to the inherent ambiguity of reconstructing a large scene from a small set of images. This paper presents mip-NeRF 360, an extension of mip-NeRF (a NeRF variant that addresses sampling and aliasing) that overcomes the challenges of unbounded scenes with a non-linear scene parameterization, online distillation, and a novel distortion-based regularizer, targeting scenes in which the camera rotates 360 degrees around a point. Compared with mip-NeRF, it reduces mean-squared error by 54% and is able to produce realistic synthesized views and detailed depth maps for highly intricate, unbounded real-world scenes.

Though neural radiance fields (NeRF) have demonstrated impressive view synthesis results on objects and small bounded regions of space, they struggle on “unbounded” scenes, where the camera may point in any direction and content may exist at any distance. In this setting, existing NeRF-like models often produce blurry or low-resolution renderings (due to the unbalanced detail and scale of nearby and distant objects), are slow to train, and may exhibit artifacts due to the inherent ambiguity of the task of reconstructing a large scene from a small set of images. We present an extension of mip-NeRF (a NeRF variant that addresses sampling and aliasing) that uses a non-linear scene parameterization, online distillation, and a novel distortion-based regularizer to overcome the challenges presented by unbounded scenes. Our model, which we dub “mip-NeRF 360” as we target scenes in which the camera rotates 360 degrees around a point, reduces mean-squared error by 54% compared to mip-NeRF, and is able to produce realistic synthesized views and detailed depth maps for highly intricate, unbounded real-world scenes.
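
As a concrete illustration of the non-linear scene parameterization, below is a minimal NumPy sketch of the contraction operator the paper uses to map unbounded 3D coordinates into a ball of radius 2 (nearby content keeps its metric scale, distant content is compressed). The function name and the epsilon guard are our own, and the paper actually applies the contraction to Gaussians via a linearization rather than to raw point samples as shown here.

```python
import numpy as np

def contract(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Sketch of mip-NeRF 360-style scene contraction.

    Points inside the unit ball are left unchanged; points outside are
    smoothly pulled into the ball of radius 2, so an unbounded scene
    fits in a bounded coordinate volume.
    """
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    safe = np.maximum(norm, eps)                      # avoid divide-by-zero
    contracted = (2.0 - 1.0 / safe) * (x / safe)
    return np.where(norm <= 1.0, x, contracted)
```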

https://weibo.com/1402400261/L3vQzzTk1

2、[CV] Smoothing the Generative Latent Space with Mixup-based Distance Learning

C Kong, J Kim, D Han, N Kwak

[Seoul National University]

Producing diverse and realistic images with generative models such as GANs typically requires large-scale training on vast numbers of images. A GAN trained with extremely limited data easily overfits to the few training samples and exhibits undesirable properties, such as a "stairlike" latent space in which transitions are discontinuous, occasionally producing abrupt changes in outputs. This paper considers the situation where neither a large-scale dataset of interest nor a transferable source dataset is available, and seeks to train existing generative models with minimal overfitting and mode collapse. It proposes a latent mixup-based distance regularization on the feature spaces of both the generator and the corresponding discriminator, encouraging the two players to reason not only about the scarce observed data points but also about their relative distances in feature space. Qualitative and quantitative evaluations on diverse datasets show that the proposed method applies broadly to existing models, improving both fidelity and diversity under limited-data constraints.

Producing diverse and realistic images with generative models such as GANs typically requires large scale training with vast amounts of images. GANs trained with extremely limited data can easily overfit to few training samples and display undesirable properties like a “stairlike” latent space where transitions in latent space suffer from discontinuity, occasionally yielding abrupt changes in outputs. In this work, we consider the situation where neither large scale dataset of our interest nor transferable source dataset is available, and seek to train existing generative models with minimal overfitting and mode collapse. We propose latent mixup-based distance regularization on the feature space of both a generator and the counterpart discriminator that encourages the two players to reason not only about the scarce observed data points but also about the relative distances in the feature space they reside in. Qualitative and quantitative evaluation on diverse datasets demonstrates that our method is generally applicable to existing models to enhance both fidelity and diversity under the constraint of limited data. Code will be made public.
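
To make the "stairlike" latent-space remedy concrete, here is a hedged PyTorch sketch of a latent mixup-based distance regularizer: it asks the feature distances from a mixed latent to its two endpoints to track the mixup coefficient, discouraging discontinuous transitions. The exact loss in the paper (which regularizes both generator and discriminator feature spaces) may differ; `G_feat`, the ratio form, and the epsilon guard are assumptions.

```python
import torch
import torch.nn.functional as F

def mixup_distance_loss(G_feat, z0, z1, c: float) -> torch.Tensor:
    """Encourage feature-space distances to respect the mixup coefficient.

    G_feat(z) returns an intermediate generator feature for latent z.
    For z_mix = c*z0 + (1-c)*z1, the relative distance of z_mix's
    features to the two endpoints should interpolate smoothly with c.
    """
    z_mix = c * z0 + (1 - c) * z1
    f0, f1, f_mix = G_feat(z0), G_feat(z1), G_feat(z_mix)
    d0 = (f_mix - f0).flatten(1).norm(dim=1)   # distance to endpoint 0
    d1 = (f_mix - f1).flatten(1).norm(dim=1)   # distance to endpoint 1
    ratio = d1 / (d0 + d1 + 1e-8)              # should be close to c
    return F.mse_loss(ratio, torch.full_like(ratio, c))
```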

https://weibo.com/1402400261/L3vUDD9pZ

3、[CL] Towards a Unified View of Parameter-Efficient Transfer Learning

J He, C Zhou, X Ma, T Berg-Kirkpatrick, G Neubig

[CMU & University of Southern California & UC San Diego]

Fine-tuning large pretrained language models on downstream tasks has become the de facto learning paradigm in NLP. However, conventional approaches fine-tune all parameters of the pretrained model, which becomes prohibitive as model size and the number of tasks grow. Recent work has proposed a variety of parameter-efficient transfer learning methods that fine-tune only a small number of (extra) parameters while attaining strong performance. Although effective, the critical ingredients of their success and the connections among the various methods are poorly understood. This paper decomposes the design of state-of-the-art parameter-efficient transfer learning methods and presents a unified framework that establishes connections among them: it re-frames them as modifications to specific hidden states in the pretrained model and defines a set of design dimensions along which the methods vary, such as the function used to compute the modification and the position where it is applied. Through comprehensive empirical studies across machine translation, text summarization, language understanding, and text classification benchmarks, the unified view identifies important design choices in previous methods. The framework also enables transferring design elements across approaches, instantiating new parameter-efficient fine-tuning methods that tune fewer parameters than prior methods while being more effective, achieving results comparable to fine-tuning all parameters on all four tasks.

Fine-tuning large pretrained language models on downstream tasks has become the de facto learning paradigm in NLP. However, conventional approaches fine-tune all the parameters of the pretrained model, which becomes prohibitive as the model size and the number of tasks grow. Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance. While effective, the critical ingredients for success and the connections among the various methods are poorly understood. In this paper, we break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them. Specifically, we re-frame them as modifications to specific hidden states in pretrained models, and define a set of design dimensions along which different methods vary, such as the function to compute the modification and the position to apply the modification. Through comprehensive empirical studies across machine translation, text summarization, language understanding, and text classification benchmarks, we utilize the unified view to identify important design choices in previous methods. Furthermore, our unified framework enables the transfer of design elements across different approaches, and as a result we are able to instantiate new parameter-efficient fine-tuning methods that tune fewer parameters than previous methods while being more effective, achieving comparable results to fine-tuning all parameters on all four tasks.
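
The unified "modify a hidden state" view lends itself to a compact sketch. The illustrative PyTorch module below implements the generic form h ← h + s·Δh with a trainable low-rank Δh while the pretrained model stays frozen; with no nonlinearity and the layer input as `x` it resembles a LoRA-style design, while feeding `h` itself through a nonlinearity would resemble an adapter. Class and argument names are ours, not the paper's.

```python
import torch
import torch.nn as nn

class LowRankDelta(nn.Module):
    """Generic parameter-efficient modification: h <- h + s * up(down(x))."""

    def __init__(self, d_model: int, r: int = 8, s: float = 1.0):
        super().__init__()
        self.down = nn.Linear(d_model, r, bias=False)  # trainable
        self.up = nn.Linear(r, d_model, bias=False)    # trainable
        nn.init.zeros_(self.up.weight)                 # start as a no-op
        self.s = s

    def forward(self, h: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # x is whatever the modification is computed from: the layer
        # input (LoRA-like placement) or h itself (adapter-like).
        return h + self.s * self.up(self.down(x))
```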

https://weibo.com/1402400261/L3vXG6oFi

 

4、[LG] Minibatch vs Local SGD with Shuffling: Tight Convergence Bounds and Beyond

C Yun, S Rajput, S Sra

[MIT & University of Wisconsin-Madison]

In distributed learning, local SGD (also known as federated averaging) and its simple baseline, minibatch SGD, are widely studied optimization methods. Most existing analyses of these methods assume independent and unbiased gradient estimates obtained via with-replacement sampling. In contrast, this paper studies shuffling-based variants, minibatch and local Random Reshuffling, which draw stochastic gradients without replacement and are thus closer to practice. For smooth functions satisfying the Polyak-Łojasiewicz condition, it obtains convergence bounds showing that these shuffling-based variants converge faster than their with-replacement counterparts, and proves matching lower bounds showing that the convergence analysis is tight. It further proposes an algorithmic modification called synchronized shuffling, which achieves convergence rates faster than the lower bounds in near-homogeneous settings.

In distributed learning, local SGD (also known as federated averaging) and its simple baseline minibatch SGD are widely studied optimization methods. Most existing analyses of these methods assume independent and unbiased gradient estimates obtained via with-replacement sampling. In contrast, we study shuffling-based variants: minibatch and local Random Reshuffling, which draw stochastic gradients without replacement and are thus closer to practice. For smooth functions satisfying the Polyak-Łojasiewicz condition, we obtain convergence bounds (in the large epoch regime) which show that these shuffling-based variants converge faster than their with-replacement counterparts. Moreover, we prove matching lower bounds showing that our convergence analysis is tight. Finally, we propose an algorithmic modification called synchronized shuffling that leads to convergence rates faster than our lower bounds in near-homogeneous settings.
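
The without-replacement sampling that distinguishes Random Reshuffling from vanilla SGD is easiest to see in code. This single-machine NumPy sketch draws each component gradient exactly once per epoch in a fresh random order; the paper's minibatch and local variants distribute such permutations across workers, and its synchronized shuffling additionally coordinates the permutations across workers, neither of which this sketch attempts. `grad_i` and all other names are illustrative.

```python
import numpy as np

def random_reshuffling_sgd(grad_i, x0, n, epochs, lr, seed=0):
    """SGD with per-epoch shuffling (sampling without replacement).

    grad_i(x, i) returns the gradient of the i-th component function
    f_i at x; each epoch visits all n components exactly once.
    """
    x = x0.copy()
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(n):   # fresh shuffle every epoch
            x = x - lr * grad_i(x, i)
    return x
```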

https://weibo.com/1402400261/L3w2WDuwR

5、[LG] Adversarially Robust Learning for Security-Constrained Optimal Power Flow

P L. Donti, A Agarwal, N V Bedmutha, L Pileggi, J. Z Kolter

[CMU]

Recent years have seen surges of interest in the ML community in both adversarially robust learning and implicit layers, but the connections between these two areas have seldom been explored. This paper combines innovations from both to tackle the N-k security-constrained optimal power flow (SCOPF) problem. N-k SCOPF is a core problem in electrical grid operation, aiming to schedule power generation in a manner robust to potentially k simultaneous equipment outages. Inspired by adversarially robust training methods, it frames N-k SCOPF as a minimax optimization problem, viewing generation settings as adjustable parameters and equipment outages as (adversarial) attacks, and solves this problem via gradient-based techniques. The loss function of this minimax problem involves solving implicit equations representing grid physics and operational decisions, which are differentiated through via the implicit function theorem. Experiments demonstrate the framework's efficacy on N-3 SCOPF, which has traditionally been considered prohibitively expensive to solve, since the problem size grows combinatorially with the number of potential outages.

In recent years, the ML community has seen surges of interest in both adversarially robust learning and implicit layers, but connections between these two areas have seldom been explored. In this work, we combine innovations from these areas to tackle the problem of N-k security-constrained optimal power flow (SCOPF). N-k SCOPF is a core problem for the operation of electrical grids, and aims to schedule power generation in a manner that is robust to potentially k simultaneous equipment outages. Inspired by methods in adversarially robust training, we frame N-k SCOPF as a minimax optimization problem – viewing power generation settings as adjustable parameters and equipment outages as (adversarial) attacks – and solve this problem via gradient-based techniques. The loss function of this minimax problem involves resolving implicit equations representing grid physics and operational decisions, which we differentiate through via the implicit function theorem. We demonstrate the efficacy of our framework in solving N-3 SCOPF, which has traditionally been considered as prohibitively expensive to solve given that the problem size depends combinatorially on the number of potential outages.
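
The minimax structure borrowed from adversarially robust training can be sketched generically, as below in PyTorch. Here `loss_fn(theta, attack)` stands in for the (implicitly defined, differentiable) dispatch cost, `theta` for generation settings, and `attack` for a relaxed outage pattern; the inner loop ascends in the attack, the outer loop descends in `theta`. Differentiating through the implicit grid-physics equations (the paper's implicit-function-theorem machinery) is abstracted into `loss_fn`, and all names here are assumptions.

```python
import torch

def gradient_minimax(theta, attack0, loss_fn,
                     outer_steps=100, inner_steps=10,
                     lr_theta=1e-2, lr_attack=1e-1):
    """Outer minimization over decisions, inner maximization over attacks."""
    theta = theta.clone().requires_grad_(True)
    for _ in range(outer_steps):
        # Inner ascent: find a strong (adversarial) outage pattern.
        attack = attack0.clone().requires_grad_(True)
        for _ in range(inner_steps):
            (g,) = torch.autograd.grad(loss_fn(theta, attack), attack)
            attack = (attack + lr_attack * g).detach().requires_grad_(True)
        # Outer descent: update generation settings against that attack.
        (g,) = torch.autograd.grad(loss_fn(theta, attack), theta)
        theta = (theta - lr_theta * g).detach().requires_grad_(True)
    return theta.detach()
```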

https://weibo.com/1402400261/L3w7riAqp

Several other papers worth noting:

 

[AS] Differentiable Wavetable Synthesis

S Shan, L Hantrakul, J Chen, M Avent, D Trevelyan

[University of North Carolina at Chapel Hill & ByteDance]

https://weibo.com/1402400261/L3waxElpL

 

[CV] L-Verse: Bidirectional Generation Between Image and Text

T Kim, G Song, S Lee, S Kim, Y Seo, S Lee, S H Kim, H Lee, K Bae

[LG AI Research]

https://weibo.com/1402400261/L3wcKpTDh

[CV] Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions

H Xue, T Hang, Y Zeng, Y Sun, B Liu, H Yang, J Fu, B Guo

[Microsoft Research Asia]

https://weibo.com/1402400261/L3we64Cnc

 

 

[LG] A Free Lunch from the Noise: Provable and Practical Exploration for Representation Learning

T Ren, T Zhang, C Szepesvári, B Dai

[UT Austin & UC Berkeley & University of Alberta & Google Brain]

https://weibo.com/1402400261/L3wg00WmB

 
