爱可可 AI Frontier Picks (11.28)

Reposted from 爱可可爱生活

LG - Machine Learning CV - Computer Vision CL - Computation and Language AS - Audio and Speech RO - Robotics GR - Graphics

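Summary: instilling human inductive biases in machines with natural language and program abstractions; consistent single-view perpetual view generation with conditional diffusion models; neural radiance fields from sparse and noisy poses; gradient estimation with discrete Stein operators; learning optimally from multiple distributions; a platform for understanding generalization via rich task distributions; a general framework for object detection in NeRFs; inversion-based creativity transfer with diffusion models; exact diffusion inversion via coupled transformations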

1. [AI] Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines

S Kumar, C G. Correa, I Dasgupta, R Marjieh...
[Princeton University & DeepMind]
Strong inductive biases give humans the ability to quickly learn to perform a variety of tasks. Although meta-learning is a method to endow neural networks with useful inductive biases, agents trained by meta-learning may sometimes acquire very different strategies from humans. We show that co-training these agents on predicting representations from natural language task descriptions and programs induced to generate such tasks guides them toward more human-like inductive biases. Human-generated language descriptions and program induction models that add new learned primitives both contain abstract concepts that can compress description length. Co-training on these representations results in more human-like behavior in downstream meta-reinforcement learning agents than less abstract controls (synthetic language descriptions, program induction without learned primitives), suggesting that the abstraction supported by these representations is key.

https://arxiv.org/abs/2205.11558

2. [CV] DiffDreamer: Consistent Single-view Perpetual View Generation with Conditional Diffusion Models

S Cai, E R Chan, S Peng, M Shahbazi...
[Stanford University & ETH Zurich]
Perpetual view generation -- the task of generating long-range novel views by flying into a given image -- has been a novel yet promising task. We introduce DiffDreamer, an unsupervised framework capable of synthesizing novel views depicting a long camera trajectory while training solely on internet-collected images of nature scenes. We demonstrate that image-conditioned diffusion models can effectively perform long-range scene extrapolation while preserving both local and global consistency significantly better than prior GAN-based methods.

https://arxiv.org/abs/2211.12131

3. [CV] SPARF: Neural Radiance Fields from Sparse and Noisy Poses

P Truong, M Rakotosaona, F Manhardt, F Tombari
[Google & ETH Zurich]
Neural Radiance Field (NeRF) has recently emerged as a powerful representation to synthesize photorealistic novel views. While showing impressive performance, it relies on the availability of dense input views with highly accurate camera poses, thus limiting its application in real-world scenarios. In this work, we introduce Sparse Pose Adjusting Radiance Field (SPARF), to address the challenge of novel-view synthesis given only a few wide-baseline input images (as low as 3) with noisy camera poses. Our approach exploits multi-view geometry constraints in order to jointly learn the NeRF and refine the camera poses. By relying on pixel matches extracted between the input views, our multi-view correspondence objective enforces the optimized scene and camera poses to converge to a global and geometrically accurate solution. Our depth consistency loss further encourages the reconstructed scene to be consistent from any viewpoint. Our approach sets a new state of the art in the sparse-view regime on multiple challenging datasets.

https://arxiv.org/abs/2211.11738
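
To make the idea of a multi-view correspondence objective concrete, here is a minimal sketch of a reprojection error for one pixel match. It is not the SPARF loss itself: the pinhole camera with shared intrinsics, the relative pose `(R_ij, t_ij)`, the depth value, and all function names are assumptions chosen for illustration. A pixel in view i is lifted to 3D with its depth, moved into view j with the current pose estimate, projected, and compared with the matched pixel.

```python
import math


def backproject(pixel, depth, focal, center):
    """Lift a pixel to a 3D point in its camera frame (pinhole model, assumed)."""
    u, v = pixel
    return [(u - center[0]) / focal * depth, (v - center[1]) / focal * depth, depth]


def transform(rotation, translation, point):
    """Apply a rigid transform X_j = R X_i + t."""
    return [
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]


def project(point, focal, center):
    """Project a 3D point in the camera frame back to pixel coordinates."""
    return (
        focal * point[0] / point[2] + center[0],
        focal * point[1] / point[2] + center[1],
    )


def reprojection_error(pixel_i, depth_i, pixel_j, R_ij, t_ij, focal, center):
    """Pixel distance between the match in view j and the reprojected pixel from view i."""
    point_i = backproject(pixel_i, depth_i, focal, center)
    point_j = transform(R_ij, t_ij, point_i)
    u, v = project(point_j, focal, center)
    return math.hypot(u - pixel_j[0], v - pixel_j[1])
```

With a correct pose and depth the error is zero; a perturbed pose yields a positive error, which is the kind of signal a joint optimization of scene and poses could minimize.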

4. [LG] Gradient Estimation with Discrete Stein Operators

J Shi, Y Zhou, J Hwang, M K. Titsias, L Mackey
[Microsoft Research & Tsinghua University & Stanford University & DeepMind]
Gradient estimation -- approximating the gradient of an expectation with respect to the parameters of a distribution -- is central to the solution of many machine learning problems. However, when the distribution is discrete, most common gradient estimators suffer from excessive variance. To improve the quality of gradient estimation, we introduce a variance reduction technique based on Stein operators for discrete distributions. We then use this technique to build flexible control variates for the REINFORCE leave-one-out estimator. Our control variates can be adapted online to minimize variance and do not require extra evaluations of the target function. In benchmark generative modeling tasks such as training binary variational autoencoders, our gradient estimator achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.

https://arxiv.org/abs/2202.09497
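
For readers unfamiliar with the baseline that the paper builds on, the following is a minimal sketch of the plain REINFORCE leave-one-out (RLOO) estimator for independent Bernoulli variables parameterized by logits. It does not include the Stein-operator control variates proposed in the paper. The toy objective, the function names, and the logit parameterization are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_bernoulli(probs, rng):
    """Draw one binary vector with independent coordinates."""
    return [1.0 if rng.random() < p else 0.0 for p in probs]


def rloo_gradient(f, logits, num_samples, rng):
    """RLOO estimate of d/d(logits) E[f(b)], b_i ~ Bernoulli(sigmoid(logits_i))."""
    if num_samples < 2:
        raise ValueError("RLOO needs at least two samples")
    probs = [sigmoid(t) for t in logits]
    samples = [sample_bernoulli(probs, rng) for _ in range(num_samples)]
    values = [f(b) for b in samples]
    total = sum(values)
    grad = [0.0] * len(logits)
    for b, value in zip(samples, values):
        # Baseline for this sample: mean of f over the OTHER samples.
        baseline = (total - value) / (num_samples - 1)
        for i, (b_i, p_i) in enumerate(zip(b, probs)):
            # Score function of a Bernoulli w.r.t. its logit is b_i - p_i.
            grad[i] += (value - baseline) * (b_i - p_i) / num_samples
    return grad


def toy_objective(b, target=0.45):
    """Hypothetical test function: squared distance to a constant target."""
    return sum((b_i - target) ** 2 for b_i in b)


def exact_toy_gradient(logits, target=0.45):
    """Closed-form gradient of E[toy_objective] for comparison."""
    return [sigmoid(t) * (1.0 - sigmoid(t)) * (1.0 - 2.0 * target) for t in logits]
```

Because each baseline is computed without the sample it is applied to, the estimator stays unbiased, so averaging many calls to `rloo_gradient` should approach `exact_toy_gradient`. The paper's contribution is a further control variate on top of this estimator that needs no extra evaluations of `f`.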

5. [LG] On-Demand Sampling: Learning Optimally from Multiple Distributions

N Haghtalab, M I. Jordan, E Zhao
[UC Berkeley]
Social and real-world considerations such as robustness, fairness, social welfare and multi-agent tradeoffs have given rise to multi-distribution learning paradigms, such as collaborative, group distributionally robust, and fair federated learning. In each of these settings, a learner seeks to minimize its worst-case loss over a set of n predefined distributions, while using as few samples as possible. In this paper, we establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity. Importantly, our sample complexity bounds exceed the sample complexity of learning a single distribution only by an additive factor of n log(n)/ϵ^2. These improve upon the best known sample complexity of agnostic federated learning by Mohri et al. by a multiplicative factor of n, the sample complexity of collaborative learning by Nguyen and Zakynthinou by a multiplicative factor of log(n)/ϵ^3, and give the first sample complexity bounds for the group DRO objective of Sagawa et al. To achieve optimal sample complexity, our algorithms learn to sample and learn from distributions on demand. Our algorithm design and analysis are enabled by our extensions of stochastic optimization techniques for solving stochastic zero-sum games. In particular, we contribute variants of Stochastic Mirror Descent that can trade off between players' access to cheap one-off samples or more expensive reusable ones.

https://arxiv.org/abs/2210.12529
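
The worst-case objective over n distributions can be viewed as a zero-sum game between a learner and an adversary that reweights the distributions. The sketch below illustrates that view on a toy problem and is not the paper's algorithm: the squared loss on a scalar hypothesis, the Gaussian samplers, the step sizes, and all names are assumptions. The learner draws one sample on demand from a distribution chosen by the adversary's weights, while the adversary runs a Hedge (entropic mirror descent) update.

```python
import math
import random


def softmax(log_weights):
    """Numerically stable normalization of log-weights."""
    m = max(log_weights)
    exps = [math.exp(v - m) for v in log_weights]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_sampler(mean, std=0.1):
    """Hypothetical data source: one draw from N(mean, std^2) per call."""
    return lambda rng: rng.gauss(mean, std)


def minimax_mean(samplers, steps, lr, eta, rng):
    """Approximate argmin_h max_i E_{z ~ D_i}[(h - z)^2] by learner-vs-adversary play."""
    log_w = [0.0] * len(samplers)
    h = 0.0
    h_sum = 0.0
    for _ in range(steps):
        w = softmax(log_w)
        # Learner: pick a distribution according to the adversary's weights,
        # draw a single sample from it on demand, and take one SGD step.
        i = rng.choices(range(len(samplers)), weights=w)[0]
        z = samplers[i](rng)
        h_next = h - lr * 2.0 * (h - z)
        # Adversary: Hedge update, upweighting distributions with high loss
        # at the current h (one fresh sample per distribution, a simplification).
        for j, draw in enumerate(samplers):
            log_w[j] += eta * (h - draw(rng)) ** 2
        h = h_next
        h_sum += h
    # Return the averaged iterate and the adversary's final weights.
    return h_sum / steps, softmax(log_w)
```

For distributions centered at 0, 1 and 4 with equal variance, the worst-case optimum is the midpoint 2 of the two extreme means, and the adversary's weight on the middle distribution should vanish. Note that this sketch spends n + 1 samples per round; the paper's point is precisely how to budget such samples optimally.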

Other papers worth noting:

[LG] Powderworld: A Platform for Understanding Generalization via Rich Task Distributions

K Frans, P Isola
[MIT CSAIL]
https://arxiv.org/abs/2211.13051

[CV] NeRF-RPN: A general framework for object detection in NeRFs

B Hu, J Huang, Y Liu, Y Tai, C Tang
[The Hong Kong University of Science and Technology]
https://arxiv.org/abs/2211.11646

[CV] Inversion-Based Creativity Transfer with Diffusion Models

Y Zhang, N Huang, F Tang, H Huang, C Ma, W Dong, C Xu
[Chinese Academy of Sciences & Kuaishou Technology]
https://arxiv.org/abs/2211.13203