
LG - Machine Learning   CV - Computer Vision   CL - Computation and Language   AS - Audio and Speech   RO - Robotics   GR - Graphics



1、[AI] Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines

S Kumar, C G. Correa, I Dasgupta, R Marjieh...
[Princeton University & DeepMind]

Strong inductive biases give humans the ability to quickly learn to perform a variety of tasks. Although meta-learning is a method to endow neural networks with useful inductive biases, agents trained by meta-learning may sometimes acquire very different strategies from humans. We show that co-training these agents on predicting representations from natural language task descriptions and programs induced to generate such tasks guides them toward more human-like inductive biases. Human-generated language descriptions and program induction models that add new learned primitives both contain abstract concepts that can compress description length. Co-training on these representations results in more human-like behavior in downstream meta-reinforcement learning agents than less abstract controls (synthetic language descriptions, program induction without learned primitives), suggesting that the abstraction supported by these representations is key.



2、[CV] DiffDreamer: Consistent Single-view Perpetual View Generation with Conditional Diffusion Models

S Cai, E R Chan, S Peng, M Shahbazi...
[Stanford University & ETH Zurich]
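DiffDreamer: consistent single-view perpetual view generation with conditional diffusion models. Perpetual view generation, the task of generating long-range novel views by flying into a given image, is a new and promising task. The paper proposes DiffDreamer, an unsupervised framework that synthesizes novel views along a long camera trajectory while training only on internet-collected images of natural scenes. It shows that image-conditioned diffusion models can perform long-range scene extrapolation effectively while preserving both local and global consistency, clearly outperforming prior GAN-based methods.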

Perpetual view generation -- the task of generating long-range novel views by flying into a given image -- has been a novel yet promising task. We introduce DiffDreamer, an unsupervised framework capable of synthesizing novel views depicting a long camera trajectory while training solely on internet-collected images of nature scenes. We demonstrate that image-conditioned diffusion models can effectively perform long-range scene extrapolation while preserving both local and global consistency significantly better than prior GAN-based methods.



3、[CV] SPARF: Neural Radiance Fields from Sparse and Noisy Poses

P Truong, M Rakotosaona, F Manhardt, F Tombari
[Google & ETH Zurich]

Neural Radiance Field (NeRF) has recently emerged as a powerful representation to synthesize photorealistic novel views. While showing impressive performance, it relies on the availability of dense input views with highly accurate camera poses, thus limiting its application in real-world scenarios. In this work, we introduce Sparse Pose Adjusting Radiance Field (SPARF) to address the challenge of novel-view synthesis given only a few wide-baseline input images (as few as 3) with noisy camera poses. Our approach exploits multi-view geometry constraints in order to jointly learn the NeRF and refine the camera poses. By relying on pixel matches extracted between the input views, our multi-view correspondence objective enforces the optimized scene and camera poses to converge to a global and geometrically accurate solution. Our depth consistency loss further encourages the reconstructed scene to be consistent from any viewpoint. Our approach sets a new state of the art in the sparse-view regime on multiple challenging datasets.
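
The idea of constraining scene and poses through pixel matches can be illustrated with a plain reprojection error. The sketch below is not SPARF's actual objective: it assumes a pinhole camera with hypothetical intrinsics `K = (fx, fy, cx, cy)`, a known depth for each matched pixel in view i (in the paper this would come from the rendered scene), and a hypothetical relative pose `(R_ij, t_ij)` mapping view i coordinates into view j. All function names are invented for this example.

```python
import math

def backproject(pixel, depth, K):
    """Lift a pixel (u, v) with depth z into 3D camera coordinates."""
    fx, fy, cx, cy = K
    u, v = pixel
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)

def transform(point, R, t):
    """Apply the rigid transform x' = R x + t (R given as three row tuples)."""
    return tuple(
        sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3)
    )

def project(point, K):
    """Project a 3D point in camera coordinates onto the image plane."""
    fx, fy, cx, cy = K
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

def correspondence_error(matches, depths_i, K, R_ij, t_ij):
    """Mean pixel distance between matches in view j and reprojections from view i.

    matches  : list of ((u_i, v_i), (u_j, v_j)) pixel pairs
    depths_i : depth of each matched pixel as seen from view i
    """
    total = 0.0
    for (p_i, p_j), z in zip(matches, depths_i):
        q = project(transform(backproject(p_i, z, K), R_ij, t_ij), K)
        total += math.hypot(q[0] - p_j[0], q[1] - p_j[1])
    return total / len(matches)
```

With a correct depth and pose the error is zero; a wrong pose moves the reprojection away from the matched pixel, which is the signal a joint optimization over scene and poses could minimize.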



4、[LG] Gradient Estimation with Discrete Stein Operators

J Shi, Y Zhou, J Hwang, M K. Titsias, L Mackey
[Microsoft Research & Tsinghua University & Stanford University & DeepMind]

Gradient estimation -- approximating the gradient of an expectation with respect to the parameters of a distribution -- is central to the solution of many machine learning problems. However, when the distribution is discrete, most common gradient estimators suffer from excessive variance. To improve the quality of gradient estimation, we introduce a variance reduction technique based on Stein operators for discrete distributions. We then use this technique to build flexible control variates for the REINFORCE leave-one-out estimator. Our control variates can be adapted online to minimize variance and do not require extra evaluations of the target function. In benchmark generative modeling tasks such as training binary variational autoencoders, our gradient estimator achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.
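
As a point of reference, the baseline that the paper builds on, the REINFORCE leave-one-out (RLOO) estimator, can be sketched for a single Bernoulli variable. This is a minimal illustration only: it does not include the Stein-operator control variates the abstract describes, and the function names and the logit parameterization are assumptions made for this example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def exact_gradient(f, logit):
    """Exact d/d(logit) of E[f(b)] for b ~ Bernoulli(sigmoid(logit))."""
    p = sigmoid(logit)
    return p * (1.0 - p) * (f(1) - f(0))

def rloo_gradient(f, logit, k, rng):
    """RLOO estimate of d/d(logit) E[f(b)] from k samples.

    Each sample uses the mean of the other k-1 function values as its
    baseline, so no extra evaluations of f are needed.
    """
    if k < 2:
        raise ValueError("RLOO needs at least two samples")
    p = sigmoid(logit)
    samples = [1 if rng.random() < p else 0 for _ in range(k)]
    values = [f(b) for b in samples]
    total = sum(values)
    grad = 0.0
    for b, v in zip(samples, values):
        baseline = (total - v) / (k - 1)  # leave-one-out mean
        score = b - p                     # d/d(logit) of log p(b)
        grad += (v - baseline) * score
    return grad / k
```

Averaging many calls to `rloo_gradient` should approach `exact_gradient`, since the leave-one-out baseline is independent of the sample it is applied to and therefore does not bias the estimate.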



5、[LG] On-Demand Sampling: Learning Optimally from Multiple Distributions

N Haghtalab, M I. Jordan, E Zhao
[UC Berkeley]

Social and real-world considerations such as robustness, fairness, social welfare and multi-agent tradeoffs have given rise to multi-distribution learning paradigms, such as collaborative, group distributionally robust, and fair federated learning. In each of these settings, a learner seeks to minimize its worst-case loss over a set of $n$ predefined distributions, while using as few samples as possible. In this paper, we establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity. Importantly, our sample complexity bounds exceed the sample complexity of learning a single distribution only by an additive term of $n\log(n)/\epsilon^2$. These improve upon the best known sample complexity of agnostic federated learning by Mohri et al. by a multiplicative factor of $n$, the sample complexity of collaborative learning by Nguyen and Zakynthinou by a multiplicative factor of $\log(n)/\epsilon^3$, and give the first sample complexity bounds for the group DRO objective of Sagawa et al. To achieve optimal sample complexity, our algorithms learn to sample and learn from distributions on demand. Our algorithm design and analysis are enabled by our extensions of stochastic optimization techniques for solving stochastic zero-sum games. In particular, we contribute variants of Stochastic Mirror Descent that can trade off between players' access to cheap one-off samples or more expensive reusable ones.
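
In symbols, the worst-case objective and the additive gap stated in the abstract might be written as below. The notation is an assumption introduced here, not taken from the paper: $\mathcal{H}$ is the hypothesis class, $D_1, \dots, D_n$ are the predefined distributions, $L_{D_i}(h)$ is the expected loss of $h$ on $D_i$, and $m_{\text{single}}(\epsilon)$, $m_{\text{multi}}(\epsilon)$ denote the number of samples needed to reach error $\epsilon$ in the single- and multi-distribution settings. Any dependence on a confidence parameter is omitted.

```latex
\min_{h \in \mathcal{H}} \; \max_{i \in \{1, \dots, n\}} \; L_{D_i}(h),
\qquad
m_{\text{multi}}(\epsilon) \;\le\; m_{\text{single}}(\epsilon)
  + O\!\left( \frac{n \log n}{\epsilon^{2}} \right)
```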





[LG] Powderworld: A Platform for Understanding Generalization via Rich Task Distributions

K Frans, P Isola


[CV] NeRF-RPN: A general framework for object detection in NeRFs

B Hu, J Huang, Y Liu, Y Tai, C Tang
[The Hong Kong University of Science and Technology] https://arxiv.org/abs/2211.11646


[CV] Inversion-Based Creativity Transfer with Diffusion Models

Y Zhang, N Huang, F Tang, H Huang, C Ma, W Dong, C Xu
[Chinese Academy of Sciences & Kuaishou Technology] https://arxiv.org/abs/2211.13203


[CV] EDICT: Exact Diffusion Inversion via Coupled Transformations

B Wallace, A Gokul, N Naik
[Salesforce Research]



