LG - Machine Learning CV - Computer Vision CL - Computation and Language AS - Audio and Speech RO - Robotics
1、[CL] Dynamic Generation of Interpretable Inference Rules in a Neuro-Symbolic Expert System
N Weir, B V Durme
[Johns Hopkins University]
We present an approach for systematic reasoning that produces human-interpretable proof trees grounded in a factbase. Our solution resembles the style of a classic Prolog-based inference engine, where we replace handcrafted rules with a combination of neural language modeling, guided generation, and semiparametric dense retrieval. This novel reasoning engine, NELLIE, dynamically instantiates interpretable inference rules that capture and score entailment (de)compositions over natural language statements. NELLIE provides competitive performance on scientific QA datasets requiring structured explanations over multiple facts.
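For readers unfamiliar with the Prolog-style baseline the abstract refers to, the following is a minimal sketch of backward chaining over a factbase that returns a proof tree. The facts, the rule, and the function name `prove` are hypothetical illustrations, not taken from the paper; NELLIE replaces the handcrafted `RULES` table with rules generated and scored by neural models, which this sketch does not attempt.

```python
# Hypothetical factbase of natural language statements (illustration only).
FACTS = {
    "a magnet attracts iron",
    "a nail is made of iron",
}

# Hypothetical handcrafted rules: each hypothesis maps to the premises that
# jointly entail it. A neuro-symbolic system would generate these on the fly.
RULES = {
    "a magnet attracts a nail": ["a magnet attracts iron", "a nail is made of iron"],
}


def prove(hypothesis, depth=3):
    """Return a proof tree for the hypothesis, or None if no proof is found."""
    if hypothesis in FACTS:
        return (hypothesis, "fact")
    if depth == 0 or hypothesis not in RULES:
        return None
    # Decompose the hypothesis into premises and prove each one recursively.
    subproofs = [prove(premise, depth - 1) for premise in RULES[hypothesis]]
    if any(sub is None for sub in subproofs):
        return None
    return (hypothesis, subproofs)
```

Calling `prove("a magnet attracts a nail")` yields a nested tuple whose leaves are statements from the factbase, which is the sense in which such a proof is grounded.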
2、[CV] LAVIS: A Library for Language-Vision Intelligence
D Li, J Li, H Le, G Wang, S Savarese, S C.H. Hoi
[Salesforce Research]
We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications. LAVIS aims to serve as a one-stop comprehensive library that makes recent advancements in the language-vision field accessible to researchers and practitioners and fosters future research and development. It features a unified interface for easy access to state-of-the-art image-language and video-language models and common datasets. LAVIS supports training, evaluation and benchmarking on a rich variety of tasks, including multimodal classification, retrieval, captioning, visual question answering, dialogue and pre-training. At the same time, the library is highly extensible and configurable, facilitating future development and customization. In this technical report, we describe the design principles, key components and functionalities of the library, and also present benchmarking results across common language-vision tasks. The library is available at: https://github.com/salesforce/LAVIS.
3、[CV] NeuralMarker: A Framework for Learning General Marker Correspondence
Z Huang, X Pan, W Pan, W Bian, Y Xu, K C Cheung, G Zhang, H Li
[The Chinese University of Hong Kong & Zhejiang University]
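NeuralMarker: a framework for learning general marker correspondence. The paper addresses the problem of estimating correspondences between a general marker, such as a movie poster, and an image that captures it. Traditionally, this is solved by fitting a homography model based on sparse feature matching. However, such methods can only handle plane-like markers, and sparse features do not make full use of appearance information. The paper proposes a new framework, NeuralMarker, which trains a neural network to estimate dense marker correspondences under challenging conditions such as marker deformation and harsh lighting. It also proposes a new method for evaluating marker correspondences on real marker-image pairs and builds a new benchmark. Experiments show that NeuralMarker significantly outperforms previous methods and enables interesting new applications, including augmented reality (AR) and video editing.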
We tackle the problem of estimating correspondences from a general marker, such as a movie poster, to an image that captures such a marker. Conventionally, this problem is addressed by fitting a homography model based on sparse feature matching. However, such methods can only handle plane-like markers, and the sparse features do not sufficiently utilize appearance information. In this paper, we propose a novel framework, NeuralMarker, which trains a neural network to estimate dense marker correspondences under various challenging conditions, such as marker deformation and harsh lighting. We also propose a novel marker correspondence evaluation method that circumvents annotation on real marker-image pairs, and we create a new benchmark. We show that NeuralMarker significantly outperforms previous methods and enables interesting new applications, including Augmented Reality (AR) and video editing.
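To make the conventional baseline concrete, here is a minimal sketch of fitting a homography from four sparse point correspondences in pure Python. It illustrates the classical approach the abstract contrasts against, not NeuralMarker itself. The function names are hypothetical, and the sketch assumes exactly four matches in general position with the bottom-right entry of the matrix fixed to 1; practical pipelines use many matches with robust estimation.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_homography(src, dst):
    """Fit a 3x3 homography (last entry fixed to 1) from four point pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per correspondence, from u = (h1 . p) / (h3 . p)
        # and v = (h2 . p) / (h3 . p) with p = (x, y, 1).
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def warp(h, point):
    """Map a marker point into the image using the homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

A single 3x3 matrix maps every marker point, which is why this model fits only plane-like markers: a bent or curled poster cannot be described by one homography, whereas a dense per-pixel correspondence field can.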
4、[LG] Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective
R Ghugare, H Bharadhwaj, B Eysenbach, S Levine, R Salakhutdinov
[VNIT Nagpur & CMU & UC Berkeley]
While reinforcement learning (RL) methods that learn an internal model of the environment have the potential to be more sample-efficient than their model-free counterparts, learning to model raw observations from high-dimensional sensors can be challenging. Prior work has addressed this challenge by learning low-dimensional representations of observations through auxiliary objectives, such as reconstruction or value prediction. However, the alignment between these auxiliary objectives and the RL objective is often unclear. In this work, we propose a single objective which jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent. This objective is a lower bound on expected returns. Unlike prior bounds for model-based RL on policy exploration or model guarantees, our bound is directly on the overall RL objective. We demonstrate that the resulting algorithm matches or improves the sample efficiency of the best prior model-based and model-free RL methods. While such sample-efficient methods are typically computationally demanding, our method attains the performance of SAC in about 50% less wall-clock time.
5、[CV] DifferSketching: How Differently Do People Sketch 3D Objects?
C Xiao, W Su, J Liao, Z Lian, Y Song, H Fu
[City University of Hong Kong & Peking University & University of Surrey]
Multiple sketch datasets have been proposed to understand how people draw 3D objects. However, such datasets are often of small scale and cover a small set of objects or categories. In addition, these datasets contain freehand sketches mostly from expert users, making it difficult to compare the drawings by expert and novice users, while such comparisons are critical in informing more effective sketch-based interfaces for either user group. These observations motivate us to analyze how differently people with and without adequate drawing skills sketch 3D objects. We invited 70 novice users and 38 expert users to sketch 136 3D objects, which were presented as 362 images rendered from multiple views. This leads to a new dataset of 3,620 freehand multi-view sketches, which are registered with their corresponding 3D objects under certain views. Our dataset is an order of magnitude larger than the existing datasets. We analyze the collected data at three levels, i.e., sketch-level, stroke-level, and pixel-level, under both spatial and temporal characteristics, and within and across groups of creators. We found that the drawings by professionals and novices show significant differences at stroke-level, both intrinsically and extrinsically. We demonstrate the usefulness of our dataset in two applications: (i) freehand-style sketch synthesis, and (ii) posing it as a potential benchmark for sketch-based 3D reconstruction. Our dataset and code are publicly available.
[CV] SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation
M Guo, C Lu, Q Hou, Z Liu, M Cheng, S Hu
[Tsinghua University & Nankai University & Fitten Tech]
[LG] Optimal Scaling for Locally Balanced Proposals in Discrete Spaces
H Sun, H Dai, D Schuurmans
[Georgia Tech & Google Brain]
Optimal scaling has been well studied for Metropolis-Hastings (M-H) algorithms in continuous spaces, but a similar understanding has been lacking in discrete spaces. Recently, a family of locally balanced proposals (LBP) for discrete spaces has been proved to be asymptotically optimal, but the question of optimal scaling has remained open. In this paper, we establish, for the first time, that the efficiency of M-H in discrete spaces can also be characterized by an asymptotic acceptance rate that is independent of the target distribution. Moreover, we verify, both theoretically and empirically, that the optimal acceptance rates for LBP and random walk Metropolis (RWM) are 0.574 and 0.234 respectively. These results also help establish that LBP is asymptotically O(N^{2/3}) more efficient than RWM with respect to model dimension N. Knowledge of the optimal acceptance rate allows one to automatically tune the neighborhood size of a proposal distribution in a discrete space, directly analogous to step-size control in continuous spaces. We demonstrate empirically that such adaptive M-H sampling can robustly improve sampling in a variety of target distributions in discrete spaces, including training deep energy-based models.
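The tuning idea in the abstract can be sketched as follows for the simpler random walk Metropolis (RWM) case on binary vectors, where the neighborhood size is the number of bits flipped per proposal. This is a toy illustration under stated assumptions, not the authors' algorithm: the product-Bernoulli target, the function names, and the plus-or-minus-one update rule are all hypothetical choices, and only the 0.234 target rate comes from the abstract. An LBP variant would weight the flips by the target and aim for 0.574 instead.

```python
import math
import random


def log_prob(x, theta):
    """Unnormalized log-probability of a product-Bernoulli target with log-odds theta."""
    return sum(t for bit, t in zip(x, theta) if bit)


def rwm_step(x, theta, radius, rng):
    """Flip `radius` distinct random bits (a symmetric proposal), then accept or reject."""
    y = list(x)
    for i in rng.sample(range(len(x)), radius):
        y[i] = 1 - y[i]
    log_ratio = log_prob(y, theta) - log_prob(x, theta)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return y, True
    return x, False


def adaptive_rwm(theta, target_rate=0.234, batches=200, batch_size=50, seed=0):
    """Run RWM, nudging the flip count toward the target acceptance rate."""
    rng = random.Random(seed)
    n = len(theta)
    x = [rng.randint(0, 1) for _ in range(n)]
    radius = 1
    rates = []
    for _ in range(batches):
        accepted = 0
        for _ in range(batch_size):
            x, ok = rwm_step(x, theta, radius, rng)
            accepted += ok
        rate = accepted / batch_size
        rates.append(rate)
        # A high acceptance rate means moves are too timid: enlarge the neighborhood.
        # A low rate means moves are too bold: shrink it.
        if rate > target_rate and radius < n:
            radius += 1
        elif rate < target_rate and radius > 1:
            radius -= 1
    return radius, rates
```

For example, `adaptive_rwm([1.0] * 100)` returns the final flip count and the per-batch acceptance rates. Adapting the proposal indefinitely can disturb the chain's stationary distribution, so in practice the adaptation is usually confined to a burn-in phase or made to diminish over time.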
[CV] ActiveNeRF: Learning where to See with Uncertainty Estimation
X Pan, Z Lai, S Song, G Huang
[Tsinghua University & CMU]
[LG] On the Theoretical Properties of Noise Correlation in Stochastic Optimization
A Lucchi, F Proske, A Orvieto, F Bach, H Kersting
[University of Basel & University of Oslo & ETH Zürich & PSL Research University]
https://arxiv.org/abs/2209.09162