LG - Machine Learning CV - Computer Vision CL - Computation and Language AS - Audio and Speech RO - Robotics GR - Graphics
1. [CV] 3D Neural Field Generation using Triplane Diffusion
J. R Shue, E R Chan, R Po, Z Ankner, J Wu, G Wetzstein
[Milton Academy & Stanford University & MIT]
Diffusion models have emerged as the state-of-the-art for image generation, among other tasks. Here, we present an efficient diffusion-based model for 3D-aware generation of neural fields. Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields and factoring them into a set of axis-aligned triplane feature representations. Thus, our 3D training scenes are all represented by 2D feature planes, and we can directly train existing 2D diffusion models on these representations to generate 3D neural fields with high quality and diversity, outperforming alternative approaches to 3D-aware generation. Our approach requires essential modifications to existing triplane factorization pipelines to make the resulting features easy to learn for the diffusion model. We demonstrate state-of-the-art results on 3D generation on several object classes from ShapeNet.
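As a rough illustration of why a triplane representation reduces a 3D field to 2D feature planes, the sketch below queries occupancy at a 3D point by projecting it onto three axis-aligned planes. The abstract does not give these details, so everything here is an assumption for illustration: the plane names (`xy`, `xz`, `yz`), bilinear sampling, summing the three features, and the single linear layer plus sigmoid standing in for the decoder.

```python
import math

def bilinear(plane, u, v):
    """Sample a feature vector from plane[row][col] at continuous (u, v) in [0, 1]."""
    res = len(plane)                      # plane is res x res, res >= 2
    x, y = u * (res - 1), v * (res - 1)
    x0, y0 = min(int(x), res - 2), min(int(y), res - 2)
    fx, fy = x - x0, y - y0

    def lerp(a, b, t):
        return [ai + (bi - ai) * t for ai, bi in zip(a, b)]

    top = lerp(plane[y0][x0], plane[y0][x0 + 1], fx)
    bottom = lerp(plane[y0 + 1][x0], plane[y0 + 1][x0 + 1], fx)
    return lerp(top, bottom, fy)

def query_triplane(planes, point, decoder_weights, decoder_bias):
    """Return an occupancy value in (0, 1) for a 3D point with coordinates in [0, 1]."""
    x, y, z = point
    # Project the point onto each axis-aligned plane and sample its features.
    f_xy = bilinear(planes["xy"], x, y)
    f_xz = bilinear(planes["xz"], x, z)
    f_yz = bilinear(planes["yz"], y, z)
    # Assumption: aggregate by summation, then decode with a tiny linear layer.
    feat = [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]
    logit = sum(w * f for w, f in zip(decoder_weights, feat)) + decoder_bias
    return 1.0 / (1.0 + math.exp(-logit))
```

Because each scene is stored only as three 2D feature grids, a 2D diffusion model can in principle treat the stacked planes like a multi-channel image, which is the property the abstract relies on.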
2. [CV] Exploiting Category Names for Few-Shot Classification with Vision-Language Models
T Xiao, Z Wang, L Cao, J Yu, S Dai, M Yang
[Google & University of California, Merced & Apple]
Few-shot classification with vision-language models whose classification head is initialized from category names. With only a few examples per class, initializing the head from category names (even imperfect or foreign-language ones) improves on random initialization; the model reaches 87.37% on ImageNet and 96.08% on Stanford Cars in the five-shot setting.
Vision-language foundation models pretrained on large-scale data provide a powerful tool for many visual understanding tasks. Notably, many vision-language models build two encoders (visual and textual) that can map two modalities into the same embedding space. As a result, the learned representations achieve good zero-shot performance on tasks like image classification. However, when there are only a few examples per category, the potential of large vision-language models is often underperformed, mainly due to the gap between a large number of parameters and a relatively small amount of training data. This paper shows that we can significantly improve the performance of few-shot classification by using the category names to initialize the classification head. More interestingly, we can borrow the non-perfect category names, or even names from a foreign language, to improve the few-shot classification performance compared with random initialization. With the proposed category name initialization method, our model obtains the state-of-the-art performance on a number of few-shot image classification benchmarks (e.g., 87.37% on ImageNet and 96.08% on Stanford Cars, both using five-shot learning). We also investigate and analyze when the benefit of category names diminishes and how to use distillation to improve the performance of smaller models, providing guidance for future research.
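The idea of initializing a classification head from category names can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: `toy_text_encoder` and its embedding table are hypothetical stand-ins for a real text encoder, and cosine similarity is assumed as the logit.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def init_head_from_names(names, text_encoder):
    """Use the normalized text embedding of each category name as that class's weight row."""
    return [l2_normalize(text_encoder(name)) for name in names]

def classify(head, image_embedding):
    """Return (predicted class index, cosine-similarity logits)."""
    z = l2_normalize(image_embedding)
    logits = [sum(w * x for w, x in zip(row, z)) for row in head]
    best = max(range(len(logits)), key=logits.__getitem__)
    return best, logits

# Hypothetical toy embeddings; a real system would call the model's text encoder.
TOY_TEXT_EMBEDDINGS = {
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.1, 1.0, 0.0],
    "car": [0.0, 0.1, 1.0],
}

def toy_text_encoder(name):
    return TOY_TEXT_EMBEDDINGS[name]

names = ["cat", "dog", "car"]
head = init_head_from_names(names, toy_text_encoder)
pred, logits = classify(head, [0.2, 0.9, 0.1])  # image embedding in the shared space
```

Since both encoders map into the same embedding space, a head initialized this way already makes sensible predictions before any few-shot fine-tuning, which is the starting point a randomly initialized head lacks.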
3. [LG] Fast Inference from Transformers via Speculative Decoding
Y Leviathan, M Kalman, Y Matias
[Google Research]
Fast Transformer inference via speculative decoding. A more efficient approximation model proposes several tokens, and the large model checks them in parallel, so multiple tokens can be produced per run of the large model without changing the output distribution. Demonstrated on T5-XXL with a 2X-3X speedup over the standard T5X implementation and identical outputs.
Inference from large autoregressive models like Transformers is slow - decoding K tokens takes K serial runs of the model. In this work we introduce speculative decoding - an algorithm to sample from autoregressive models faster without any changes to the outputs, by computing several tokens in parallel. At the heart of our approach lie the observations that (1) hard language-modeling tasks often include easier subtasks that can be approximated well by more efficient models, and (2) using speculative execution and a novel sampling method, we can make exact decoding from the large models faster, by running them in parallel on the outputs of the approximation models, potentially generating several tokens concurrently, and without changing the distribution. Our method supports existing off-the-shelf models without retraining or architecture changes. We demonstrate it on T5-XXL and show a 2X-3X acceleration compared to the standard T5X implementation, with identical outputs.
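One decoding step of this kind of scheme can be sketched as follows. The abstract does not spell out the sampling rule, so the accept/reject test used here (accept a drafted token with probability min(1, p/q), otherwise resample from the normalized positive part of p - q) is the commonly described speculative sampling rule and should be read as an assumption. `target` and `draft` are hypothetical callables that map a token prefix to a probability list over the vocabulary.

```python
import random

def sample(dist, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    r, acc = rng.random(), 0.0
    for i, p in enumerate(dist):
        acc += p
        if r < acc:
            return i
    return len(dist) - 1  # guard against floating-point rounding

def speculative_step(target, draft, prefix, gamma, rng):
    """Return between 1 and gamma + 1 new tokens distributed as if sampled from target."""
    # 1. The cheap draft model proposes gamma tokens serially.
    proposed, q_dists, ctx = [], [], list(prefix)
    for _ in range(gamma):
        q = draft(ctx)
        x = sample(q, rng)
        proposed.append(x)
        q_dists.append(q)
        ctx.append(x)
    # 2. The target scores all gamma + 1 prefixes (one parallel pass in a real system).
    p_dists = [target(list(prefix) + proposed[:i]) for i in range(gamma + 1)]
    # 3. Accept drafted tokens left to right; stop at the first rejection.
    out = []
    for i, x in enumerate(proposed):
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[x] / q[x]):
            out.append(x)
            continue
        # Rejected: resample from the normalized residual max(0, p - q).
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        total = sum(residual)
        out.append(sample([r / total for r in residual], rng))
        return out
    # All drafts accepted: the target's last distribution yields one extra token.
    out.append(sample(p_dists[gamma], rng))
    return out
```

When the draft agrees with the target, every proposal is accepted and one target pass yields gamma + 1 tokens; when it disagrees, the step still emits at least one token, so the loop never does worse than one token per target run.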
4. [CL] CREPE: Open-Domain Question Answering with False Presuppositions
X V Yu, S Min, L Zettlemoyer, H Hajishirzi
[University of Washington & Allen Institute for Artificial Intelligence]
Information seeking users often pose questions with false presuppositions, especially when asking about unfamiliar topics. Most existing question answering (QA) datasets, in contrast, assume all questions have well defined answers. We introduce CREPE, a QA dataset containing a natural distribution of presupposition failures from online information-seeking forums. We find that 25% of questions contain false presuppositions, and provide annotations for these presuppositions and their corrections. Through extensive baseline experiments, we show that adaptations of existing open-domain QA models can find presuppositions moderately well, but struggle when predicting whether a presupposition is factually correct. This is in large part due to difficulty in retrieving relevant evidence passages from a large text corpus. CREPE provides a benchmark to study question answering in the wild, and our analyses provide avenues for future work in better modeling and further studying the task.
5. [LG] Reinforced Genetic Algorithm for Structure-based Drug Design
T Fu, W Gao, C W. Coley, J Sun
[Georgia Institute of Technology & MIT & University of Illinois at Urbana-Champaign]
Structure-based drug design (SBDD) aims to discover drug candidates by finding molecules (ligands) that bind tightly to a disease-related protein (targets), which is the primary approach to computer-aided drug discovery. Recently, applying deep generative models for three-dimensional (3D) molecular design conditioned on protein pockets to solve SBDD has attracted much attention, but their formulation as probabilistic modeling often leads to unsatisfactory optimization performance. On the other hand, traditional combinatorial optimization methods such as genetic algorithms (GA) have demonstrated state-of-the-art performance in various molecular optimization tasks. However, they do not utilize protein target structure to inform design steps but rely on a random-walk-like exploration, which leads to unstable performance and no knowledge transfer between different tasks despite the similar binding physics. To achieve a more stable and efficient SBDD, we propose Reinforced Genetic Algorithm (RGA) that uses neural models to prioritize the profitable design steps and suppress random-walk behavior. The neural models take the 3D structure of the targets and ligands as inputs and are pre-trained using native complex structures to utilize the knowledge of the shared binding physics from different targets and then fine-tuned during optimization. We conduct thorough empirical studies on optimizing binding affinity to various disease targets and show that RGA outperforms the baselines in terms of docking scores and is more robust to random initializations. The ablation study also indicates that the training on different targets helps improve performance by leveraging the shared underlying physics of the binding processes. The code is publicly available.
[CV] One is All: Bridging the Gap Between Neural Radiance Fields Architectures with Progressive Volume Distillation
S Fang, W Xu, H Wang, Y Yang, Y Wang, S Zhou
[Beihang University & Megvii Research]
[LG] Coder Reviewer Reranking for Code Generation
T Zhang, T Yu, T B. Hashimoto, M Lewis, W Yih, D Fried, S I. Wang
[Meta AI & Stanford University & The University of Hong Kong & CMU]
[CV] CLIPascene: Scene Sketching with Different Types and Levels of Abstraction
Y Vinker, Y Alaluf, D Cohen-Or, A Shamir
[Tel Aviv University & Reichman University]
[LG] Out-Of-Distribution Detection Is Not All You Need
J Guerin, K Delmas, R S Ferreira, J Guiochet
[Université de Toulouse & ONERA]