LG - Machine Learning   CV - Computer Vision   CL - Computation and Language   AS - Audio and Speech   RO - Robotics




1. [LG] A Generalist Neural Algorithmic Learner

B Ibarz, V Kurin, G Papamakarios...
[DeepMind & University of Oxford & IDSIA & Mila & Purdue University]

The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution. While recent years have seen a surge in methodological improvements in this area, they mostly focused on building specialist models. Specialist models are capable of learning to neurally execute either only one algorithm or a collection of algorithms with identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner—a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding and geometry. We leverage the CLRS benchmark to empirically show that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime and processor architecture over CLRS, improving average single-task performance by over 20% from prior art. We then conduct a thorough ablation of multi-task learners leveraging these improvements. Our results demonstrate a generalist learner that effectively incorporates knowledge captured by specialist models.



2. [LG] SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials

P Eastman, P K Behara...
[Stanford University & University of California, Irvine & Open Molecular Software Foundation & ...]

Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high-quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. It can serve as a valuable resource for the creation of transferable, ready-to-use potential functions for molecular simulations.



3. [CL] Efficient Few-Shot Learning Without Prompts

L Tunstall, N Reimers, U E S Jo...
[Hugging Face & cohere.ai & Technical University of Darmstadt & Intel Labs]
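Efficient few-shot learning without prompts. Recent methods such as parameter-efficient fine-tuning (PEFT) and pattern exploiting training (PET) have achieved impressive results when labels are scarce. They are hard to adopt, however, because they suffer from the high variability of manually crafted prompts and typically need billion-parameter language models to reach high accuracy. To address these shortcomings, the paper proposes SETFIT (Sentence Transformer Finetuning), an efficient, prompt-free framework for few-shot fine-tuning of Sentence Transformers (ST). SETFIT first fine-tunes a pretrained ST on a small number of text pairs in a contrastive Siamese manner. The resulting model then generates rich text embeddings, which are used to train a classification head. This simple framework requires no prompts or verbalizers and achieves high accuracy with orders of magnitude fewer parameters than existing techniques. Experiments show that SETFIT obtains results comparable to PEFT and PET while training an order of magnitude faster. SETFIT can also be applied in multilingual settings by simply switching the ST body.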

Recent few-shot methods, such as parameter-efficient fine-tuning (PEFT) and pattern exploiting training (PET), have achieved impressive results in label-scarce settings. However, they are difficult to employ since they are subject to high variability from manually crafted prompts, and typically require billion-parameter language models to achieve high accuracy. To address these shortcomings, we propose SETFIT (Sentence Transformer Finetuning), an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers (ST). SETFIT works by first fine-tuning a pretrained ST on a small number of text pairs, in a contrastive Siamese manner. The resulting model is then used to generate rich text embeddings, which are used to train a classification head. This simple framework requires no prompts or verbalizers, and achieves high accuracy with orders of magnitude fewer parameters than existing techniques. Our experiments show that SETFIT obtains results comparable to PEFT and PET techniques, while being an order of magnitude faster to train. We also show that SETFIT can be applied in multilingual settings by simply switching the ST body. Our code and datasets are made publicly available.
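The first SETFIT stage trains on text pairs rather than single examples. The abstract does not say how those pairs are built, so the following is only a minimal sketch under an assumed scheme: every pair of labeled texts becomes a training pair, with target 1.0 when the labels match and 0.0 otherwise. The function name `make_contrastive_pairs` and the toy sentences are hypothetical, and the actual fine-tuning of the Sentence Transformer is left out.

```python
from itertools import combinations

def make_contrastive_pairs(texts, labels):
    """Build (text_a, text_b, target) triples from a few labeled examples.

    Assumed scheme (not taken from the abstract): target is 1.0 for a
    same-label pair and 0.0 for a different-label pair.
    """
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], target))
    return pairs

# Toy few-shot set: two examples per class (hypothetical data).
texts = ["great movie", "loved it", "terrible plot", "boring film"]
labels = [1, 1, 0, 0]
pairs = make_contrastive_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
```

With n labeled texts this yields n(n-1)/2 pairs, which may be one reason a handful of labels can still provide a useful contrastive training signal before the classification head is fit on the resulting embeddings.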



4. [LG] Continuous Mixtures of Tractable Probabilistic Models

A H.C. Correia, G Gala, E Quaeghebeur, C d Campos, R Peharz
[Eindhoven University of Technology]

Probabilistic models based on continuous latent spaces, such as variational autoencoders, can be understood as uncountable mixture models where components depend continuously on the latent code. They have proven expressive tools for generative and probabilistic modelling, but are at odds with tractable probabilistic inference, that is, computing marginals and conditionals of the represented probability distribution. Meanwhile, tractable probabilistic models such as probabilistic circuits (PCs) can be understood as hierarchical discrete mixture models, which allows them to perform exact inference, but often they show subpar performance in comparison to continuous latent-space models. In this paper, we investigate a hybrid approach, namely continuous mixtures of tractable models with a small latent dimension. While these models are analytically intractable, they are well amenable to numerical integration schemes based on a finite set of integration points. With a large enough number of integration points the approximation becomes de facto exact. Moreover, using a finite set of integration points, the approximation method can be compiled into a PC performing "exact inference in an approximate model". In experiments, we show that this simple scheme proves remarkably effective, as PCs learned this way set a new state of the art for tractable models on many standard density estimation benchmarks.
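The idea of replacing a continuous mixture by a finite set of integration points can be illustrated on a toy case. The sketch below is not the paper's model: it assumes a one-dimensional latent z with a standard normal prior, uses a single Gaussian N(x; z, 1) as a stand-in for the tractable component, and picks a plain midpoint rule as the integration scheme. All function names are hypothetical. For this toy case the exact marginal is known, p(x) = N(x; 0, 2), so the quality of the approximation can be checked directly.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def integration_points(n, lo=-8.0, hi=8.0):
    """Midpoint-rule nodes z_i and weights w_i = p(z_i) * dz for a N(0, 1) prior."""
    dz = (hi - lo) / n
    zs = [lo + (i + 0.5) * dz for i in range(n)]
    ws = [normal_pdf(z, 0.0, 1.0) * dz for z in zs]
    return zs, ws

def mixture_density(x, zs, ws):
    """Finite mixture sum_i w_i * p(x | z_i) with toy component p(x | z) = N(x; z, 1)."""
    return sum(w * normal_pdf(x, z, 1.0) for z, w in zip(zs, ws))

zs, ws = integration_points(200)
approx = mixture_density(0.5, zs, ws)   # finite-mixture approximation
exact = normal_pdf(0.5, 0.0, 2.0)       # closed-form marginal of the toy model
```

Once the points and weights are fixed, the approximation is an ordinary finite mixture, so any query that is exact for the components is exact for the mixture as well. That is the sense in which inference is exact in an approximate model.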



5. [LG] An Analysis of Ensemble Sampling

C Qin, Z Wen, X Lu, B V Roy
[Columbia University & DeepMind]

Ensemble sampling serves as a practical approximation to Thompson sampling when maintaining an exact posterior distribution over model parameters is computationally intractable. In this paper, we establish a Bayesian regret bound that ensures desirable behavior when ensemble sampling is applied to the linear bandit problem. This represents the first rigorous regret analysis of ensemble sampling and is made possible by leveraging information-theoretic concepts and novel analytic techniques that may prove useful beyond the scope of this paper.
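To make the setting concrete, here is a minimal sketch of ensemble sampling on a special case of the linear bandit in which the feature vectors are one-hot, so each arm has its own scalar parameter. The abstract does not give the algorithm, so the update rule is an assumption: each ensemble member starts from an independent prior sample and is refit to the shared rewards plus its own independent Gaussian perturbations, and at every step one member is drawn uniformly and followed greedily. The function name, parameters, and arm means are hypothetical.

```python
import random

def ensemble_sampling(true_means, n_models=10, horizon=500,
                      prior_std=1.0, noise_std=1.0, seed=0):
    """Run ensemble sampling on a Gaussian bandit and return pull counts per arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Shared per-arm precision; one-hot features make the covariance diagonal.
    precision = [1.0 / prior_std ** 2] * n_arms
    # Per-model precision-weighted sums, initialised from independent prior samples.
    weighted = [[rng.gauss(0.0, prior_std) / prior_std ** 2 for _ in range(n_arms)]
                for _ in range(n_models)]
    pulls = [0] * n_arms
    for _ in range(horizon):
        m = rng.randrange(n_models)  # draw one ensemble member uniformly
        estimates = [weighted[m][a] / precision[a] for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: estimates[a])  # act greedily under it
        reward = true_means[arm] + rng.gauss(0.0, noise_std)
        pulls[arm] += 1
        precision[arm] += 1.0 / noise_std ** 2
        for k in range(n_models):
            # Every member sees the same reward plus its own perturbation (assumed scheme).
            perturbed = reward + rng.gauss(0.0, noise_std)
            weighted[k][arm] += perturbed / noise_std ** 2
    return pulls

pulls = ensemble_sampling([0.1, 0.5, 0.9])
```

The ensemble plays the role of the posterior in Thompson sampling: drawing a member at random stands in for drawing a parameter sample, which avoids maintaining an exact posterior at the cost of a fixed number of models.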




[LG] Transformers in Time Series: A Survey

Q Wen, T Zhou, C Zhang, W Chen, Z Ma, J Yan, L Sun
[Alibaba Group & Shanghai Jiao Tong University]


[CV] UniColor: A Unified Framework for Multi-Modal Colorization with Transformer

Z Huang, N Zhao, J Liao
[City University of Hong Kong & University of Bath]


[CL] LINGUIST: Language Model Instruction Tuning to Generate Annotated Utterances for Intent Classification and Slot Tagging

A Rosenbaum, S Soltan, W Hamza, Y Versley, M Boese


[LG] Deep Linear Networks can Benignly Overfit when Shallow Ones Do

N S. Chatterji, P M. Long
[Stanford University & Google]


