
LG - Machine Learning   CV - Computer Vision   CL - Computation and Language   AS - Audio and Speech   RO - Robotics



1. [LG] Online internal speech decoding from single neurons in a human participant

S K. Wandelt, D A. Bjånes, K Pejsa, B Lee, C Liu, R A. Andersen
[California Institute of Technology]
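Online decoding of internal speech from single neurons in a human participant. Speech brain-machine interfaces (BMIs) translate brain signals into text or audio output, letting people who have lost speech through disease or injury communicate. Decoding of vocalized, attempted, and mimed speech has advanced considerably, but results for internal speech are sparse and have yet to reach high functionality; it is also unclear which brain areas internal speech can be decoded from. Here, a tetraplegic participant implanted with microelectrode arrays in the supramarginal gyrus (SMG) and primary somatosensory cortex (S1) produced six words and two pseudowords as internal and vocalized speech. Internal speech decoded from SMG single-neuron activity reached up to 91% classification accuracy in an online task (chance level 12.5%). There was evidence of shared neural representations among internal speech, word reading, and vocalized speech. SMG represented words in different languages (English/Spanish) as well as pseudowords, providing evidence for phonetic encoding. The decoder also achieved high classification accuracy across multiple internal speech strategies (auditory imagination/visual imagination). S1 activity was modulated by vocalized speech but not by internal speech, suggesting that no articulator movements of the vocal tract occurred during internal speech production. The work is a first proof of concept for a high-performance internal speech BMI.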

Speech brain-machine interfaces (BMIs) translate brain signals into words or audio outputs, enabling communication for people having lost their speech abilities due to diseases or injury. While important advances in vocalized, attempted, and mimed speech decoding have been achieved, results for internal speech decoding are sparse, and have yet to achieve high functionality. Notably, it is still unclear from which brain areas internal speech can be decoded. In this work, a tetraplegic participant with implanted microelectrode arrays located in the supramarginal gyrus (SMG) and primary somatosensory cortex (S1) performed internal and vocalized speech of six words and two pseudowords. We found robust internal speech decoding from SMG single neuron activity, achieving up to 91% classification accuracy during an online task (chance level 12.5%). Evidence of shared neural representations between internal speech, word reading, and vocalized speech processes was found. SMG represented words in different languages (English/Spanish) as well as pseudowords, providing evidence for phonetic encoding. Furthermore, our decoder achieved high classification with multiple internal speech strategies (auditory imagination/visual imagination). Activity in S1 was modulated by vocalized but not internal speech, suggesting no articulator movements of the vocal tract occurred during internal speech production. This work represents the first proof-of-concept for a high-performance internal speech BMI.



2. [LG] Efficiently Scaling Transformer Inference

R Pope, S Douglas, A Chowdhery, J Devlin, J Bradbury, A Levskaya, J Heek, K Xiao, S Agrawal, J Dean
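Efficiently scaling Transformer inference. The paper studies efficient generative inference for Transformer models in one of its most challenging settings: large deep models with tight latency targets and long sequence lengths. Understanding the engineering tradeoffs of inference for large Transformer-based models matters because their use cases are growing rapidly across application areas. The authors develop a simple analytical model of inference efficiency and use it to select the best multi-dimensional partitioning techniques, optimized for TPU v4 slices, for a given set of application requirements. Combined with a suite of low-level optimizations, this yields a new Pareto frontier on the latency versus model FLOPS utilization (MFU) tradeoff for 500B+ parameter models, outperforming the FasterTransformer suite of benchmarks. They further show that, with appropriate partitioning, the lower memory requirements of multiquery attention (multiple query heads sharing a single key/value head) allow scaling to 32x larger context lengths. Finally, they reach a low-batch-size latency of 29ms per token during generation (with int8 weight quantization) and 76% MFU during large-batch-size processing of input tokens, while supporting a long 2048-token context on the PaLM 540B parameter model.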

We study the problem of efficient generative inference for Transformer models, in one of its most challenging settings: large deep models, with tight latency targets and long sequence lengths. Better understanding of the engineering tradeoffs for inference for large Transformer-based models is important as use cases of these models are growing rapidly throughout application areas. We develop a simple analytical model for inference efficiency to select the best multi-dimensional partitioning techniques optimized for TPU v4 slices based on the application requirements. We combine these with a suite of low-level optimizations to achieve a new Pareto frontier on the latency and model FLOPS utilization (MFU) tradeoffs on 500B+ parameter models that outperforms the FasterTransformer suite of benchmarks. We further show that with appropriate partitioning, the lower memory requirements of multiquery attention (i.e. multiple query heads share a single key/value head) enable scaling up to 32x larger context lengths. Finally, we achieve a low-batch-size latency of 29ms per token during generation (using int8 weight quantization) and a 76% MFU during large-batch-size processing of input tokens, while supporting a long 2048-token context length on the PaLM 540B parameter model.
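
To make the memory argument for multiquery attention concrete, here is a rough sketch of how the key/value cache shrinks when all query heads share one key/value head. The function name and every dimension below are hypothetical values chosen for illustration, not the configuration of PaLM or of the paper, and the sketch ignores partitioning across chips, which the abstract says is also needed to reach the reported 32x context scaling.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_value=2):
    """Size of the attention key/value cache in bytes.

    Each layer stores two tensors (keys and values) of shape
    [batch, seq_len, n_kv_heads, head_dim].
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_value


# Hypothetical model and workload sizes (assumptions, not from the paper).
n_layers, n_heads, head_dim = 64, 16, 128
seq_len, batch = 2048, 64

# Multihead attention: one key/value head per query head.
multihead = kv_cache_bytes(n_layers, n_heads, head_dim, seq_len, batch)
# Multiquery attention: all query heads share a single key/value head.
multiquery = kv_cache_bytes(n_layers, 1, head_dim, seq_len, batch)

ratio = multihead // multiquery
print(f"multihead:  {multihead / 2**30:.0f} GiB")
print(f"multiquery: {multiquery / 2**30:.0f} GiB")
print(f"reduction:  {ratio}x")
```

In this toy setting the cache shrinks by a factor equal to the number of heads, so the same memory budget could in principle hold a proportionally longer context.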



3. [LG] WeightedSHAP: analyzing and improving Shapley based feature attributions

Y Kwon, J Zou
[Columbia University & Stanford University]

Shapley value is a popular approach for measuring the influence of individual features. While Shapley feature attribution is built upon desiderata from game theory, some of its constraints may be less natural in certain machine learning settings, leading to unintuitive model interpretation. In particular, the Shapley value uses the same weight for all marginal contributions -- i.e. it gives the same importance when a large number of other features are given versus when a small number of other features are given. This property can be problematic if larger feature sets are more or less informative than smaller feature sets. Our work performs a rigorous analysis of the potential limitations of Shapley feature attribution. We identify simple settings where the Shapley value is mathematically suboptimal by assigning larger attributions for less influential features. Motivated by this observation, we propose WeightedSHAP, which generalizes the Shapley value and learns which marginal contributions to focus on directly from data. On several real-world datasets, we demonstrate that the influential features identified by WeightedSHAP are better able to recapitulate the model's predictions compared to the features identified by the Shapley value.
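
The "same weight for all marginal contributions" property can be illustrated with a small sketch. The code below is not the authors' WeightedSHAP implementation: it computes, for a toy three-feature value function, the average marginal contribution of a feature at each coalition size, then combines those averages either with uniform weights (which gives the Shapley value) or with hand-picked weights. The names `marginal_by_size`, `weighted_attribution`, and `toy_value` are hypothetical, and the weights here are fixed by hand, whereas the abstract says WeightedSHAP learns them from data.

```python
from itertools import combinations


def marginal_by_size(value, n, i):
    """Average marginal contribution of feature i for coalition sizes 0..n-1."""
    others = [j for j in range(n) if j != i]
    averages = []
    for size in range(n):
        gains = [
            value(frozenset(s) | {i}) - value(frozenset(s))
            for s in combinations(others, size)
        ]
        averages.append(sum(gains) / len(gains))
    return averages


def weighted_attribution(value, n, i, weights=None):
    """Weighted sum of per-size average marginal contributions.

    With weights=None every coalition size gets weight 1/n,
    which recovers the Shapley value.
    """
    averages = marginal_by_size(value, n, i)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(w * a for w, a in zip(weights, averages))


def toy_value(coalition):
    """Toy game: features 0 and 1 are worth 10 only together; feature 2 adds 3."""
    interaction = 10.0 if {0, 1} <= coalition else 0.0
    return interaction + (3.0 if 2 in coalition else 0.0)


shapley = [weighted_attribution(toy_value, 3, i) for i in range(3)]
small_sets_only = [weighted_attribution(toy_value, 3, i, [1.0, 0.0, 0.0]) for i in range(3)]
print("uniform weights (Shapley):", shapley)
print("weight on empty coalition only:", small_sets_only)
```

With uniform weights, features 0 and 1 each receive about 5 and feature 2 receives 3. Putting all weight on the empty coalition drops features 0 and 1 to 0, because their contribution only appears once the other is already present; this is the kind of sensitivity to coalition size that a weighting scheme can exploit.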



4. [LG] Multi-Layered Maps of Neuropil with Segmentation-Guided Contrastive Learning

S Dorkenwald, P H. Li, M Januszewski...
[Google Research]

Maps of the nervous system that identify individual cells along with their type, subcellular components, and connectivity have the potential to reveal fundamental organizational principles of neural circuits. Volumetric nanometer-resolution imaging of brain tissue provides the raw data needed to build such maps, but inferring all the relevant cellular and subcellular annotation layers is challenging. Here, we present Segmentation-Guided Contrastive Learning of Representations (“SegCLR”), a self-supervised machine learning technique that produces highly informative representations of cells directly from 3D electron microscope imagery and segmentations. When applied to volumes of human and mouse cerebral cortex, SegCLR enabled the classification of cellular subcompartments (axon, dendrite, soma, astrocytic process) with 4,000-fold less labeled data compared to fully supervised approaches. Surprisingly, SegCLR also enabled inference of cell types (neurons, glia, and subtypes of each) from fragments with lengths as small as 10 micrometers, a task that can be difficult for humans to perform and whose feasibility greatly enhances the utility of imaging portions of brains in which many neuron fragments terminate at a volume boundary. These predictions were further augmented via Gaussian process uncertainty estimation to enable analyses restricted to high-confidence subsets of the data. Finally, SegCLR enabled detailed exploration of layer-5 pyramidal cell subtypes and automated large-scale statistical analysis of upstream and downstream synaptic partners in mouse visual cortex.



5. [CL] BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

T L Scao, A Fan, C Akiki, E Pavlick, S Ilić, D Hesslow, R Castagné...

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.





[AI] Humans decompose tasks by trading off utility and computational cost

C G. Correa, M K. Ho, F Callaway, N D. Daw, T L. Griffiths
[Princeton University]


[CL] Creative Writing with an AI-Powered Writing Assistant: Perspectives from Professional Writers

D Ippolito, A Yuan, A Coenen, S Burnam
[Google Research]


[CV] TAP-Vid: A Benchmark for Tracking Any Point in a Video

C Doersch, A Gupta, L Markeeva, A Recasens, L Smaira, Y Aytar, J Carreira, A Zisserman, Y Yang


[RO] StructDiffusion: Object-Centric Diffusion for Semantic Rearrangement of Novel Objects

W Liu, T Hermans, S Chernova, C Paxton
[Georgia Tech & University of Utah & Meta AI]



