LG - Machine Learning; CV - Computer Vision; CL - Computation and Language; AS - Audio and Speech; RO - Robotics

Reposted from 爱可可爱生活

 

1、[LG] Self-attention Does Not Need O(n^2) Memory

M N. Rabe, C Staats

[Google Research]

This paper presents a very simple attention algorithm that needs only O(1) memory with respect to sequence length, together with an extension to self-attention that needs O(log n) memory. This contrasts with the commonly stated belief that self-attention requires O(n^2) memory. While the time complexity is still O(n^2), device memory rather than compute capability is often the limiting factor on modern accelerators, so reducing the memory requirements of attention allows processing of longer sequences than would otherwise be feasible. The authors provide a practical implementation for accelerators that needs only O(√n) memory, is numerically stable, and runs within a few percent of the runtime of the standard attention implementation. They also show how to differentiate the function while remaining memory-efficient. For sequence length 16384, the memory overhead of self-attention is reduced 59× for inference and 32× for differentiation.

We present a very simple algorithm for attention that requires O(1) memory with respect to sequence length and an extension to self-attention that requires O(log n) memory. This is in contrast with the frequently stated belief that self-attention requires O(n^2) memory. While the time complexity is still O(n^2), device memory rather than compute capability is often the limiting factor on modern accelerators. Thus, reducing the memory requirements of attention allows processing of longer sequences than might otherwise be feasible. We provide a practical implementation for accelerators that requires O(√n) memory, is numerically stable, and is within a few percent of the runtime of the standard implementation of attention. We also demonstrate how to differentiate the function while remaining memory-efficient. For sequence length 16384, the memory overhead of self-attention is reduced by 59X for inference and by 32X for differentiation.
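The core idea is to stream over key/value chunks while keeping only running softmax statistics per query (a running max for numerical stability, a running denominator, and a running weighted-value sum), so memory scales with the chunk size instead of the full sequence length. A minimal NumPy sketch of this chunked variant (function and chunk size are illustrative, not the paper's code):

```python
import numpy as np

def attention_chunked(Q, K, V, chunk_size=64):
    """Softmax attention computed chunk by chunk over keys/values.

    Q: (nq, d) queries; K, V: (n, d) keys and values.
    Peak extra memory is O(chunk_size) per query, independent of n,
    via the streaming log-sum-exp trick.
    """
    nq, d = Q.shape
    n = K.shape[0]
    acc = np.zeros((nq, d))             # running sum of exp-weighted values
    denom = np.zeros(nq)                # running softmax denominator
    running_max = np.full(nq, -np.inf)  # running max score per query
    for start in range(0, n, chunk_size):
        Kc = K[start:start + chunk_size]
        Vc = V[start:start + chunk_size]
        s = Q @ Kc.T / np.sqrt(d)       # (nq, c) scores for this chunk
        new_max = np.maximum(running_max, s.max(axis=1))
        # rescale previous accumulators to the new running max
        correction = np.exp(running_max - new_max)
        p = np.exp(s - new_max[:, None])
        acc = acc * correction[:, None] + p @ Vc
        denom = denom * correction + p.sum(axis=1)
        running_max = new_max
    return acc / denom[:, None]
```

The rescaling step is what makes lazy softmax accumulation numerically stable: earlier contributions are re-expressed relative to the updated maximum before new terms are added.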

 

2、[CV] CityNeRF: Building NeRF at City Scale

Y Xiangli, L Xu, X Pan, N Zhao, A Rao, C Theobalt, B Dai, D Lin

[The Chinese University of Hong Kong & Max Planck Institute for Informatics & Nanyang Technological University]

Neural Radiance Fields (NeRF) have achieved outstanding performance in modeling 3D objects and controlled scenes, usually at a single scale. This work makes the first attempt to bring NeRF to city scale, with views ranging from the satellite level, capturing the overview of a city, down to ground-level imagery showing the intricate details of a building. The wide span of camera-to-scene distances yields multi-scale data with different levels of detail and spatial coverage, which poses great challenges to vanilla NeRF and biases it towards compromised results. To address these issues, the paper proposes CityNeRF, a progressive learning paradigm that grows the NeRF model and the training set synchronously. Starting from a shallow base block fitted to distant views, new blocks are appended as training progresses to accommodate the details emerging in increasingly close views. This strategy effectively activates high-frequency channels in the positional encoding, unfolding more complex detail as training proceeds. The paper demonstrates the superiority of CityNeRF in modeling scenes across city scales, and its support for rendering at different levels of detail.

Neural Radiance Field (NeRF) has achieved outstanding performance in modeling 3D objects and controlled scenes, usually under a single scale. In this work, we make the first attempt to bring NeRF to city-scale, with views ranging from satellite-level that captures the overview of a city, to ground-level imagery showing complex details of an architecture. The wide span of camera distance to the scene yields multi-scale data with different levels of detail and spatial coverage, which poses great challenges to vanilla NeRF and biases it towards compromised results. To address these issues, we introduce CityNeRF, a progressive learning paradigm that grows the NeRF model and training set synchronously. Starting from fitting distant views with a shallow base block, as training progresses, new blocks are appended to accommodate the emerging details in the increasingly closer views. The strategy effectively activates high-frequency channels in the positional encoding and unfolds more complex details as the training proceeds. We demonstrate the superiority of CityNeRF in modeling diverse city-scale scenes with drastically varying views, and its support for rendering views in different levels of detail. Project page: CityNeRF.
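The "activating high-frequency channels" idea can be pictured as a mask over positional-encoding frequency bands that widens each time a new block is appended. A NumPy sketch under assumed details (the `freqs_per_stage` parameter and the linear unmasking schedule are illustrative, not taken from the paper):

```python
import numpy as np

def positional_encoding(x, num_freqs):
    # Standard NeRF encoding: [sin(2^k * pi * x), cos(2^k * pi * x)]
    # for k = 0 .. num_freqs-1, applied elementwise.
    freqs = 2.0 ** np.arange(num_freqs) * np.pi
    ang = x[..., None] * freqs                      # (..., num_freqs)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)

def stage_mask(num_freqs, stage, freqs_per_stage=2):
    # Progressive schedule: each new training stage unmasks
    # `freqs_per_stage` more frequency bands, so close-up stages can
    # express high-frequency detail while the fit of the shallow base
    # block (trained on distant views) is preserved.
    active = min(num_freqs, freqs_per_stage * (stage + 1))
    m = np.zeros(num_freqs)
    m[:active] = 1.0
    return np.concatenate([m, m])  # mask sin and cos halves alike
```

Usage: `positional_encoding(x, 8) * stage_mask(8, stage)` feeds the current stage's blocks only the frequency bands unlocked so far.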

 

3、[LG] Revisiting dequantization and quantum advantage in learning tasks

J Cotler, H Huang, J R. McClean

[Harvard Society of Fellows & Caltech & Google Quantum AI]

It has been shown that the apparent advantage of some quantum machine learning algorithms can be efficiently replicated by classical algorithms with suitable data access, a process known as dequantization. Existing work on dequantization compares quantum algorithms that take copies of an n-qubit quantum state |x⟩=∑ixi|i⟩ as input with classical algorithms that have sample-and-query (SQ) access to the vector x. This note proves that classical algorithms with SQ access can accomplish some learning tasks exponentially faster than quantum algorithms with quantum state inputs. Because classical algorithms are a subset of quantum algorithms, this demonstrates that SQ access can sometimes be significantly more powerful than quantum state inputs. These findings suggest that the absence of exponential quantum advantage in some learning tasks may be because SQ access is too powerful relative to quantum state inputs. If quantum algorithms with quantum state inputs are instead compared with classical algorithms with access to measurement data on quantum states, the landscape of quantum advantage can be dramatically different. The authors note that when the quantum states are constructed from exponential-size classical data, comparing SQ access and quantum state inputs is appropriate, since both require exponential time to prepare.

It has been shown that the apparent advantage of some quantum machine learning algorithms may be efficiently replicated using classical algorithms with suitable data access -- a process known as dequantization. Existing works on dequantization compare quantum algorithms which take copies of an n-qubit quantum state |x⟩=∑ixi|i⟩ as input to classical algorithms which have sample and query (SQ) access to the vector x. In this note, we prove that classical algorithms with SQ access can accomplish some learning tasks exponentially faster than quantum algorithms with quantum state inputs. Because classical algorithms are a subset of quantum algorithms, this demonstrates that SQ access can sometimes be significantly more powerful than quantum state inputs. Our findings suggest that the absence of exponential quantum advantage in some learning tasks may be due to SQ access being too powerful relative to quantum state inputs. If we compare quantum algorithms with quantum state inputs to classical algorithms with access to measurement data on quantum states, the landscape of quantum advantage can be dramatically different. We remark that when the quantum states are constructed from exponential-size classical data, comparing SQ access and quantum state inputs is appropriate since both require exponential time to prepare.
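For concreteness, SQ access to a vector x conventionally means three operations: query any entry x_i, sample an index i with probability |x_i|^2/‖x‖^2, and read off the norm ‖x‖ — mirroring what computational-basis measurements of copies of |x⟩ would provide. A small sketch of such an oracle (class and method names are ours, for illustration only):

```python
import numpy as np

class SQAccess:
    """Sample-and-query (SQ) access to a real vector x, as used in
    dequantization arguments: query(i) -> x_i, sample() -> index i with
    probability x_i^2 / ||x||^2, norm() -> ||x||."""

    def __init__(self, x, rng=None):
        self.x = np.asarray(x, dtype=float)
        self.sq_norm = float(np.dot(self.x, self.x))
        self.probs = self.x ** 2 / self.sq_norm
        self.rng = rng or np.random.default_rng()

    def query(self, i):
        # entry access: x_i
        return self.x[i]

    def sample(self):
        # draw index i with probability proportional to x_i^2
        return int(self.rng.choice(len(self.x), p=self.probs))

    def norm(self):
        return self.sq_norm ** 0.5
```

The note's point is that this interface is, for some learning tasks, strictly stronger than holding copies of the state |x⟩ itself, since a quantum algorithm cannot in general read out individual amplitudes efficiently.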

 

4、[AI] Where is Memory Information Stored in the Brain?

J Tee, D P. Taylor

[The New School for Social Research & University of Canterbury]

Within the scientific research community, memory information in the brain is widely believed to be stored in the synapse, a hypothesis famously attributed to psychologist Donald Hebb. However, a growing minority postulate that memory is stored inside the neuron at the molecular (RNA or DNA) level, an alternative known as the cell-intrinsic hypothesis, coined by psychologist Randy Gallistel. This paper reviews key experimental evidence from both sides of the debate. It begins with Eric Kandel's studies on sea slugs, which provided the first evidence supporting the synaptic hypothesis. Next come mouse experiments by John O'Keefe (declarative memory and the hippocampus) and Joseph LeDoux (procedural fear memory and the amygdala). The synapse is then introduced as the basic building block of today's artificial neural networks. After that, the paper describes David Glanzman's study dissociating memory storage from synaptic change in sea slugs, and Susumu Tonegawa's experiment using lasers to reactivate memories in mice with retrograde amnesia. Building on this, it highlights Germund Hesslow's experiment on conditioned pauses in ferrets, and Beatrice Gelber's experiments on conditioning in a single-celled organism without synapses (Paramecium aurelia). This is followed by David Glanzman's experiment transplanting memory between sea slugs using RNA, and an overview of Brian Dias and Kerry Ressler's experiment on DNA transfer of fear in mice from parents to offspring. The paper concludes with potential implications for the wider field of psychology.

Within the scientific research community, memory information in the brain is commonly believed to be stored in the synapse - a hypothesis famously attributed to psychologist Donald Hebb. However, there is a growing minority who postulate that memory is stored inside the neuron at the molecular (RNA or DNA) level - an alternative postulation known as the cell-intrinsic hypothesis, coined by psychologist Randy Gallistel. In this paper, we review a selection of key experimental evidence from both sides of the argument. We begin with Eric Kandel's studies on sea slugs, which provided the first evidence in support of the synaptic hypothesis. Next, we touch on experiments in mice by John O'Keefe (declarative memory and the hippocampus) and Joseph LeDoux (procedural fear memory and the amygdala). Then, we introduce the synapse as the basic building block of today's artificial intelligence neural networks. After that, we describe David Glanzman's study on dissociating memory storage and synaptic change in sea slugs, and Susumu Tonegawa's experiment on reactivating retrograde amnesia in mice using laser. From there, we highlight Germund Hesslow's experiment on conditioned pauses in ferrets, and Beatrice Gelber's experiment on conditioning in single-celled organisms without synapses (Paramecium aurelia). This is followed by a description of David Glanzman's experiment on transplanting memory between sea slugs using RNA. Finally, we provide an overview of Brian Dias and Kerry Ressler's experiment on DNA transfer of fear in mice from parents to offspring. We conclude with some potential implications for the wider field of psychology.

 

 

5、[CV] Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

N Shvetsova, B Chen, A Rouditchenko, S Thomas, B Kingsbury, R Feris, D Harwath, J Glass, H Kuehne

[Goethe University Frankfurt & Columbia University & MIT CSAIL & IBM Research AI & UT Austin]

Multi-modal learning from video data has recently received increasing attention, since it allows training semantically meaningful embeddings without human annotation, enabling tasks such as zero-shot retrieval and classification. This paper presents a multi-modal, modality-agnostic fusion transformer that learns to exchange information between multiple modalities, such as video, audio, and text, and to integrate them into a joint multi-modal representation, yielding an embedding that aggregates multi-modal temporal information. The system is trained with a combinatorial loss on everything at once, single modalities as well as pairs of modalities, explicitly leaving out any add-ons such as position or modality encodings. At test time, the resulting model can process and fuse any number of input modalities, and the implicit properties of the transformer allow it to handle inputs of different lengths. To evaluate the approach, the model is trained on the large-scale HowTo100M dataset and the resulting embedding space is evaluated on four challenging benchmark datasets, achieving state-of-the-art results in zero-shot video retrieval and zero-shot video action localization.

Multi-modal learning from video data has seen increased attention recently as it allows training semantically meaningful embeddings without human annotation, enabling tasks like zero-shot retrieval and classification. In this work, we present a multi-modal, modality-agnostic fusion transformer approach that learns to exchange information between multiple modalities, such as video, audio, and text, and to integrate them into a joint multi-modal representation to obtain an embedding that aggregates multi-modal temporal information. We propose to train the system with a combinatorial loss on everything at once, single modalities as well as pairs of modalities, explicitly leaving out any add-ons such as position or modality encoding. At test time, the resulting model can process and fuse any number of input modalities. Moreover, the implicit properties of the transformer allow it to process inputs of different lengths. To evaluate the proposed approach, we train the model on the large-scale HowTo100M dataset and evaluate the resulting embedding space on four challenging benchmark datasets, obtaining state-of-the-art results in zero-shot video retrieval and zero-shot video action localization.
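The "everything at once" objective can be sketched as a contrastive loss summed over all combinations of modality embeddings. A NumPy sketch using an InfoNCE-style pairwise term (the temperature value and the exact form of the loss are illustrative assumptions, not the paper's implementation):

```python
from itertools import combinations
import numpy as np

def info_nce(a, b, temperature=0.07):
    # Symmetric contrastive loss between two batches of embeddings;
    # matching rows are positives, all other rows are negatives.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    logits_t = logits.T
    logp_t = logits_t - np.log(np.exp(logits_t).sum(axis=1, keepdims=True))
    return 0.5 * (-np.mean(np.diag(logp)) - np.mean(np.diag(logp_t)))

def combinatorial_loss(embeddings):
    # 'Everything at once': sum the pairwise contrastive loss over all
    # combinations of entries in `embeddings` (which may hold single
    # modalities as well as fused modality pairs).
    return sum(info_nce(embeddings[i], embeddings[j])
               for i, j in combinations(embeddings, 2))
```

Because the fusion transformer is modality-agnostic, the same loss applies whether an entry is a single modality or a fused pair, which is what lets one model serve any subset of modalities at test time.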

 

 

A few more papers worth noting:

 

[CV] Come-Closer-Diffuse-Faster: Accelerating Conditional Diffusion Models for Inverse Problems through Stochastic Contraction


H Chung, B Sim, J C Ye

[KAIST]

 

[CV] MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning


C Eichenberg, S Black, S Weinbach, L Parcalabescu, A Frank

[Aleph Alpha & Heidelberg University]

 

[CV] CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions


R Abdal, P Zhu, J Femiani, N J. Mitra, P Wonka

[KAUST & Miami University & UCL]

 

[CV] FaceFormer: Speech-Driven 3D Facial Animation with Transformers


Y Fan, Z Lin, J Saito, W Wang, T Komura

[The University of Hong Kong & The Hong Kong University of Science and Technology & Adobe Research]

If any images included in this content involve copyright issues, please contact us promptly for removal.