LG - Machine Learning   CV - Computer Vision   CL - Computation and Language   AS - Audio and Speech   RO - Robotics   SI - Social and Information Networks

Reposted from 爱可可-爱生活

Summary: a textless vision-language Transformer; exploring low-rank training of deep neural networks; defining and characterizing reward hacking; a linguistics-based formalization of the antibody language; the many regularizers of geometric complexity; efficient and effective retrieval-augmented text generation; large network community detection by fast label propagation; an instruction-based benchmark for text improvements; learning-based dimensionality reduction for computing compact and effective local feature descriptors

 

1. [CV] TVLT: Textless Vision-Language Transformer

Z Tang, J Cho, Y Nie, M Bansal
[UNC Chapel Hill]
TVLT: Textless Vision-Language Transformer. The paper proposes TVLT, in which homogeneous transformer blocks take raw visual and audio inputs for vision-and-language representation learning, with minimal modality-specific design and no text-specific modules such as tokenization or automatic speech recognition (ASR). Training combines masked autoencoding (reconstructing masked patches of continuous video frames and audio spectrograms) with contrastive modeling that aligns video and audio. On multimodal tasks such as visual question answering, image retrieval, video retrieval, and multimodal sentiment analysis, TVLT matches its text-based counterpart with 28x faster inference and only 1/3 of the parameters, suggesting that compact and efficient visual-linguistic representations can be learned from low-level visual and audio signals without assuming the prior existence of text.

In this work, we present the Textless Vision-Language Transformer (TVLT), where homogeneous transformer blocks take raw visual and audio inputs for vision-and-language representation learning with minimal modality-specific design, and do not use text-specific modules such as tokenization or automatic speech recognition (ASR). TVLT is trained by reconstructing masked patches of continuous video frames and audio spectrograms (masked autoencoding) and contrastive modeling to align video and audio. TVLT attains performance comparable to its text-based counterpart on various multimodal tasks, such as visual question answering, image retrieval, video retrieval, and multimodal sentiment analysis, with 28x faster inference speed and only 1/3 of the parameters. Our findings suggest the possibility of learning compact and efficient visual-linguistic representations from low-level visual and audio signals without assuming the prior existence of text.

https://arxiv.org/abs/2209.14156
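
Below is a minimal PyTorch sketch (not the authors' code) of TVLT's two training objectives: masked autoencoding over raw patch inputs and a video-audio contrastive loss. The shared patch dimension, model sizes, and mean-pooling are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTVLT(nn.Module):
    def __init__(self, dim=256, n_heads=4, depth=2, patch_dim=256):
        super().__init__()
        self.embed = nn.Linear(patch_dim, dim)               # shared patch embedding
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)   # homogeneous blocks
        self.decoder = nn.Linear(dim, patch_dim)             # reconstructs raw patches
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    def mae_loss(self, patches, mask_ratio=0.75):
        # Masked autoencoding: hide a random subset of patches and score
        # reconstruction only on the hidden positions.
        x = self.embed(patches)
        mask = torch.rand(x.shape[:2], device=x.device) < mask_ratio
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        recon = self.decoder(self.encoder(x))
        return F.mse_loss(recon[mask], patches[mask])

    def contrastive_loss(self, video_patches, audio_patches, temp=0.07):
        # Video-audio alignment: mean-pool each modality and apply a
        # symmetric InfoNCE loss over in-batch pairs.
        v = self.encoder(self.embed(video_patches)).mean(dim=1)
        a = self.encoder(self.embed(audio_patches)).mean(dim=1)
        logits = F.normalize(v, dim=-1) @ F.normalize(a, dim=-1).T / temp
        targets = torch.arange(logits.size(0), device=logits.device)
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.T, targets)) / 2

# Toy usage: 8 clips, 32 "video" patches and 48 "audio" patches of size 256 each.
model = TinyTVLT()
video, audio = torch.randn(8, 32, 256), torch.randn(8, 48, 256)
loss = model.mae_loss(torch.cat([video, audio], dim=1)) + model.contrastive_loss(video, audio)
```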

 

2. [LG] Exploring Low Rank Training of Deep Neural Networks

S R Kamalakara, A Locatelli, B Venkitesh, J Ba, Y Gal, A N. Gomez
[Cohere & FOR.ai & University of Toronto & University of Oxford]
Exploring low-rank training of deep neural networks. Training deep networks in low rank, i.e. with factorised layers, is of particular interest because it is more efficient than unfactorised training in both memory consumption and training time. Prior work has focused on low-rank approximations of pre-trained networks and on training in low-rank space with additional objectives, offering various ad hoc explanations for the chosen practice. The paper analyses the techniques that work well in practice and, through extensive ablations on models such as GPT-2, provides evidence falsifying common beliefs in the field, pointing along the way at open research questions.

Training deep neural networks in low rank, i.e. with factorised layers, is of particular interest to the community: it offers efficiency over unfactorised training in terms of both memory consumption and training time. Prior work has focused on low rank approximations of pre-trained networks and training in low rank space with additional objectives, offering various ad hoc explanations for chosen practice. We analyse techniques that work well in practice, and through extensive ablations on models such as GPT2 we provide evidence falsifying common beliefs in the field, hinting in the process at exciting research opportunities that still need answering.

https://arxiv.org/abs/2209.13569
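
A minimal sketch of the factorised-layer training the paper studies: each dense weight matrix is replaced by a product of two skinny factors, which is where the memory and compute savings come from. The initialisation scale below is an illustrative assumption, not the paper's scheme.

```python
import torch
import torch.nn as nn

class LowRankLinear(nn.Module):
    # W (out x in) is parameterised as U @ V with rank r, cutting the
    # parameter count from out*in to r*(out + in).
    def __init__(self, in_features, out_features, rank):
        super().__init__()
        self.U = nn.Parameter(torch.randn(out_features, rank) / rank ** 0.5)
        self.V = nn.Parameter(torch.randn(rank, in_features) / in_features ** 0.5)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Two skinny matmuls instead of one dense one: x @ V^T @ U^T + b
        return x @ self.V.T @ self.U.T + self.bias

# e.g. a rank-64 factorisation of a 1024x1024 layer: roughly 8x fewer parameters
layer = LowRankLinear(1024, 1024, rank=64)
out = layer(torch.randn(4, 1024))
```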

 

3. [LG] Defining and Characterizing Reward Hacking

J Skalse, N H. R. Howe, D Krasheninnikov, D Krueger
[University of Oxford & Université de Montréal & University of Cambridge]
Defining and characterizing reward hacking. The paper gives the first formal definition of reward hacking, the phenomenon where optimizing an imperfect proxy reward function R̃ leads to poor performance according to the true reward function R. A proxy is unhackable if increasing the expected proxy return can never decrease the expected true return. Intuitively, one might hope to build an unhackable proxy by leaving some terms out of the reward function (making it "narrower") or by overlooking fine-grained distinctions between roughly equivalent outcomes, but this is usually not the case: because reward is linear in state-action visit counts, unhackability is a very strong condition, and over the set of all stochastic policies two reward functions can only be unhackable if one of them is constant. Attention therefore turns to deterministic policies and finite sets of stochastic policies, where non-trivial unhackable pairs always exist, and the paper establishes necessary and sufficient conditions for the existence of simplifications, an important special case of unhackability. The results reveal a tension between using reward functions to specify narrow tasks and aligning AI systems with human values.

We provide the first formal definition of reward hacking, a phenomenon where optimizing an imperfect proxy reward function, R̃, leads to poor performance according to the true reward function, R. We say that a proxy is unhackable if increasing the expected proxy return can never decrease the expected true return. Intuitively, it might be possible to create an unhackable proxy by leaving some terms out of the reward function (making it “narrower”) or overlooking fine-grained distinctions between roughly equivalent outcomes, but we show this is usually not the case. A key insight is that the linearity of reward (in state-action visit counts) makes unhackability a very strong condition. In particular, for the set of all stochastic policies, two reward functions can only be unhackable if one of them is constant. We thus turn our attention to deterministic policies and finite sets of stochastic policies, where non-trivial unhackable pairs always exist, and establish necessary and sufficient conditions for the existence of simplifications, an important special case of unhackability. Our results reveal a tension between using reward functions to specify narrow tasks and aligning AI systems with human values.

https://arxiv.org/abs/2209.13085
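
Since returns are linear in state-action visit counts, the unhackability condition can be checked directly on a finite policy set. Below is a toy NumPy sketch of the definition; the occupancy and reward numbers are purely illustrative.

```python
import numpy as np

def returns(occupancies, reward):
    # occupancies: (n_policies, n_state_actions) visit counts per policy
    # reward:      (n_state_actions,) reward per state-action pair
    # Expected return is linear in the visit counts.
    return occupancies @ reward

def unhackable(occupancies, proxy, true):
    # Proxy is unhackable w.r.t. true iff no policy switch strictly
    # increases proxy return while strictly decreasing true return.
    j_proxy, j_true = returns(occupancies, proxy), returns(occupancies, true)
    for i in range(len(j_proxy)):
        for j in range(len(j_proxy)):
            if j_proxy[i] > j_proxy[j] and j_true[i] < j_true[j]:
                return False  # moving from policy j to i hacks the proxy
    return True

# Illustrative numbers only: three policies over four state-action pairs.
occ = np.array([[1.0, 0.0, 2.0, 1.0],
                [0.5, 1.5, 1.0, 1.0],
                [2.0, 0.0, 0.0, 2.0]])
proxy = np.array([1.0, 0.0, 1.0, 0.0])   # a "narrower" reward with terms dropped
true = np.array([1.0, 0.5, 1.0, 0.2])
print(unhackable(occ, proxy, true))
```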

 

4. [LG] ImmunoLingo: Linguistics-based formalization of the antibody language

M H Vu, P A. Robert, R Akbar, B Swiatczak, G K Sandve, D T T Haug, V Greiff
[University of Oslo & University of Science and Technology of China]
ImmunoLingo: linguistics-based formalization of the antibody language. Apparent parallels between natural language and biological sequences have led to a recent surge in applying deep language models (LMs) to antibody and other biological sequences. However, the lack of a rigorous linguistic formalization of biological sequence languages, one defining basic components such as a lexicon (the discrete units of the language) and a grammar (the rules linking sequence well-formedness, structure, and meaning), has left LM applications largely domain-unspecific, ignoring the underlying structure of the sequences studied. A linguistic formalization establishes linguistically informed, domain-adapted components for LM applications and helps clarify how differences and similarities between natural language and biological sequences affect LM quality, which is crucial for designing interpretable models with extractable sequence-function rules, such as those underlying the antibody specificity prediction problem; deciphering the rules of antibody specificity is key to accelerating rational, in silico biotherapeutic drug design. The paper formalizes the properties of the antibody language, laying a foundation both for applying linguistic tools to adaptive immune receptor analysis and for systematic immunolinguistic studies of immune receptor specificity in general.

Apparent parallels between natural language and biological sequence have led to a recent surge in the application of deep language models (LMs) to the analysis of antibody and other biological sequences. However, a lack of a rigorous linguistic formalization of biological sequence languages, which would define basic components, such as lexicon (i.e., the discrete units of the language) and grammar (i.e., the rules that link sequence well-formedness, structure, and meaning) has led to largely domain-unspecific applications of LMs, which do not take into account the underlying structure of the biological sequences studied. A linguistic formalization, on the other hand, establishes linguistically-informed and thus domain-adapted components for LM applications. It would facilitate a better understanding of how differences and similarities between natural language and biological sequences influence the quality of LMs, which is crucial for the design of interpretable models with extractable sequence-functions relationship rules, such as the ones underlying the antibody specificity prediction problem. Deciphering the rules of antibody specificity is crucial to accelerating rational and in silico biotherapeutic drug design. Here, we formalize the properties of the antibody language and thereby establish not only a foundation for the application of linguistic tools in adaptive immune receptor analysis but also for the systematic immunolinguistic studies of immune receptor specificity in general.

https://arxiv.org/abs/2209.12635

 

5. [LG] Why neural networks find simple solutions: the many regularizers of geometric complexity

B Dherin, M Munn, M Rosca, D G.T. Barrett
[Google & DeepMind]
Why neural networks find simple solutions: the many regularizers of geometric complexity. In many contexts simpler models are preferable to more complex ones, and controlling model complexity is the goal of methods such as regularization, hyperparameter tuning, and architecture design; in deep learning the underlying mechanisms of complexity control have been hard to understand, since many traditional measures do not suit deep neural networks. The paper develops geometric complexity, a measure of the variability of the model function computed using a discrete Dirichlet energy, and shows, through a combination of theoretical arguments and empirical results, that common training heuristics (parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization, and the choice of parameter initialization) all act to control geometric complexity, providing a unifying framework for characterizing the behavior of deep learning models.

In many contexts, simpler models are preferable to more complex models and the control of this model complexity is the goal for many methods in machine learning such as regularization, hyperparameter tuning and architecture design. In deep learning, it has been difficult to understand the underlying mechanisms of complexity control, since many traditional measures are not naturally suitable for deep neural networks. Here we develop the notion of geometric complexity, which is a measure of the variability of the model function, computed using a discrete Dirichlet energy. Using a combination of theoretical arguments and empirical results, we show that many common training heuristics such as parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization and the choice of parameter initialization all act to control geometric complexity, providing a unifying framework in which to characterize the behavior of deep learning models.

https://arxiv.org/abs/2209.13083
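
Geometric complexity is a discrete Dirichlet energy of the model function, i.e. the mean squared Frobenius norm of the input-Jacobian over a data sample. A small PyTorch sketch of that quantity (the toy network and batch are illustrative, not the paper's setup):

```python
import torch
import torch.nn as nn

def geometric_complexity(model, inputs):
    # Mean of ||d f(x) / d x||_F^2 over the batch: the discrete Dirichlet
    # energy used as the complexity measure.
    gc = 0.0
    for x in inputs:
        jac = torch.autograd.functional.jacobian(model, x.unsqueeze(0))
        gc = gc + jac.pow(2).sum()
    return gc / len(inputs)

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
x = torch.randn(16, 8)
print(float(geometric_complexity(model, x)))
```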

 

A few more papers worth noting:

 

[CL] FiD-Light: Efficient and Effective Retrieval-Augmented Text Generation

S Hofstätter, J Chen, K Raman, H Zamani
[TU Wien & Google & University of Massachusetts Amherst] https://arxiv.org/abs/2209.14290

 

[SI] Large network community detection by fast label propagation

V A. Traag, L Šubelj
[Centre for Science and Technology Studies & University of Ljubljana] https://arxiv.org/abs/2209.13338
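
For context, a short sketch in the spirit of fast label propagation: rather than sweeping all nodes every round, a queue holds only the nodes whose neighbourhoods have changed. The tie-breaking and queue order below are assumptions, not the authors' exact algorithm.

```python
import random
from collections import Counter, deque

def fast_label_propagation(adj):
    # adj: dict mapping node -> list of neighbours
    labels = {v: v for v in adj}          # each node starts in its own community
    queue, queued = deque(adj), set(adj)  # nodes pending a (re)visit
    while queue:
        v = queue.popleft()
        queued.discard(v)
        if not adj[v]:
            continue
        counts = Counter(labels[u] for u in adj[v])
        best = max(counts.values())
        new = random.choice([l for l, c in counts.items() if c == best])
        if new != labels[v]:
            labels[v] = new
            # Only neighbours not already carrying the new label need revisiting.
            for u in adj[v]:
                if labels[u] != new and u not in queued:
                    queue.append(u)
                    queued.add(u)
    return labels

graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
print(fast_label_propagation(graph))  # two communities: {0,1,2} and {3,4,5}
```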

 

[CL] EditEval: An Instruction-Based Benchmark for Text Improvements

J Dwivedi-Yu, T Schick, Z Jiang, M Lomeli, P Lewis, G Izacard, E Grave, S Riedel, F Petroni
[Meta AI Research] https://arxiv.org/abs/2209.13331

 

[CV] Learning-Based Dimensionality Reduction for Computing Compact and Effective Local Feature Descriptors

H Dong, X Chen...
[ETH Zurich & University of Bonn & Lund University & Microsoft & University of Oxford]
https://arxiv.org/abs/2209.13586

 

 
