LG - Machine Learning  CV - Computer Vision  CL - Computation and Language  AS - Audio and Speech  RO - Robotics
Reposted from 爱可可爱生活
1、[LG] Neural networks to learn protein sequence–function relationships from deep mutational scanning data
S Gelman, SA Fahlberg, P Heinzelman, PA Romero…
[University of Wisconsin–Madison]
Learning protein sequence–function relationships from deep mutational scanning data with neural networks. A supervised deep learning framework learns the sequence–function mapping from deep mutational scanning data and predicts new, uncharacterized sequence variants. Multiple architectures are tested, including a graph convolutional network that incorporates protein structure; the supervised models outperform physics-based and unsupervised baselines, and analysis shows they learn biologically meaningful information about protein structure and mechanism. As a demonstration, a model of the protein G B1 domain (GB1) is used to design a sequence that binds immunoglobulin G with substantially higher affinity than wild-type GB1.
The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network’s internal representation affects its ability to learn the sequence–function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks’ ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models’ ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.
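As a minimal illustration of the supervised setup (not the paper's networks), the sketch below one-hot encodes toy two-residue variants and fits a linear sequence-to-score model by gradient descent; the variant table, sequence length, and scores are invented for the example.

```python
AAS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {a: i for i, a in enumerate(AAS)}

def one_hot(seq):
    """Flattened one-hot encoding: len(seq) * 20 features."""
    x = [0.0] * (len(seq) * len(AAS))
    for pos, aa in enumerate(seq):
        x[pos * len(AAS) + AA_INDEX[aa]] = 1.0
    return x

def fit_linear(X, y, lr=0.3, epochs=300):
    """Least-squares fit by stochastic gradient descent (toy surrogate)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        for x, t in zip(X, y):
            err = b + sum(wi * xi for wi, xi in zip(w, x)) - t
            g = 2.0 * err / n
            b -= lr * g
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return w, b

# Toy "deep mutational scanning" table: 2-residue variants with scores.
variants = [("AC", 1.0), ("AD", 0.2), ("GC", 0.8), ("GD", 0.1)]
w, b = fit_linear([one_hot(s) for s, _ in variants],
                  [t for _, t in variants])
score = b + sum(wi * xi for wi, xi in zip(w, one_hot("AC")))
```

A linear model like this cannot capture the nonlinear inter-position interactions the paper finds important, which is exactly why the authors move to deeper architectures.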
https://weibo.com/1402400261/L3mvxz8Ua
2、[LG] On the Optimal Memorization Power of ReLU Neural Networks
G Vardi, G Yehudai, O Shamir
[Weizmann Institute of Science]
Optimal memorization power of ReLU neural networks. Feedforward ReLU networks can memorize any N points satisfying a mild separability assumption using Õ(√N) parameters; known VC-dimension upper bounds imply Ω(√N) parameters are necessary, so the construction is optimal up to logarithmic factors. A generalized construction for networks of depth 1 ≤ L ≤ √N memorizes N samples with Õ(N/L) parameters, also optimal up to log factors. The construction uses weights with large bit complexity, which is shown to be both necessary and sufficient for memorization with a sub-linear number of parameters.
We study the memorization power of feedforward ReLU neural networks. We show that such networks can memorize any N points that satisfy a mild separability assumption using Õ(√N) parameters. Known VC-dimension upper bounds imply that memorizing N samples requires Ω(√N) parameters, and hence our construction is optimal up to logarithmic factors. We also give a generalized construction for networks with depth bounded by 1 ≤ L ≤ √N, for memorizing N samples using Õ(N/L) parameters. This bound is also optimal up to logarithmic factors. Our construction uses weights with large bit complexity. We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.
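For intuition, the classic linear-in-N baseline can be sketched in one dimension: a one-hidden-layer ReLU network with one unit per sample interpolates any N points using O(N) parameters (the paper's contribution is reaching Õ(√N) via depth). The piecewise-linear construction below is the standard one, not the paper's.

```python
def relu(z):
    return max(0.0, z)

def memorize_1d(points):
    """One-hidden-layer ReLU net interpolating sorted 1-D points.

    f(x) = y_1 + sum_i (s_i - s_{i-1}) * relu(x - x_i), where s_i are
    the segment slopes; this matches every (x_i, y_i) exactly with
    O(N) parameters -- the baseline that depth improves on.
    """
    pts = sorted(points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    slopes = [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
              for i in range(len(pts) - 1)]
    coefs = [slopes[0]] + [slopes[i] - slopes[i - 1]
                           for i in range(1, len(slopes))]
    def f(x):
        return ys[0] + sum(c * relu(x - xi) for c, xi in zip(coefs, xs[:-1]))
    return f

f = memorize_1d([(0.0, 1.0), (1.0, -1.0), (2.0, 0.5), (3.0, 2.0)])
```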
https://weibo.com/1402400261/L3mz15IWN
3、[CV] Score-Based Generative Modeling with Critically-Damped Langevin Diffusion
Score-based generative modeling with critically-damped Langevin diffusion. Score-based generative models (SGMs) perturb data toward a tractable distribution while a model learns to denoise; the paper argues that current SGMs use overly simplistic diffusions, making the denoising task unnecessarily hard. Drawing on statistical mechanics, it proposes critically-damped Langevin diffusion (CLD), a joint diffusion in an extended space where auxiliary "velocity" variables couple to the data variables as in Hamiltonian dynamics. A new score matching objective requires learning only the score of the velocity distribution conditioned on the data, which is easier than learning the data score directly, and a tailored sampler for CLD significantly outperforms solvers such as Euler–Maruyama. Under comparable architectures and sampling compute budgets, CLD-based SGMs achieve superior synthesis quality and are readily applicable to high-resolution image synthesis.
Score-based generative models (SGMs) have demonstrated remarkable synthesis quality. SGMs rely on a diffusion process that gradually perturbs the data towards a tractable distribution, while the generative model learns to denoise. The complexity of this denoising task is, apart from the data distribution itself, uniquely determined by the diffusion process. We argue that current SGMs employ overly simplistic diffusions, leading to unnecessarily complex denoising processes, which limit generative modeling performance. Based on connections to statistical mechanics, we propose a novel critically-damped Langevin diffusion (CLD) and show that CLD-based SGMs achieve superior performance. CLD can be interpreted as running a joint diffusion in an extended space, where the auxiliary variables can be considered “velocities” that are coupled to the data variables as in Hamiltonian dynamics. We derive a novel score matching objective for CLD and show that the model only needs to learn the score function of the conditional distribution of the velocity given data, an easier task than learning scores of the data directly. We also derive a new sampling scheme for efficient synthesis from CLD-based diffusion models. We find that CLD outperforms previous SGMs in synthesis quality for similar network architectures and sampling compute budgets. We show that our novel sampler for CLD significantly outperforms solvers such as Euler–Maruyama. Our framework provides new insights into score-based denoising diffusion models and can be readily used for high-resolution image synthesis.
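The underlying dynamics can be sketched for a unit-mass harmonic potential U(x) = x²/2, for which γ = 2 is the critically damped choice (γ² = 4k with k = 1): noise enters only the velocity channel, which is coupled to the data variable. This is a textbook damped-Langevin simulation with Euler–Maruyama steps, not the paper's exact SDE parameterization.

```python
import math
import random

def cld_step(x, v, dt, gamma=2.0, rng=random):
    """One Euler-Maruyama step of damped Langevin dynamics for
    U(x) = x^2 / 2. The data variable x evolves deterministically;
    only the velocity v receives friction and noise."""
    noise = math.sqrt(2.0 * gamma * dt) * rng.gauss(0.0, 1.0)
    x_new = x + v * dt
    v_new = v + (-x - gamma * v) * dt + noise
    return x_new, v_new

random.seed(0)
x, v = 1.0, 0.0          # start from a "data" point at rest
for _ in range(1000):    # diffuse for total time 10.0
    x, v = cld_step(x, v, dt=0.01)
```

At stationarity this process samples x and v from independent standard normals, which is the tractable target distribution the generative model then learns to reverse.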
https://weibo.com/1402400261/L3mF2zCq7
4、[AS] MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling
MIDI-DDSP: detailed control of musical performance via hierarchical modeling. Conventional synthesizers offer detailed expressive control at the cost of realism, while black-box neural synthesis is realistic but hard to control. MIDI-DDSP is a hierarchical model of musical instruments that provides both: starting from interpretable Differentiable Digital Signal Processing (DDSP) synthesis parameters, it infers notes and high-level expressive properties (timbre, vibrato, dynamics, articulation), yielding a three-level hierarchy (notes, performance, synthesis) in which users can intervene at any level or rely on trained priors (performance given notes, synthesis given performance) for creative assistance. Quantitative experiments and listening tests show high-fidelity reconstruction, accurate prediction of performance attributes for a note sequence, independent manipulation of those attributes, and, as a complete system, realistic audio from novel note sequences.
Musical expression requires control of both what notes are played, and how they are performed. Conventional audio synthesizers provide detailed expressive controls, but at the cost of realism. Black-box neural audio synthesis and concatenative samplers can produce realistic audio, but have few mechanisms for control. In this work, we introduce MIDI-DDSP, a hierarchical model of musical instruments that enables both realistic neural audio synthesis and detailed user control. Starting from interpretable Differentiable Digital Signal Processing (DDSP) synthesis parameters, we infer musical notes and high-level properties of their expressive performance (such as timbre, vibrato, dynamics, and articulation). This creates a 3-level hierarchy (notes, performance, synthesis) that affords individuals the option to intervene at each level, or utilize trained priors (performance given notes, synthesis given performance) for creative assistance. Through quantitative experiments and listening tests, we demonstrate that this hierarchy can reconstruct high-fidelity audio, accurately predict performance attributes for a note sequence, independently manipulate the attributes of a given performance, and as a complete system, generate realistic audio from a novel note sequence. By utilizing an interpretable hierarchy, with multiple levels of granularity, MIDI-DDSP opens the door to assistive tools to empower individuals across a diverse range of musical experience.
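The notes → performance → synthesis hierarchy can be caricatured in a few lines: a hand-written "performance" prior expands notes into frame-wise f0/amplitude curves, and a single sinusoidal oscillator stands in for the DDSP synthesizer. All envelopes and constants here are illustrative choices, not MIDI-DDSP's learned priors.

```python
import math

def performance_from_notes(notes, hop=0.01):
    """Level 2: expand (midi_pitch, duration_s) notes into 10 ms frames
    of (f0_hz, amplitude), with a fixed 5 Hz vibrato and decay envelope
    standing in for a learned performance prior."""
    frames = []
    for pitch, dur in notes:
        f0 = 440.0 * 2 ** ((pitch - 69) / 12)  # MIDI pitch to Hz
        for i in range(int(dur / hop)):
            t = i * hop
            vib = 1.0 + 0.005 * math.sin(2 * math.pi * 5.0 * t)
            amp = math.exp(-1.5 * t)
            frames.append((f0 * vib, amp))
    return frames

def synthesize(frames, sr=8000, hop=0.01):
    """Level 3: render frames with a single-oscillator stand-in for the
    DDSP synthesizer, accumulating phase for continuity."""
    samples, phase = [], 0.0
    spf = int(sr * hop)  # samples per frame
    for f0, amp in frames:
        for _ in range(spf):
            phase += 2 * math.pi * f0 / sr
            samples.append(amp * math.sin(phase))
    return samples

audio = synthesize(performance_from_notes([(69, 0.2), (72, 0.2)]))
```

Because each level is an explicit function, a user can edit the note list, the frame-wise performance, or the synthesis stage independently, which is the kind of intervention the paper's hierarchy affords.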
https://weibo.com/1402400261/L3mKQaq1Q
5、[LG] Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning
D Shah, P Xu, Y Lu, T Xiao, A Toshev, S Levine, B Ichter
[Google Research & UC Berkeley]
Value Function Spaces: skill-centric state abstractions for long-horizon reasoning. Reinforcement learning degrades on long-horizon tasks, which often require reasoning over and composing low-level skills; hierarchical RL supplies a bank of skills as action abstractions, and hierarchies can improve further by abstracting the state space. The paper argues that a suitable state abstraction should depend on the capabilities of the available low-level policies, and proposes Value Function Spaces: representing a state by the value functions of the low-level skills. These value functions capture the affordances of the scene, yielding a representation that compactly retains task-relevant information and robustly ignores distractors. Evaluations on maze-solving and robotic manipulation show improved long-horizon performance and better zero-shot generalization than model-free and model-based alternatives.
Reinforcement learning can train policies that effectively perform complex tasks. However, for long-horizon tasks, the performance of these methods degrades with horizon, often necessitating reasoning over and composing lower-level skills. Hierarchical reinforcement learning aims to enable this by providing a bank of low-level skills as action abstractions. Hierarchies can further improve on this by abstracting the state space as well. We posit that a suitable state abstraction should depend on the capabilities of the available lower-level policies. We propose Value Function Spaces: a simple approach that produces such a representation by using the value functions corresponding to each lower-level skill. These value functions capture the affordances of the scene, thus forming a representation that compactly abstracts task-relevant information and robustly ignores distractors. Empirical evaluations for maze-solving and robotic manipulation tasks demonstrate that our approach improves long-horizon performance and enables better zero-shot generalization than alternative model-free and model-based methods.
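A toy version of the abstraction, with hand-made rather than learned value functions: each goal-reaching skill on a grid gets V(s) = γ^d(s, goal) (Manhattan distance), and a state is represented by the vector of all skills' values. The grid, goals, and value form are all invented for the sketch.

```python
def skill_value(state, goal, gamma=0.9):
    """Toy value function of a goal-reaching skill on a grid:
    V(s) = gamma ** (Manhattan distance from s to the skill's goal),
    so V is 1 at the goal and decays with distance."""
    d = abs(state[0] - goal[0]) + abs(state[1] - goal[1])
    return gamma ** d

def value_function_space(state, skill_goals):
    """Embed a state as the vector of every skill's value at that
    state -- the skill-centric abstraction, here with hand-made values."""
    return [skill_value(state, g) for g in skill_goals]

goals = [(0, 0), (4, 0), (2, 3)]          # three toy goal-reaching skills
z = value_function_space((1, 1), goals)   # 3-dimensional abstract state
```

Note the embedding dimension equals the number of skills, not the size of the raw state space, which is why the representation stays compact while still encoding what each skill can currently achieve.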
https://weibo.com/1402400261/L3mOEe4cs
A few more papers worth noting:
[CV] Self-slimmed Vision Transformer
Z Zong, K Li, G Song, Y Wang, Y Qiao, B Leng, Y Liu
[Beihang University & SenseTime Research & Chinese Academy of Sciences & Shanghai AI Laboratory]
https://weibo.com/1402400261/L3mTp9h17
[LG] Tabular Data: Deep Learning is Not All You Need
R Shwartz-Ziv, A Armon
[Intel]
https://weibo.com/1402400261/L3mVnt9eg
[CV] Efficient Video Transformers with Spatial-Temporal Token Selection
J Wang, X Yang, H Li, Z Wu, Y Jiang
[Fudan University & University of Maryland]
https://weibo.com/1402400261/L3mWBqLgP
[CV] Conditional Object-Centric Learning from Video
T Kipf, G F. Elsayed, A Mahendran, A Stone, S Sabour, G Heigold, R Jonschkowski, A Dosovitskiy, K Greff
[Google Research]
https://weibo.com/1402400261/L3mY7Eaez