From today's 爱可可 AI frontier picks
[LG] Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling
A Elnaggar, H Essam, W Salah-Eldin...
[Technical University of Munich (TUM) & Proteinea, Inc & Columbia University]
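Key points:
- Proposes Ankh, a protein language model (PLM) optimized along protein-specific lines for data efficiency, cost reduction, and knowledge guidance; the authors describe it as the first general-purpose PLM trained on Google's TPU-v4 to surpass state-of-the-art performance with fewer parameters;
- Provides a range of structure and function benchmarks where Ankh excels, along with a protein variant generation analysis in which Ankh learns protein evolutionary conservation-mutation trends and introduces functional diversity while retaining key structural-functional characteristics;
- Seeks to improve PLM performance through protein-specific optimization rather than by merely scaling up model size.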
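One-sentence summary:
Proposes Ankh, a protein language model (PLM) optimized along protein-specific lines for data efficiency, cost reduction, and knowledge guidance, which surpasses state-of-the-art performance with fewer parameters and aims to make research innovation more accessible through attainable resources.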
Abstract:
As opposed to scaling-up protein language models (PLMs), we seek improving performance via protein-specific optimization. Although the proportionality between the language model size and the richness of its learned representations is validated, we prioritize accessibility and pursue a path of data-efficient, cost-reduced, and knowledge-guided optimization. Through over twenty experiments ranging from masking, architecture, and pre-training data, we derive insights from protein-specific experimentation into building a model that interprets the language of life, optimally. We present Ankh, the first general-purpose PLM trained on Google's TPU-v4 surpassing the state-of-the-art performance with fewer parameters (<10% for pre-training, <7% for inference, and <30% for the embedding dimension). We provide a representative range of structure and function benchmarks where Ankh excels. We further provide a protein variant generation analysis on High-N and One-N input data scales where Ankh succeeds in learning protein evolutionary conservation-mutation trends and introducing functional diversity while retaining key structural-functional characteristics. We dedicate our work to promoting accessibility to research innovation via attainable resources.
Paper link: https://arxiv.org/abs/2301.06568