Intro

This week DeepMind released three papers on pretrained language models in one go, covering three directions: a) an exploration of pretrained models, b) ethical and social risks of language models, and c) a retrieval-enhanced pretrained language model architecture.

The Retrieval-Enhanced Transformer (RETRO) from that last line of work hit me right in my NLP DNA.

Coincidentally, ACL 2022 is about to host a workshop in the same vein: Semi-parametric Methods in NLP. (Danqi Chen and Jason Weston both appear on the Invited Speakers list.)

As it happens, I recently went through this line of work (I gave a short report on it at our group meeting two weeks ago), so let me plant a flag here and walk through the origins, strengths, technical core, and applications of these non-parametric (or retrieval-enhanced) methods, and what new building blocks they bring to how we use pretrained language models (PLMs) and to the methodological paradigms of NLP tasks. A minimal code sketch of the core mechanic follows the reading list below.


Related work:

Pre-PLM Era

  1. Pointer Sentinel Mixture Models. ICLR, 2017. [paper]
  2. Improving Neural Language Models with a Continuous Cache. ICLR, 2017. [paper]
  3. Unbounded Cache Model for Online Language Modeling with Open Vocabulary. NeurIPS, 2017. [paper]


Post-PLM Era

  1. Episodic Memory in Lifelong Language Learning. NeurIPS, 2019. [paper]
  2. REALM: Retrieval-Augmented Language Model Pre-Training. ICML, 2020. [paper]
  3. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS, 2020. [paper]
  4. Generalization through Memorization: Nearest Neighbor Language Models. ICLR, 2020. [paper]
  5. Nearest Neighbor Machine Translation. ICLR, 2021. [paper]
  6. BERT-kNN: Adding a kNN Search Component to Pretrained Language Models for Better QA. Findings of EMNLP, 2020. [paper]
  7. Augmenting Transformers with KNN-Based Composite Memory for Dialog. TACL, 2021. [paper]
  8. Adaptive Semiparametric Language Models. TACL, 2021. [paper]
  9. Adaptable and Interpretable Neural Memory Over Symbolic Knowledge. NAACL, 2021. [paper]
  10. Adaptive Nearest Neighbor Machine Translation. ACL, 2021. [paper]
  11. Learning Kernel-Smoothed Machine Translation with Retrieved Examples. EMNLP, 2021. [paper]
  12. Improving Language Models by Retrieving from Trillions of Tokens (RETRO). arXiv, 2021. [paper]
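
To make the core mechanic of this line of work concrete, here is a minimal NumPy sketch of the kNN-LM interpolation from paper 4 in the Post-PLM list: at each prediction step, retrieve the k stored contexts nearest to the current one and mix their next-token distribution with the base LM's softmax, p(w|x) = λ·p_kNN(w|x) + (1−λ)·p_LM(w|x). The toy datastore, dimensions, and hyperparameter values below are illustrative placeholders, not the paper's actual setup (which builds the datastore from a real LM's hidden states and searches it with FAISS).

```python
import numpy as np

# Toy stand-in for the kNN-LM datastore: in the paper, keys are context
# vectors from a forward pass of the trained LM over its training data,
# and values are the next tokens observed at those contexts.
rng = np.random.default_rng(0)
VOCAB, DIM = 50, 16                                   # illustrative sizes
datastore_keys = rng.standard_normal((1000, DIM)).astype(np.float32)
datastore_vals = rng.integers(0, VOCAB, size=1000)    # next-token ids

def knn_lm_probs(context_vec, p_lm, k=8, lam=0.25, temperature=1.0):
    """Interpolate the base LM distribution with a kNN distribution:
    p(w|x) = lam * p_kNN(w|x) + (1 - lam) * p_LM(w|x)."""
    # Brute-force L2 search (the paper uses FAISS at scale).
    dists = np.linalg.norm(datastore_keys - context_vec, axis=1)
    nn = np.argsort(dists)[:k]
    # Softmax over negative distances gives the neighbor weights.
    weights = np.exp(-dists[nn] / temperature)
    weights /= weights.sum()
    # Scatter each neighbor's weight onto its recorded next token.
    p_knn = np.zeros_like(p_lm)
    for w, v in zip(weights, datastore_vals[nn]):
        p_knn[v] += w
    return lam * p_knn + (1.0 - lam) * p_lm

# Usage: mix a stand-in (uniform) LM distribution with the kNN estimate.
p_lm = np.full(VOCAB, 1.0 / VOCAB)
query = rng.standard_normal(DIM).astype(np.float32)
p_final = knn_lm_probs(query, p_lm)
assert abs(p_final.sum() - 1.0) < 1e-6
```

Much of the reading list above can be read as variations on this retrieve-then-integrate skeleton: kNN-MT ports it to translation, the adaptive variants learn the interpolation weight instead of fixing λ, and RETRO moves the integration inside the Transformer via cross-attention over retrieved chunks.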
