Intro

This week DeepMind released three papers on pretrained language models in one go, covering three directions: a) an exploration of pretrained models, b) ethical and social risks of language models, and c) a retrieval-enhanced pretrained language model architecture.

The Retrieval-Enhanced Transformer (RETRO) from that last line of work hit me right in my NLP DNA.

Coincidentally, ACL 2022 is about to host a workshop in the same vein: Semi-parametric Methods in NLP. (Danqi Chen and Jason Weston both appear on the Invited Speakers list.)

As it happens, I recently went through this line of work (I gave a short report on it at our group meeting two weeks ago), so let me plant a flag here and walk through the origins, strengths, technical core, and applications of these non-parametric (or retrieval-enhanced) methods, and what new building blocks they bring to how we use pretrained language models (PLMs) and to the methodological paradigms of NLP tasks. A minimal code sketch of the core mechanic follows the reading list below.


Related work:

Pre-PLM Era

  1. Pointer Sentinel Mixture Models. ICLR, 2017. [paper]
  2. Improving Neural Language Models with a Continuous Cache. ICLR, 2017. [paper]
  3. Unbounded Cache Model for Online Language Modeling with Open Vocabulary. NeurIPS, 2017. [paper]


Post-PLM Era

  1. Episodic Memory in Lifelong Language Learning. NeurIPS, 2019. [paper]
  2. REALM: Retrieval-Augmented Language Model Pre-Training. ICML, 2020. [paper]
  3. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS, 2020. [paper]
  4. Generalization through Memorization: Nearest Neighbor Language Models. ICLR, 2020. [paper]
  5. Nearest Neighbor Machine Translation. ICLR, 2021. [paper]
  6. BERT-kNN: Adding a kNN Search Component to Pretrained Language Models for Better QA. Findings of EMNLP, 2020. [paper]
  7. Augmenting Transformers with KNN-Based Composite Memory for Dialog. TACL, 2021. [paper]
  8. Adaptive Semiparametric Language Models. TACL, 2021. [paper]
  9. Adaptable and Interpretable Neural Memory Over Symbolic Knowledge. NAACL, 2021. [paper]
  10. Adaptive Nearest Neighbor Machine Translation. ACL, 2021. [paper]
  11. Learning Kernel-Smoothed Machine Translation with Retrieved Examples. EMNLP, 2021. [paper]
  12. Improving Language Models by Retrieving from Trillions of Tokens (RETRO). arXiv, 2021. [paper]
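
To make the core mechanic of this line of work concrete, here is a minimal NumPy sketch of the kNN-LM interpolation from paper 4 in the Post-PLM list: at each prediction step, retrieve the k stored contexts nearest to the current one and mix their next-token distribution with the base LM's softmax, p(w|x) = λ·p_kNN(w|x) + (1−λ)·p_LM(w|x). The toy datastore, dimensions, and hyperparameter values below are illustrative placeholders, not the paper's actual setup (which builds the datastore from a real LM's hidden states and searches it with FAISS).

```python
import numpy as np

# Toy stand-in for the kNN-LM datastore: in the paper, keys are context
# vectors from a forward pass of the trained LM over its training data,
# and values are the next tokens observed at those contexts.
rng = np.random.default_rng(0)
VOCAB, DIM = 50, 16                                   # illustrative sizes
datastore_keys = rng.standard_normal((1000, DIM)).astype(np.float32)
datastore_vals = rng.integers(0, VOCAB, size=1000)    # next-token ids

def knn_lm_probs(context_vec, p_lm, k=8, lam=0.25, temperature=1.0):
    """Interpolate the base LM distribution with a kNN distribution:
    p(w|x) = lam * p_kNN(w|x) + (1 - lam) * p_LM(w|x)."""
    # Brute-force L2 search (the paper uses FAISS at scale).
    dists = np.linalg.norm(datastore_keys - context_vec, axis=1)
    nn = np.argsort(dists)[:k]
    # Softmax over negative distances gives the neighbor weights.
    weights = np.exp(-dists[nn] / temperature)
    weights /= weights.sum()
    # Scatter each neighbor's weight onto its recorded next token.
    p_knn = np.zeros_like(p_lm)
    for w, v in zip(weights, datastore_vals[nn]):
        p_knn[v] += w
    return lam * p_knn + (1.0 - lam) * p_lm

# Usage: mix a stand-in (uniform) LM distribution with the kNN estimate.
p_lm = np.full(VOCAB, 1.0 / VOCAB)
query = rng.standard_normal(DIM).astype(np.float32)
p_final = knn_lm_probs(query, p_lm)
assert abs(p_final.sum() - 1.0) < 1e-6
```

Much of the reading list above can be read as variations on this retrieve-then-integrate skeleton: kNN-MT ports it to translation, the adaptive variants learn the interpolation weight instead of fixing λ, and RETRO moves the integration inside the Transformer via cross-attention over retrieved chunks.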
