mT5: Google's Multilingual Text-to-Text Pre-trained Transformer Model

mT5 follows a recipe similar to T5 and achieves state-of-the-art (SOTA) results on many cross-lingual tasks. It covers 101 languages.

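The paper's first author is Linting Xue (薛林亭), who completed an undergraduate degree at Shanghai University of Electric Power and a PhD in North Carolina.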

Paper: https://arxiv.org/abs/2010.11934
Code: https://github.com/google-research/multilingual-t5

【Abstract】

The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We describe the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. All of the code and model checkpoints used in this work are publicly available.
