Not long ago, Gary Marcus and Elliot Murphy co-authored an article on three things AI practitioners need to learn from linguists (click here to read the original). The key points are excerpted below:
Everybody knows that large language models like GPT-3 and LaMDA have made tremendous strides, at least in some respects, and have powered past many benchmarks (Cosmo recently described DALL-E), but most in the field also agree that something is still missing. A group of engineers at Facebook, for example, wrote in 2019 (click here to read the original):

"A growing body of evidence shows that state-of-the-art models learn to exploit spurious statistical patterns in datasets... instead of learning meaning in the flexible and generalizable way that humans do."

Since then, the results on benchmarks have gotten better, but something is still missing.
If we had to put our finger on what is still missing, we would focus on these three key elements:
- Reference: Words and sentences don't exist in isolation. Language is about a connection between words (or sentences) and the world, and the sequences of words that large language models utter lack any connection to the external world.
- Cognitive models: The ultimate goal of a language system should be to update a persisting but dynamic sense of the world. Large language models don't produce such cognitive models, at least not in a way that anybody has been able to make reliable use of.
- Compositionality: Complex wholes are (mostly) systematically interpreted in terms of their parts and how those parts are arranged. Systems like DALL-E face clear challenges when it comes to compositionality. (Large language models like GPT produce well-formed prose but do not produce interpretable representations that reflect the structured relationships between the parts of those sentences.) See the sketch after this list for a toy illustration.
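As a toy illustration of the compositionality point (this sketch is ours, not from Marcus and Murphy's article, and every lexical entry in it is invented), the snippet below computes the meaning of a whole sentence from the meanings of its words plus the way the words are arranged. Swapping subject and object keeps the word set identical but changes the meaning, which is exactly the structural information that a bare sequence of tokens does not represent.

```python
# Minimal, hypothetical sketch of Frege-style compositionality:
# the meaning of a whole is computed from the meanings of its parts
# and from how those parts are arranged.

LEXICON = {
    "rex": "rex",                    # proper names denote individuals
    "felix": "felix",
    "chases": {("rex", "felix")},    # a transitive verb denotes a set of (agent, patient) pairs
}

def interpret(sentence):
    """Return the truth value of a (subject, verb, object) triple.

    The sentence meaning is assembled from the word meanings plus their
    arrangement: reordering the same words yields a different meaning.
    """
    subject, verb, obj = sentence
    return (LEXICON[subject], LEXICON[obj]) in LEXICON[verb]

print(interpret(("rex", "chases", "felix")))   # True
print(interpret(("felix", "chases", "rex")))   # False: same words, different structure
```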
In our view, inadequate attention to these three factors has serious consequences, including:
(a) the tendency of large language models to lose coherence over time, drifting into "empty" language with no clear connection to reality;
(b) the difficulty large language models have in distinguishing truth from falsehood;
(c) the struggle these models face in avoiding the perpetuation of bias and toxic speech.
Now here's the thing: none of the three elements we have been stressing is news to linguists. In fact, at least since the work of Gottlob Frege in the late 19th century, they have been fairly central to what many linguists worry about. To be sure, none of these three issues has been solved so far; for example, there is still debate about how much of our everyday language use actually relies on compositionality, and about what the right cognitive models of language should be. But we do think that linguistics has a lot to offer in formulating and thinking about these questions.