From today's 爱可可 AI frontier picks

[CL] Is ChatGPT a General-Purpose Natural Language Processing Task Solver?

C Qin...
[Nanyang Technological University & Amazon Web Services & Shanghai Jiao Tong University & Georgia Institute of Technology & Stanford University]

Key points:

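  1. ChatGPT has some general-purpose NLP ability, but often underperforms task-specific models;
  2. It performs well on reasoning and dialogue tasks;
  3. It struggles with sequence tagging;
  4. It can generate longer summaries, but performs worse than GPT-3.5 on summarization.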

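One-sentence summary:
ChatGPT shows some ability as a general-purpose NLP model but often underperforms task-specific models; it does well on reasoning and dialogue tasks and lags somewhat on sequence tagging.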

Spurred by advancements in scale, large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot -- i.e., without adaptation on downstream data. Recently, the debut of ChatGPT has drawn a great deal of attention from the NLP community because it can generate high-quality responses to human input and self-correct previous mistakes based on subsequent conversations. However, it is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot. In this work, we empirically analyze the zero-shot learning ability of ChatGPT by evaluating it on 20 popular NLP datasets covering 7 representative task categories. With extensive empirical studies, we demonstrate both the effectiveness and limitations of the current version of ChatGPT. We find that ChatGPT performs well on many tasks favoring reasoning capabilities (e.g., arithmetic reasoning) while it still faces challenges when solving specific tasks such as sequence tagging. We additionally provide in-depth analysis through qualitative case studies.

Paper link: https://arxiv.org/abs/2302.06476