Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba

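June 18, 2024
  • Abstract
    Existing humor datasets and evaluations focus mainly on English, leaving a shortage of resources for culturally nuanced humor in non-English languages such as Chinese. To address this gap, we construct Chumor, a dataset sourced from Ruo Zhi Ba (RZB), a Reddit-like Chinese platform dedicated to sharing intellectually challenging and culturally specific jokes. We annotate an explanation for each joke and, through A/B testing by native Chinese speakers, compare the human explanations against those of two state-of-the-art LLMs, GPT-4o and ERNIE Bot. Our evaluation shows that Chumor is challenging even for SOTA LLMs, and that human explanations of Chumor jokes are significantly better than those generated by the LLMs.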
  • Problem addressed
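    Existing humor datasets and evaluations focus mainly on English, so culturally nuanced humor in non-English languages such as Chinese lacks resources. Chumor targets this gap with a Chinese dataset for humor understanding.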
  • Key idea
    The paper presents Chumor, a dataset of culturally nuanced Chinese jokes sourced from Ruo Zhi Ba (RZB) and annotated with explanations. The paper evaluates the performance of two state-of-the-art LLMs, GPT-4o and ERNIE Bot, on the dataset and finds that Chumor is challenging even for these models. Human explanations for Chumor jokes are significantly better than explanations generated by the LLMs.
  • Other highlights
    Chumor is a valuable resource for culturally specific humor in Chinese. The paper provides insights into the challenges of recognizing humor in context and highlights the limitations of current LLMs. The evaluation involves A/B testing by native Chinese speakers. The paper calls for further research on humor recognition in non-English languages.
  • Related work
    Recent related work includes studies on humor recognition in English, such as SemEval-2020 Task 7 and the Humicroedit dataset. Other related work includes studies on humor generation and humor understanding in various languages, such as German and Arabic.
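
The A/B evaluation described above, in which native Chinese speakers compare a human explanation against an LLM explanation, can be summarized as simple preference rates. The following is a minimal sketch under stated assumptions, not the authors' protocol: the function name `preference_rates`, the labels `human`, `llm`, and `tie`, and the sample votes are all hypothetical and invented for illustration. The source does not say whether ties were allowed.

```python
from collections import Counter

# Hypothetical label set; the source does not specify the annotation scheme.
LABELS = ("human", "llm", "tie")


def preference_rates(judgments):
    """Return the share of A/B judgments that favored each label.

    `judgments` is a list of strings, one per comparison, each taken
    from LABELS.
    """
    if not judgments:
        raise ValueError("at least one judgment is required")
    unknown = set(judgments) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(judgments)
    total = len(judgments)
    return {label: counts[label] / total for label in LABELS}


# Invented example votes, not data from the paper.
votes = ["human", "human", "llm", "tie", "human"]
rates = preference_rates(votes)
for label in LABELS:
    print(f"{label}: {rates[label]:.0%}")
```

With real annotations, one would typically also report the number of comparisons and a significance test before claiming that one side is preferred.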