CCKS 2021丨RCWI: A Dataset for Chinese Complex Word Identification (Mengxi Que, Yufei Zhang, Dong Yu)

Reasonable evaluation of lexical complexity is a prerequisite for many downstream NLP tasks. At present, there is a lack of reliable Chinese lexical complexity datasets. This paper constructs the RCWI-Dataset for native Chinese speakers, which contains three complexity categories. Each example is annotated by at least three annotators. We provide baseline experiments based on feature engineering, and the results show the validity of the dataset.
