Sound evaluation of lexical complexity is a prerequisite for multiple downstream NLP tasks. At present, there is a lack of reliable Chinese lexical complexity datasets. This paper constructs the RCWI-Dataset for native Chinese speakers, which contains three complexity categories. Each example is annotated by at least three annotators. We provide baseline experiments based on feature engineering, and the results show the validity of the dataset.
CCKS 2021 | RCWI: A Dataset for Chinese Complex Word Identification (Mengxi Que, Yufei Zhang, Dong Yu)
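
The abstract says each example receives at least three annotations over three complexity categories, but it does not say how those annotations are merged into one label. The sketch below illustrates one common option, a strict majority vote. The label names in `LABELS`, the function `majority_label`, and the voting rule itself are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter

# Hypothetical names for the three complexity categories; the abstract
# only states that there are three, not what they are called.
LABELS = ("simple", "medium", "complex")


def majority_label(votes):
    """Merge one example's annotations into a single label.

    Returns the label with strictly more votes than any other label,
    or None when the top two labels are tied.
    Majority voting is an assumption, not the paper's stated procedure.
    """
    if len(votes) < 3:
        raise ValueError("expected at least three annotations per example")
    unknown = set(votes) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")

    # most_common() returns (label, count) pairs sorted by count, descending.
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the example unresolved
    return ranked[0][0]


if __name__ == "__main__":
    print(majority_label(["simple", "simple", "complex"]))
    print(majority_label(["simple", "medium", "complex"]))
```

Returning `None` on a tie keeps disagreement visible, so tied examples can be re-annotated or dropped rather than resolved arbitrarily.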