Sound assessment of lexical complexity is a prerequisite for many downstream NLP tasks. At present, reliable Chinese lexical complexity datasets are lacking. This paper constructs the RCWI-Dataset for native Chinese speakers. The dataset covers three complexity categories, and each example is annotated by at least three annotators. We provide baseline experiments based on feature engineering, and the results demonstrate the validity of the dataset.