Published 2024 | Version v2 | Dataset | Open
ARTS Datasets - ARTS94, ARTS300, ARTS3000, ARTS160
Description
Datasets for evaluating readability and text simplicity, in four sizes: 94, 300, 3000, and 160 disjoint data entries. Each data entry contains the following information:
- Text_original: Text from a parallel corpus for text simplification
- Text_formatted: Text_original where formatting issues have been resolved either manually (ARTS94) or automatically (ARTS300, ARTS3000, ARTS160)
- Dataset: Parallel corpus for text simplification from which the original text was extracted
- Label: Whether the text comes from the simplified (simp) or source (src) part of the corpus
- ID: Unique ID
- Score: Simplicity/readability score of the formatted text, between 0 and 1. The higher the score, the more complex and less readable the text.
The licenses of the individual source datasets apply to their respective texts.
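
The fields of a data entry can be sketched as a small Python record with basic validation. This is only an illustrative sketch: the class name `ArtsEntry`, the helper `parse_entry`, the choice of `str` for the ID, and the sample row are assumptions and not part of the published datasets. The file format of the datasets is not stated here, so the sketch starts from a plain dictionary keyed by the field names above.

```python
from dataclasses import dataclass

# Label values named in the description: simplified (simp) or source (src).
VALID_LABELS = {"simp", "src"}


@dataclass(frozen=True)
class ArtsEntry:
    """Hypothetical container for one ARTS data entry."""
    text_original: str   # Text_original
    text_formatted: str  # Text_formatted
    dataset: str         # Dataset (name of the parallel corpus)
    label: str           # Label: "simp" or "src"
    id: str              # ID (type assumed to be a string)
    score: float         # Score in [0, 1]; higher = more complex


def parse_entry(row: dict) -> ArtsEntry:
    """Build an ArtsEntry from a dict keyed by the documented field names."""
    label = row["Label"]
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected Label: {label!r}")
    score = float(row["Score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"Score out of range [0, 1]: {score}")
    return ArtsEntry(
        text_original=row["Text_original"],
        text_formatted=row["Text_formatted"],
        dataset=row["Dataset"],
        label=label,
        id=str(row["ID"]),
        score=score,
    )


# Made-up sample row for illustration only; not taken from the datasets.
sample_row = {
    "Text_original": "The cat sat on the mat .",
    "Text_formatted": "The cat sat on the mat.",
    "Dataset": "ExampleCorpus",
    "Label": "simp",
    "ID": "example-1",
    "Score": "0.12",
}
entry = parse_entry(sample_row)
```

Validating the label and the score range on load makes it easier to catch rows that do not match the documented schema before they reach an evaluation script.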