Published January 24, 2019 | Version v1 | Dataset | Open access
Simple Italian sentences ranked by readability
Description
The dataset contains 500,000 sentences extracted from the Paisà corpus (https://www.corpusitaliano.it/). The sentences were selected as easy to read according to four parameters: number of tokens, average word length, parse-tree depth, and verb arity (the number of arguments a verb takes). The sentences are ranked by readability.
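The four selection parameters can be made concrete with a short sketch. It assumes each sentence has already been tokenized and dependency-parsed in Universal Dependencies style, so the parser itself is out of scope. It also counts punctuation as tokens but excludes it from word length, and it approximates verb arity as the number of UD core-argument dependents. These are illustrative choices, not necessarily the ones used to build the dataset. `Token`, `readability_features` and the other names are hypothetical.

```python
from dataclasses import dataclass

# Assumption: verb "arity" approximated by Universal Dependencies core-argument relations.
CORE_ARGS = {"nsubj", "obj", "iobj", "csubj", "ccomp", "xcomp"}


@dataclass
class Token:
    form: str
    upos: str    # UD part-of-speech tag, e.g. "VERB", "PUNCT"
    head: int    # 1-based index of the head token; 0 marks the root
    deprel: str  # UD dependency relation, e.g. "nsubj"


def tree_depth(tokens: list[Token]) -> int:
    """Longest root-to-leaf path, counted in tokens (assumes a well-formed tree)."""
    best = 0
    for i in range(1, len(tokens) + 1):
        depth, node = 1, i
        # Walk up the head chain until reaching the root.
        while tokens[node - 1].head != 0:
            node = tokens[node - 1].head
            depth += 1
        best = max(best, depth)
    return best


def max_verb_arity(tokens: list[Token]) -> int:
    """Largest number of core-argument dependents attached to any verb."""
    counts = {i: 0 for i, t in enumerate(tokens, start=1) if t.upos == "VERB"}
    for t in tokens:
        if t.head in counts and t.deprel in CORE_ARGS:
            counts[t.head] += 1
    return max(counts.values(), default=0)


def readability_features(tokens: list[Token]) -> dict[str, float]:
    """Compute the four surface/syntactic features for one parsed sentence."""
    words = [t.form for t in tokens if t.upos != "PUNCT"]
    return {
        "n_tokens": len(tokens),
        "avg_word_len": sum(map(len, words)) / len(words) if words else 0.0,
        "tree_depth": tree_depth(tokens),
        "max_verb_arity": max_verb_arity(tokens),
    }


# "Il gatto dorme." annotated by hand in UD style
sentence = [
    Token("Il", "DET", 2, "det"),
    Token("gatto", "NOUN", 3, "nsubj"),
    Token("dorme", "VERB", 0, "root"),
    Token(".", "PUNCT", 3, "punct"),
]
print(readability_features(sentence))
# → {'n_tokens': 4, 'avg_word_len': 4.0, 'tree_depth': 3, 'max_verb_arity': 1}
```

Lower values on each feature would indicate an easier sentence under this reading. How the parameters were weighted or combined into the published ranking is not described here.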
Files
Name | Size | MD5 checksum |
---|---|---|
IT-simple-monolingual.txt | 27.0 MB | e352f51f8d6032176c5b0f4402f7e446 |
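Before use, the download can be checked against the published MD5 checksum for IT-simple-monolingual.txt. The following is a minimal Python sketch. It assumes the file is UTF-8 text with one sentence per line, since the page does not state the layout. The helper names `md5sum`, `load_sentences` and `verify_and_load` are hypothetical.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "e352f51f8d6032176c5b0f4402f7e446"


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def load_sentences(path: Path) -> list[str]:
    """Return one sentence per non-empty line (assumes UTF-8, one sentence per line)."""
    with path.open(encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def verify_and_load(path: Path, expected_md5: str = EXPECTED_MD5) -> list[str]:
    """Check the file against its expected MD5 digest, then load the sentences."""
    actual = md5sum(path)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch for {path}: got {actual}, expected {expected_md5}")
    return load_sentences(path)
```

Typical usage would be `sentences = verify_and_load(Path("IT-simple-monolingual.txt"))`. Reading in chunks keeps memory flat while hashing, although the 27 MB file would also fit comfortably in memory.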