Dataset Open Access

Annotated Corpus for Occitan

Bras, Myriam; Esher, Louise; Sibille, Jean; Vergez-Couret, Marianne

This corpus contains a collection of texts in Occitan that were manually annotated with parts of speech and lemmas.

The corpus was produced in the context of the RESTAURE project, funded by the French ANR. The current version of the corpus contains 28 documents and 12,425 tokens. The annotation process is detailed in the following article: http://hal.archives-ouvertes.fr/hal-01704806

The annotated versions are provided in a tab-separated (TSV) CoNLL-U format.
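As a rough illustration of how such tab-separated files might be read, here is a minimal Python sketch. It assumes the standard CoNLL-U column order (ID, FORM, LEMMA, POS tag as the first four columns), which has not been checked against the files in the archive; the function name `read_conllu` and the sample lines are made up for illustration and are not taken from the corpus.

```python
import io


def read_conllu(stream):
    """Yield sentences as lists of (form, lemma, pos) tuples.

    Assumption: columns follow the usual CoNLL-U order, so the form,
    lemma and part-of-speech tag sit in columns 2, 3 and 4.
    """
    sentence = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Comment or metadata line.
            continue
        cols = line.split("\t")
        if len(cols) < 4:
            # Not enough columns to hold form, lemma and tag; skip it.
            continue
        sentence.append((cols[1], cols[2], cols[3]))
    if sentence:
        yield sentence


# Invented sample input, only to show the expected shape of the data.
sample = "1\tLo\tlo\tDET\n2\tcan\tcan\tNOUN\n\n1\tAdieu\tadieu\tINTJ\n"
sentences = list(read_conllu(io.StringIO(sample)))
```

For a real file, the same function could be called on `open(path, encoding="utf-8")`, where `path` points to one of the extracted documents.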

Files (89.1 kB)
Name: CorpusRestaureOccitan.zip
Size: 89.1 kB
md5:9917bf4095704e635ca3d484b8b58ae1
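The MD5 checksum listed above can be used to confirm that a downloaded copy of the archive is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `archive_is_intact` are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# Checksum as published on this record for CorpusRestaureOccitan.zip.
EXPECTED_MD5 = "9917bf4095704e635ca3d484b8b58ae1"


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `archive_is_intact("CorpusRestaureOccitan.zip")` on the downloaded file should return `True` if the download matches the published checksum.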
