Annotated Corpus for the Alsatian Dialects
Authors/Creators
- LiLPa, Université de Strasbourg
Description
This corpus is a collection of texts in the Alsatian dialects, manually annotated with parts of speech, lemmas, French translations, and location entities.
The corpus was produced in the context of the RESTAURE project, funded by the French ANR. The current version of the corpus contains 21 documents and 12,570 tokens. The annotation process is detailed in the following article: http://hal.archives-ouvertes.fr/hal-01704806
Information about version 2
Version 2 contains the same annotated documents as version 1, but some errors have been corrected and the annotated corpus is now provided in the CoNLL-U format.
The untokenised and unannotated versions of the documents are in the "txt" folder. The annotated versions of the documents are in the "ud" folder (CoNLL-U format).
In addition to the form, the lemma and the part-of-speech, the following information is provided:
- translation of the lemma into French (Gloss field)
- annotation of location names (NamedType field)
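The annotated files can be read with a few lines of Python. The sketch below is a minimal reader, not part of the corpus distribution. It assumes that the Gloss and NamedType fields are stored as `key=value` pairs in the tenth (MISC) column of the CoNLL-U files, which is the usual place for such extra attributes. The sample sentence and the `LOC` value are invented for illustration and are not taken from the corpus; the function names `parse_misc` and `read_conllu` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def parse_misc(misc: str) -> Dict[str, str]:
    """Turn a MISC value such as 'Gloss=x|NamedType=y' into a dict."""
    if not misc or misc == "_":
        return {}
    pairs = (item.split("=", 1) for item in misc.split("|") if "=" in item)
    return {key: value for key, value in pairs}


def read_conllu(lines: Iterable[str]) -> Iterator[List[dict]]:
    """Yield sentences as lists of token dicts from CoNLL-U lines."""
    sentence: List[dict] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # skip sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # skip lines that are not well-formed token lines
        sentence.append(
            {
                "form": cols[1],
                "lemma": cols[2],
                "upos": cols[3],
                # Assumption: Gloss and NamedType live in the MISC column.
                "misc": parse_misc(cols[9]),
            }
        )
    if sentence:
        yield sentence


# Invented sample, NOT taken from the corpus; field placement is assumed.
SAMPLE = (
    "# text = z' Strossburi\n"
    "1\tz'\tze\tADP\t_\t_\t_\t_\t_\tGloss=à\n"
    "2\tStrossburi\tStrossburi\tPROPN\t_\t_\t_\t_\t_\tGloss=Strasbourg|NamedType=LOC\n"
    "\n"
)

for sent in read_conllu(SAMPLE.splitlines()):
    for tok in sent:
        print(tok["form"], tok["upos"], tok["misc"].get("Gloss"), tok["misc"].get("NamedType"))
```

To read a real file, pass an open file handle from the "ud" folder to `read_conllu` instead of the sample lines. If the fields turn out to be stored elsewhere, only the `misc` line needs to change.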
Files
| Name | Size | Checksum |
|---|---|---|
| Corpus_Release2_090119.zip | 188.4 kB | md5:c95597cd7c4f1492d7598f52fcf4889b |
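The MD5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. It assumes the archive was saved under its original name in the current directory; the helper name `md5_of` is hypothetical.

```python
import hashlib
import os

# Checksum and file name as listed in this record.
EXPECTED_MD5 = "c95597cd7c4f1492d7598f52fcf4889b"
ARCHIVE = "Corpus_Release2_090119.zip"  # assumed local path


def md5_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if os.path.exists(ARCHIVE):
    status = "OK" if md5_of(ARCHIVE) == EXPECTED_MD5 else "MISMATCH"
    print(f"{ARCHIVE}: {status}")
else:
    print(f"{ARCHIVE} not found in the current directory")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the archive should be fetched again.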
Additional details
Related works
- Is cited by
- http://hal.archives-ouvertes.fr/hal-01704806 (URL)
- Is documented by
- 10.5281/zenodo.1171925 (DOI)