There is a newer version of the record available.

Published February 9, 2018 | Version 1.0
Dataset Open

Annotated Corpus for the Alsatian Dialects

  • 1. LiLPa, Université de Strasbourg

Description

This corpus contains a collection of texts in the Alsatian dialects which were manually annotated with parts-of-speech, lemmas, translations into French and location entities.

The corpus was produced in the context of the RESTAURE project, funded by the French ANR. The current version of the corpus contains 21 documents and 12,570 tokens. The annotation process is detailed in the following article: http://hal.archives-ouvertes.fr/hal-01704806

The untokenised, unannotated versions of the documents are in the “txt” folder. The annotated versions are in the “annotated” folder, in TSV (tab-separated values) format with the following columns:

  • id: token index in the document
  • form: word form
  • translation: translation into French
  • lemma: word lemma
  • pos: part-of-speech
  • location: Begin-Inside tags for location entities
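The column layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the corpus release. It assumes that each annotated file holds one token per line with the six columns in the order listed, that the location tags start with "B" and "I" (for example "B-LOC" and "I-LOC"), and that tokens outside a location have an empty or different tag. Whether the files carry a header row is not stated here, so it is left as an option. The function names and the sample rows are invented for illustration and are not taken from the corpus.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List

# Column order as listed in the record description.
COLUMNS = ["id", "form", "translation", "lemma", "pos", "location"]


def read_tokens(stream: Iterable[str], has_header: bool = False) -> Iterator[Dict[str, str]]:
    """Yield one dict per token from a tab-separated stream (hypothetical helper)."""
    # QUOTE_NONE keeps quotation marks in the text as literal characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    header_pending = has_header
    for row in reader:
        if not row or all(not cell.strip() for cell in row):
            continue  # skip blank lines
        if header_pending:
            header_pending = False
            continue  # skip the header row, if the file has one
        # Pad short rows so that a missing trailing location cell becomes "".
        row = row + [""] * (len(COLUMNS) - len(row))
        yield dict(zip(COLUMNS, row))


def location_spans(tokens: Iterable[Dict[str, str]],
                   begin_prefix: str = "B",
                   inside_prefix: str = "I") -> List[str]:
    """Group Begin-Inside tagged tokens into location strings.

    The tag prefixes are an assumption; adjust them to the labels in the files.
    """
    spans: List[str] = []
    current: List[str] = []
    for tok in tokens:
        tag = tok["location"].strip()
        if tag.startswith(begin_prefix):
            if current:
                spans.append(" ".join(current))
            current = [tok["form"]]
        elif tag.startswith(inside_prefix) and current:
            current.append(tok["form"])
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Invented sample rows (not from the corpus), in the documented column order.
SAMPLE = "\n".join([
    "1\tz\tà\tz\tADP\t",
    "2\tNeu\tNeuf\tNeu\tPROPN\tB-LOC",
    "3\tBreisach\tBrisach\tBreisach\tPROPN\tI-LOC",
    "4\t.\t.\t.\tPUNCT\t",
])

tokens = list(read_tokens(io.StringIO(SAMPLE)))
print(location_spans(tokens))
```

To read a real file, pass an open file object instead of the in-memory sample, opened with encoding="utf-8" and newline="" as the csv module recommends. The file names inside the “annotated” folder are not given here, so the path must be supplied by the user.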

Files

Corpus_Release1_180209.zip (156.9 kB)
md5:5f329674cde52ecc04da6f3c0f12e095
