Annotated Corpus for the Alsatian Dialects

Bernhard, Delphine; Erhart, Pascale; Huck, Dominique; Steiblé, Lucie

doi:10.5281/zenodo.2536041

There is a newer version of the record available.

Published January 9, 2019 | Version 2.0

Dataset Open

1. LiLPa, Université de Strasbourg

Contributors

Other:

Dorffer, Clément

This corpus contains a collection of texts in the Alsatian dialects which were manually annotated with parts-of-speech, lemmas, translations into French and location entities.

The corpus was produced in the context of the RESTAURE project, funded by the French ANR. The current version of the corpus contains 21 documents and 12,570 tokens. The annotation process is detailed in the following article: http://hal.archives-ouvertes.fr/hal-01704806

Information about version 2

Version 2 contains the same annotated documents as version 1, with some errors corrected, and provides the annotated corpus in the CoNLL-U format.

The untokenised and unannotated versions of the documents are in the "txt" folder. The annotated versions, in CoNLL-U format, are in the "ud" folder.

In addition to the form, the lemma and the part of speech, the following information is provided:

translation of the lemma into French (Gloss field)
annotation of location names (NamedType field)
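
As a rough illustration of how these extra fields might be read, the sketch below parses CoNLL-U token lines in plain Python. It assumes that Gloss and NamedType are stored as key=value pairs in the tenth (MISC) column, which is the usual CoNLL-U convention but is not confirmed by this record. The helper names `parse_token_line` and `read_conllu`, the sample line and the `LOC` value are invented for illustration and are not taken from the corpus.

```python
# Standard CoNLL-U column names (10 tab-separated fields per token line).
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def parse_token_line(line):
    """Parse one CoNLL-U token line into a dict (hypothetical helper).

    Assumption: Gloss and NamedType live in the MISC column as
    key=value pairs separated by '|'.
    """
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(CONLLU_FIELDS):
        raise ValueError(f"expected 10 columns, got {len(columns)}")
    token = dict(zip(CONLLU_FIELDS, columns))
    misc = {}
    if token["misc"] != "_":  # "_" marks an empty field in CoNLL-U
        for item in token["misc"].split("|"):
            key, _, value = item.partition("=")
            misc[key] = value
    token["misc"] = misc
    return token


def read_conllu(path):
    """Yield sentences as lists of token dicts from a CoNLL-U file."""
    sentence = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                # A blank line closes the current sentence.
                if sentence:
                    yield sentence
                    sentence = []
            elif not line.startswith("#"):
                # Lines starting with "#" are sentence-level comments.
                sentence.append(parse_token_line(line))
    if sentence:
        yield sentence


# Invented sample line, not taken from the corpus.
sample = "1\tStrossburi\tStrossburi\tPROPN\t_\t_\t_\t_\t_\tGloss=Strasbourg|NamedType=LOC"
token = parse_token_line(sample)
print(token["form"], token["misc"].get("Gloss"), token["misc"].get("NamedType"))
```

If the files in the "ud" folder place these fields elsewhere, only the MISC handling in `parse_token_line` would need to change.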

Files

Name	Size
Corpus_Release2_090119.zip (md5:c95597cd7c4f1492d7598f52fcf4889b)	188.4 kB
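
The MD5 checksum listed for the archive can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. The function names `md5_of_file` and `verify_archive` are hypothetical, and the local path in the usage comment assumes the archive was saved under its original name in the current directory.

```python
import hashlib

# Checksum as published in the record's file listing.
EXPECTED_MD5 = "c95597cd7c4f1492d7598f52fcf4889b"


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected


# Example usage (assumes the archive is in the current directory):
# verify_archive("Corpus_Release2_090119.zip")
```

A mismatch usually indicates an incomplete or corrupted download, in which case downloading the file again is the simplest remedy.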

Additional details

Is cited by: http://hal.archives-ouvertes.fr/hal-01704806 (URL)
Is documented by: 10.5281/zenodo.1171925 (DOI)

Statistic	All versions	This version
Views	1,944	1,017
Downloads	287	108
Data volume	135.8 MB	20.9 MB

Resource type: Dataset
Publisher: Zenodo

License: Creative Commons Attribution Share Alike 4.0 International

Permits almost any use subject to providing credit and license notice. Frequently used for media assets and educational materials. The most common license for Open Access scientific publications. Not recommended for software.

Technical metadata

Created: January 9, 2019
Modified: February 19, 2021
