Corpus of Occitan Written Traditional Folktales Annotated with Part-Of-Speech (OWT-Tag)
Description
This resource contains five text extracts in Occitan, manually annotated with lemmas and parts of speech following the Grace standard. It was produced during the ExpressioNarration project, funded by a Marie Curie Individual Fellowship, to evaluate how well Talismane, an Occitan part-of-speech tagger, performs on the specific features of the project's corpus, Oral Occitan (OcOr), which is also available at https://zenodo.org/record/1451753#.W78FJWOYSpo.
Each extract contains around 1500 words. The extracts are taken from 'Contes et proverbes populaires recueillis en armagnac et Contes populaires recueillis en agenais' by J.-F. Bladé, 'Coundes biarnés, couéilhuts aüs parsàas miéytadès dou péys dé Biarn' by J.-V. Lalanne, 'Contes populaires du Languedoc' by L. Lambert, and 'Contes populaires recueillis dans la Grande-Lande' by F. Arnaudin.
The annotation process is described in an article available at https://www.openscience.fr/IMG/pdf/iste_modocv1n1_2.pdf.
Files
OWT-tag.zip (35.1 kB)
md5:8931fd2ac6bf619177c705eabe5f83ec
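The MD5 checksum listed for OWT-tag.zip can be used to check that a downloaded copy is intact. The following is a minimal sketch, assuming the archive has been saved as `OWT-tag.zip` in the current working directory; that path and the helper names `md5_of_file` and `matches_expected` are illustrative assumptions, not part of the resource.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this record for OWT-tag.zip.
EXPECTED_MD5 = "8931fd2ac6bf619177c705eabe5f83ec"


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so the whole file never has to sit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals `expected`."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("OWT-tag.zip")  # assumed local download location
    if archive.exists():
        print("checksum OK" if matches_expected(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found")
```

A mismatch usually points to an incomplete or corrupted download, in which case downloading the archive again is the simplest remedy.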
Additional details
References
- Vergez-Couret M. (2017). « Constitution et annotation d'un corpus écrit de contes et récits en occitan », Analyses et méthodes formelles pour les humanités numériques, ISTE OpenScience, 1-1, online publication: https://www.openscience.fr/Constitution-et-annotation-d-un-corpus-ecrit-de-contes-et-recits-en-occitan.