Published October 11, 2018 | Version 0.0.1
Dataset Open

Corpus of Occitan Written Traditional Folktales Annotated with Part-Of-Speech (OWT-Tag)

  • 1. Queen's University

Contributors

Contact person:

  • 1. Queen's University

Description

This resource contains five extracts of Occitan texts that were manually annotated with lemmas and parts of speech, following the Grace standard. It was produced during the ExpressioNarration project, funded by a Marie Curie Individual Fellowship, to evaluate how well Talismane, a part-of-speech tagger for Occitan, handles the specificities of the project's corpus, Oral Occitan (OcOr), which is also available at https://zenodo.org/record/1451753#.W78FJWOYSpo.
Each extract contains around 1500 words. The extracts are taken from 'Contes et proverbes populaires recueillis en armagnac' and 'Contes populaires recueillis en agenais' by J.-F. Bladé, 'Coundes biarnés, couéilhuts aüs parsàas miéytadès dou péys dé Biarn' by J.-V. Lalanne, 'Contes populaires du Languedoc' by L. Lambert and 'Contes populaires recueillis dans la Grande-Lande' by F. Arnaudin.
The annotation process is described in an article available at https://www.openscience.fr/IMG/pdf/iste_modocv1n1_2.pdf.
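As an illustration of how manual annotations like these can serve to evaluate a tagger, the sketch below computes per-token accuracy between a gold sequence and a tagger's output. It is only a minimal sketch: the `(token, tag)` pair representation, the function name `tagging_accuracy` and the placeholder tag names are assumptions made for the example, not the file format or tagset of this resource.

```python
def tagging_accuracy(gold, predicted):
    """Return the share of tokens whose predicted tag equals the gold tag.

    Both arguments are lists of (token, tag) pairs aligned token by token.
    This pair layout is an assumption for illustration only.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        return 0.0
    # Count positions where the two tags agree.
    correct = sum(
        1 for (_, gold_tag), (_, pred_tag) in zip(gold, predicted)
        if gold_tag == pred_tag
    )
    return correct / len(gold)


if __name__ == "__main__":
    # Placeholder tokens and tags, not taken from the corpus.
    gold = [("w1", "TAG_A"), ("w2", "TAG_B"), ("w3", "TAG_C"), ("w4", "TAG_A")]
    predicted = [("w1", "TAG_A"), ("w2", "TAG_B"), ("w3", "TAG_A"), ("w4", "TAG_A")]
    print(f"accuracy: {tagging_accuracy(gold, predicted):.2f}")
```

Real evaluations usually also report accuracy on lemmas and on unknown words, which would need the actual annotation files.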

Files

OWT-tag.zip (35.1 kB)
md5:8931fd2ac6bf619177c705eabe5f83ec
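The MD5 checksum listed for the archive can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the local path `OWT-tag.zip` and the helper names `md5_of_file` and `verify_archive` are assumptions for the example.

```python
import hashlib
from pathlib import Path

# Checksum as listed for OWT-tag.zip on this record.
EXPECTED_MD5 = "8931fd2ac6bf619177c705eabe5f83ec"


def md5_of_file(path, chunk_size=65536):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("OWT-tag.zip")  # assumed location of the downloaded file
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found")
```

MD5 is adequate for detecting accidental corruption during download, but it should not be relied on as protection against deliberate tampering.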

Additional details

Funding

EXPRESSIONARRATION – Narration, linguistic expression and discourse structure: explorations of orality in Occitan and French (grant 655034)
European Commission
