Published October 11, 2018 | Version 0.0.1
Dataset | Open Access

Corpus of Occitan Written Traditional Folktales Annotated with Part-Of-Speech (OWT-Tag)

  • 1. Queen's University


Contact person:

  • 1. Queen's University


This resource contains five extracts of Occitan texts, manually annotated with lemmas and parts of speech following the GRACE standard. It was produced during the ExpressioNarration project, funded by a Marie Curie Individual Fellowship, to evaluate how well Talismane, a part-of-speech tagger for Occitan, handles the specificities of the project's corpus, Oral Occitan (OcOr), also available on
Each extract contains approximately 1,500 words. The extracts are taken from 'Contes et proverbes populaires recueillis en armagnac' and 'Contes populaires recueillis en agenais' by J.-F. Bladé, 'Coundes biarnés, couéilhuts aüs parsàas miéytadès dou péys dé Biarn' by J.-V. Lalanne, 'Contes populaires du Languedoc' by L. Lambert, and 'Contes populaires recueillis dans la Grande-Lande' by F. Arnaudin.
The annotation process is described in the following article available on
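Because the extracts serve as a manually annotated reference for measuring a tagger's performance, one common measure is token-level accuracy: the share of tokens whose predicted tag matches the manual annotation. The following is a minimal sketch under stated assumptions. It assumes the gold and predicted tag sequences are already aligned token by token. The function names `tagging_accuracy` and `confusion_counts` and the short tag labels are hypothetical placeholders. They are not the GRACE tags, Talismane's output format, or the evaluation procedure used in the project.

```python
from collections import Counter


def tagging_accuracy(gold, predicted):
    """Return the share of tokens whose predicted tag matches the gold tag.

    Hypothetical helper: assumes both sequences are aligned token by token.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted sequences must be aligned token by token")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def confusion_counts(gold, predicted):
    """Count (gold, predicted) tag pairs where the tagger disagreed with the annotator."""
    return Counter((g, p) for g, p in zip(gold, predicted) if g != p)


# Placeholder tags for illustration only; real annotations follow the GRACE tagset.
gold = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN"]
predicted = ["DET", "NOUN", "VERB", "ADP", "PRON", "NOUN"]

print(tagging_accuracy(gold, predicted))  # 0.8333333333333334 (5 of 6 tokens correct)
print(confusion_counts(gold, predicted))  # Counter({('DET', 'PRON'): 1})
```

Listing the disagreeing (gold, predicted) pairs alongside the overall score shows which categories a tagger confuses. This can be more informative than accuracy alone when a corpus has specific features.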


Files (35.1 kB)


Additional details


Funding: European Commission
EXPRESSIONARRATION – Narration, linguistic expression and discourse structure: explorations of orality in Occitan and French (grant 655034)
