Published March 19, 2024 | Version v1
Dataset Open

Loflòc: A Morphological Lexicon for Occitan using Universal Dependencies

  • 1. Université de Poitiers
  • 2. Université Toulouse - Jean Jaurès
  • 3. University of Helsinki

Description

LOFLOC -- Lexic obèrt flechit Occitan (Open Inflected Lexicon of Occitan)

Loflòc is a morphological lexicon for Occitan, a Romance language spoken in the south of France and in parts of Italy and Spain. Occitan is not recognized as an official language in France and no standard variety is shared across the linguistic area. To the best of our knowledge, Loflòc is the first publicly available lexicon for Occitan. It contains 680 thousand entries for 57 thousand lemmas. Each entry consists of an inflected form, its lemma and its part-of-speech tag according to the Universal Dependencies guidelines. Currently, the lexicon covers only the Lengadocian variety and the classical spelling norm. Nevertheless, it has been shown to be useful even for processing texts from other varieties (for more details, see Vergez-Couret et al., 2024; full reference below).
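
Since each entry pairs an inflected form with its lemma and its Universal Dependencies part-of-speech tag, a form-to-analyses lookup is a natural way to use the lexicon. The sketch below is illustrative only: it assumes a tab-separated layout with the columns form, lemma, tag in that order, which this record does not state, so check the files in the archive before relying on it. The names `Entry` and `load_lexicon` and the three sample rows are hypothetical and are not taken from the lexicon.

```python
import csv
import io
from collections import defaultdict
from typing import NamedTuple


class Entry(NamedTuple):
    form: str   # inflected form
    lemma: str  # lemma of the form
    upos: str   # Universal Dependencies part-of-speech tag


def load_lexicon(lines):
    """Index entries by lower-cased inflected form.

    ASSUMPTION: each line is tab-separated as form, lemma, tag.
    Rows with fewer than three columns are skipped.
    """
    index = defaultdict(list)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue
        entry = Entry(row[0], row[1], row[2])
        index[entry.form.lower()].append(entry)
    return index


# Hypothetical sample rows, for illustration only.
SAMPLE = "cantam\tcantar\tVERB\ncantas\tcantar\tVERB\nostal\tostal\tNOUN\n"

index = load_lexicon(io.StringIO(SAMPLE))
for entry in index["cantam"]:
    print(entry.form, entry.lemma, entry.upos)
```

A list per form is used because one inflected form can have several analyses; lower-casing the key is a design choice for case-insensitive lookup and can be dropped if case matters.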

Files

lofloc_UD_v1.0.zip (2.1 MB)
md5:d26c6c99b83cb49e844347af15f4f8f0
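
The MD5 checksum listed for the archive can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_download` are hypothetical, and the local path is an assumption about where the file was saved.

```python
import hashlib

# Checksum as listed for lofloc_UD_v1.0.zip on this record.
EXPECTED_MD5 = "d26c6c99b83cb49e844347af15f4f8f0"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file at `path` matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


# Usage (assumes the archive was saved in the current directory):
# print(verify_download("lofloc_UD_v1.0.zip"))
```

MD5 is adequate here for detecting a corrupted or truncated download; it is not a guarantee against deliberate tampering.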

Additional details

Funding

RESTAURE – Computational Resources and Processing for Regional Languages ANR-14-CE24-0003
Agence Nationale de la Recherche
DIVITAL – Increase the DIgital VITALity and visibility of languages of France: linguistic descriptions and annotated corpora ANR-21-CE27-0004
Agence Nationale de la Recherche
CorCoDial – Corpus-based computational dialectology: exploiting machine translation techniques to extract, visualize and interpret dialectal patterns 342859
Research Council of Finland

Dates

Available
2024-06

References

  • Marianne Vergez-Couret, Myriam Bras, Aleksandra Miletić, and Clamença Poujade. 2024. Loflòc: A Morphological Lexicon for Occitan using Universal Dependencies. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 10716–10724, Torino, Italia. ELRA and ICCL.