Loflòc: A Morphological Lexicon for Occitan using Universal Dependencies
Contributors
Data collector:
Data curators:
Description
LOFLOC -- Lexic obèrt flechit Occitan (Open Inflected Lexicon of Occitan)
Loflòc is a morphological lexicon for Occitan, a Romance language spoken in the south of France and in parts of Italy and Spain. Occitan is not recognized as an official language in France and no standard variety is shared across the linguistic area. To the best of our knowledge, Loflòc is the first publicly available lexicon for Occitan. It contains 680 thousand entries for 57 thousand lemmas. Each entry contains an inflected form, its lemma and its part-of-speech tag according to the Universal Dependencies guidelines. Currently, the lexicon only contains the Lengadocian variety and the classical spelling norm. Nevertheless, it has been shown to be useful even for processing texts from other varieties (for more details, see Vergez-Couret et al., 2024; full reference below).
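As an illustration of the entry structure described above (inflected form, lemma, Universal Dependencies part-of-speech tag), the following minimal sketch indexes entries by inflected form for lookup. The file layout inside the archive is not documented here, so the tab-separated column order (form, lemma, UPOS) is an assumption, `load_lexicon` is a hypothetical helper, and the sample rows are illustrative rather than copied from the lexicon.

```python
import csv
import io
from collections import defaultdict


def load_lexicon(lines):
    """Index (lemma, UPOS) analyses by inflected form.

    ASSUMPTION: each line holds tab-separated columns in the order
    form, lemma, UPOS. The real file layout may differ.
    """
    index = defaultdict(set)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed lines
        form, lemma, upos = row[0], row[1], row[2]
        index[form].add((lemma, upos))
    return index


# Illustrative rows only; they are not taken from the lexicon file.
sample = io.StringIO(
    "parlam\tparlar\tVERB\n"
    "parlatz\tparlar\tVERB\n"
    "ostal\tostal\tNOUN\n"
)
lexicon = load_lexicon(sample)
print(sorted(lexicon["parlam"]))
```

Storing a set of analyses per form keeps ambiguous forms (one spelling, several lemma and tag pairs) available to downstream tools instead of silently keeping only the last one read.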
Files
Name | Size |
---|---|
lofloc_UD_v1.0.zip (md5:d26c6c99b83cb49e844347af15f4f8f0) | 2.1 MB |
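The md5 checksum listed with the file can be used to check a downloaded copy. The sketch below uses only Python's standard `hashlib` module; it assumes the checksum applies to `lofloc_UD_v1.0.zip`, and the helper names `md5_of_file` and `verify_archive` are hypothetical.

```python
import hashlib

# Checksum as listed on this page; ASSUMED to belong to lofloc_UD_v1.0.zip.
EXPECTED_MD5 = "d26c6c99b83cb49e844347af15f4f8f0"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True when the file's md5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

With the archive saved in the working directory, `verify_archive("lofloc_UD_v1.0.zip")` should return `True` for an intact download. An md5 match guards against transfer corruption only; it is not a security guarantee.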
Additional details
Funding
- RESTAURE – Computational Resources and Processing for Regional Languages (Agence Nationale de la Recherche, ANR-14-CE24-0003)
- DIVITAL – Increase the DIgital VITALity and visibility of languages of France: linguistic descriptions and annotated corpora (Agence Nationale de la Recherche, ANR-21-CE27-0004)
- CorCoDial – Corpus-based computational dialectology: exploiting machine translation techniques to extract, visualize and interpret dialectal patterns (Research Council of Finland, 342859)
Dates
- Available: 2024-06
References
- Marianne Vergez-Couret, Myriam Bras, Aleksandra Miletić, and Clamença Poujade. 2024. Loflòc: A Morphological Lexicon for Occitan using Universal Dependencies. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 10716–10724, Torino, Italia. ELRA and ICCL.