Published November 7, 2025 | Version 1.0
Dataset Open

Lilamorph Wikibase: Latin Inflected Forms and Corpus Attestations

  • 1. University of the Basque Country
  • 2. Università Cattolica del Sacro Cuore

Description

We use a Wikibase instance (https://lilamorph.wikibase.cloud) to publish a dataset of Latin verb forms, with the ultimate goal of enriching Wikidata's Latin lexemes, and to support corpus annotation by matching tokens in morphologically annotated corpora to Wikibase forms.

Building on the PrinParLat lexicon of Latin verb principal parts, we generate the complete set of inflected forms for over 8,000 verbs, encoded as RDF in a dedicated Wikibase instance. These data are linked to the Index Thomisticus Treebank (ITTB), whose morphologically annotated tokens are related to corresponding forms based on segmental identity, lemma alignment, and mapped morphological features. Our method achieves over 95% coverage of ITTB verbal tokens, demonstrating the robustness of our generation and linking pipeline even for Medieval Latin data. By aligning Paralex, Wikidata, and LiLa ontologies, we ensure semantic interoperability and facilitate future integration into Wikidata. Beyond Latin, this workflow provides a reproducible model for linking inflectional paradigms and corpus attestations in other languages. 
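The linking criteria named above (segmental identity, lemma alignment, mapped morphological features) can be illustrated with a minimal Python sketch. It is not the pipeline used to build the dataset: the `Form` and `Token` classes, the `FEATURE_MAP` table, the feature labels, and the identifiers `F1` and `F2` are all hypothetical and serve only to show the idea of matching and of measuring coverage.

```python
from dataclasses import dataclass

# Hypothetical mapping from corpus feature tags to the labels used on forms.
FEATURE_MAP = {
    "Mood=Ind": "indicative",
    "Tense=Pres": "present",
    "Tense=Past": "perfect",
    "Voice=Act": "active",
    "Person=3": "third person",
    "Number=Sing": "singular",
}


@dataclass(frozen=True)
class Form:
    form_id: str          # illustrative identifier, not a real Wikibase ID
    lemma: str
    surface: str
    features: frozenset


@dataclass(frozen=True)
class Token:
    surface: str
    lemma: str
    tags: frozenset


def link_token(token, forms):
    """Return every form that agrees with the token on surface string,
    lemma, and all mapped morphological features."""
    mapped = {FEATURE_MAP[t] for t in token.tags if t in FEATURE_MAP}
    return [
        f for f in forms
        if f.surface == token.surface.lower()
        and f.lemma == token.lemma
        and mapped <= f.features
    ]


def coverage(tokens, forms):
    """Share of tokens linked to at least one form."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if link_token(t, forms)) / len(tokens)


COMMON = {"indicative", "active", "third person", "singular"}
FORMS = [
    Form("F1", "lego", "legit", frozenset(COMMON | {"present"})),
    Form("F2", "lego", "legit", frozenset(COMMON | {"perfect"})),
]
TOKENS = [
    # Fully specified token: links to exactly one form.
    Token("legit", "lego", frozenset(
        {"Mood=Ind", "Tense=Pres", "Voice=Act", "Person=3", "Number=Sing"})),
    # Underspecified token: links to both homographic forms.
    Token("Legit", "lego", frozenset({"Mood=Ind", "Person=3", "Number=Sing"})),
    # Token with no matching form.
    Token("xyz", "xyz", frozenset()),
]

for tok in TOKENS:
    print(tok.surface, [f.form_id for f in link_token(tok, FORMS)])
print(f"coverage: {coverage(TOKENS, FORMS):.2f}")
```

In this sketch an underspecified token keeps several candidate links, which is the kind of ambiguity that later disambiguation has to resolve.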

With the different form lexica built on our Wikibase instance, we are now in a position to contribute to a discussion in the Wikidata community that compares different options for representing inflected forms. We would like to highlight corpus token linking as a central use case for Wikibase forms. This use case entails adopting the data model that best serves that application, namely a separate listing of orthographically identical but morphologically ambiguous forms.
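The difference between a separate listing and one possible alternative, a merged listing with one entry per spelling, can be sketched as follows. The dictionary layout, the identifiers, and the feature labels are assumptions made for illustration and do not reproduce the actual Wikibase or Wikidata data structures.

```python
from collections import defaultdict

# Illustrative analyses: (surface form, morphological features).
ANALYSES = [
    ("legit", ("indicative", "present", "active", "third person", "singular")),
    ("legit", ("indicative", "perfect", "active", "third person", "singular")),
    ("legis", ("indicative", "present", "active", "second person", "singular")),
]


def separate_listing(analyses):
    """One form entry per analysis: homographs get distinct identifiers."""
    return [
        {"id": f"F{i}", "surface": surface, "features": feats}
        for i, (surface, feats) in enumerate(analyses, 1)
    ]


def merged_listing(analyses):
    """One form entry per spelling: homographs share a single identifier."""
    grouped = defaultdict(list)
    for surface, feats in analyses:
        grouped[surface].append(feats)
    return [
        {"id": f"F{i}", "surface": surface, "feature_sets": feature_sets}
        for i, (surface, feature_sets) in enumerate(grouped.items(), 1)
    ]


separate = separate_listing(ANALYSES)
merged = merged_listing(ANALYSES)
print(len(separate), "separate entries;", len(merged), "merged entries")
```

With the separate listing, a corpus token analysed as present tense can point to one identifier that carries exactly that analysis. With the merged listing, the same token can only point to an entry that still contains both readings.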

Because we chose Wikibase as the platform for the experiments presented here, all datasets are now ready for intervention by human or algorithmic users, who can mark ambiguous links (from token to form, or from token to LiLa lemma) as “preferred” or “deprecated” so that the ambiguity is resolved.
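Wikibase statements carry one of three ranks (preferred, normal, deprecated), and rank-based disambiguation can be sketched as below. The `Link` class and the identifiers are hypothetical. The selection rule follows the usual "best rank" convention: preferred links win if any exist, otherwise all non-deprecated links remain candidates.

```python
from dataclasses import dataclass


@dataclass
class Link:
    token_id: str        # illustrative identifiers, not real Wikibase IDs
    target_id: str
    rank: str = "normal"  # one of "preferred", "normal", "deprecated"


def resolve(links):
    """Return the best-ranked links for one token."""
    preferred = [link for link in links if link.rank == "preferred"]
    if preferred:
        return preferred
    return [link for link in links if link.rank != "deprecated"]


# Before annotation: two candidate forms, both at normal rank.
links = [Link("T1", "F1"), Link("T1", "F2")]
print([link.target_id for link in resolve(links)])

# An annotator marks the second candidate as deprecated.
links[1].rank = "deprecated"
print([link.target_id for link in resolve(links)])
```

Either action resolves the ambiguity in this sketch: deprecating the wrong candidates or marking the right one as preferred leaves a single best-ranked link.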

Files

lilamorph_wikibase.zip

Files (81.8 MB)

Name: lilamorph_wikibase.zip
Size: 81.8 MB
md5:4a08550217c0047502b18985402e134f

Additional details

Dates

Created
2025-11-07
Date of ZIP file creation, from GitHub

Software