Published April 28, 2026 | Version v1
Preprint Open

Data sharing and standardization in Linguistics

  • 1. University of Surrey
  • 2. Université Paris Cité
  • 3. Laboratoire de Linguistique Formelle
  • 4. University of Melbourne
  • 5. Surrey Morphology Group, University of Surrey, Guildford, United Kingdom
  • 6. Laboratoire de linguistique formelle
  • 7. University of York
  • 8. Department of Linguistic, Literary and Aesthetic Studies, Universitetet i Bergen, Bergen, Norway
  • 9. University of Zurich
  • 10. The Ohio State University
  • 11. University of California, San Diego
  • 12. San Diego State University
  • 13. Brigham Young University

Description

Linguistic typology stands to gain significantly from advances in the use of extremely large datasets. However, securing these gains will depend on the availability of machine-readable data that is precise and comparable. Here we identify the challenges and opportunities ahead relating to the quality, longevity, and (re-)usability of linguistic data in typology. In response, we introduce the DeAR principles (Decentralized, Automatically verified, Revisable), designed to guide and assist researchers in creating diverse, high-resolution, and robust datasets. We demonstrate the DeAR principles in action through the example of Paralex, a data standard (i.e., a set of scientific conventions) developed collaboratively for lexicons of morphologically inflected forms. Our proposals aim to foster a more resilient and equitable infrastructure for the future of linguistic research.

Files

Paralex_paper_LITY__preprint.pdf (434.9 kB)
md5:1176d1b2fc2aa2cb673f5d29b12568c7

Additional details

Additional titles

Subtitle (English)
Introducing the DeAR principles and Paralex

Related works

Describes
Standard: https://paralex-standard.org/ (URL)

Funding

Leverhulme Trust
What Is Understood? Simulating Human-Scale Word Comprehension Using AI ECF-2022-286
UK Research and Innovation
REVOLUPHON - Rational Evolutionary Phonology EP/Y02429X/1
European Commission
MOLOR - Morphologically Linked Old Irish Resource 101106220
Leverhulme Trust
Enhancing comparability: modelling linguistic systems to advance typology IF-2021-015