Data sharing and standardization in Linguistics
Authors/Creators
- Beniamine, Sacha (1)
- Bouton, Jules (2, 3)
- Carroll, Mae (4)
- Pellegrini, Matteo (1)
- Anderson, Cormac (1)
- Round, Erich (5)
- Bonami, Olivier (2, 6)
- Brown, Dunstan Patrick (7)
- Guzmán Naranjo, Matías (8)
- Herce, Borja (9)
- Sims, Andrea (10)
- Sims-Williams, Helen (1)
- Ackerman, Farrell (11)
- Corbett, Greville (1)
- Malouf, Robert (12)
- Parker, Jeff (13)
- Fransen, Theodorus
Affiliations
- 1. University of Surrey
- 2. Université Paris Cité
- 3. Laboratoire de Linguistique Formelle
- 4. University of Melbourne
- 5. Surrey Morphology Group, University of Surrey, Guildford, United Kingdom
- 6. Laboratoire de linguistique formelle
- 7. University of York
- 8. Department of Linguistic, Literary and Aesthetic Studies, Universitetet i Bergen, Bergen, Norway
- 9. University of Zurich
- 10. The Ohio State University
- 11. University of California, San Diego
- 12. San Diego State University
- 13. Brigham Young University
Description
Linguistic typology stands to gain significantly from advances in the use of extremely large datasets. However, our ability to secure these gains will depend on the availability of machine-readable data that is precise and comparable. Here we identify the challenges and opportunities ahead relating to the quality, longevity, and (re-)usability of linguistic data in typology. In response, we introduce the DeAR principles (Decentralized, Automatically verified, Revisable), designed to guide and assist researchers in creating diverse, high-resolution and robust datasets. We demonstrate the DeAR principles in action through the example of Paralex, a data standard (i.e., a set of scientific conventions) developed collaboratively for lexicons of morphologically inflected forms. Our proposals aim to foster a more resilient and equitable infrastructure for the future of linguistic research.
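The "Automatically verified" principle can be made concrete with a small check that runs over a table of inflected forms every time the data changes. The sketch below is a minimal illustration under stated assumptions, not the Paralex tooling or its validation API: the column names (`form_id`, `lexeme`, `cell`, `phon_form`), the cell labels, and the function `verify_forms` are hypothetical, loosely modeled on a Paralex-style forms table, and the real standard's requirements may differ.

```python
import csv
import io

# Hypothetical required columns, loosely modeled on a Paralex-style forms table.
REQUIRED_COLUMNS = ("form_id", "lexeme", "cell", "phon_form")


def verify_forms(csv_text, known_cells):
    """Return a list of human-readable problems found in a CSV forms table.

    Assumes one record per physical line (no multi-line quoted fields),
    so reported line numbers match the file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in fieldnames]
    if missing:
        # Without the expected columns, row-level checks are meaningless.
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        form_id = row["form_id"]
        if form_id in seen_ids:
            problems.append(f"line {line_no}: duplicate form_id {form_id!r}")
        seen_ids.add(form_id)
        # Every form must fill a paradigm cell declared elsewhere in the dataset.
        if row["cell"] not in known_cells:
            problems.append(f"line {line_no}: unknown cell {row['cell']!r}")
        # Short rows yield None for missing fields, so guard before stripping.
        if not (row["phon_form"] or "").strip():
            problems.append(f"line {line_no}: empty phon_form")
    return problems


if __name__ == "__main__":
    cells = {"prs.1sg", "prs.2sg"}
    sample = (
        "form_id,lexeme,cell,phon_form\n"
        "f1,amare,prs.1sg,amo\n"
        "f2,amare,prs.2sg,amas\n"
        "f2,amare,prs.3sg,\n"
    )
    for problem in verify_forms(sample, cells):
        print(problem)
```

Running such a check in continuous integration is one way a revisable, decentralized dataset could reject inconsistent contributions before they are merged.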
Files
| Name | Size | MD5 checksum |
|---|---|---|
| Paralex_paper_LITY__preprint.pdf | 434.9 kB | 1176d1b2fc2aa2cb673f5d29b12568c7 |
Additional details
Additional titles
- Subtitle (English): Introducing the DeAR principles and Paralex
Related works
- Describes: Standard, https://paralex-standard.org/
Funding
- Leverhulme Trust
  - What Is Understood? Simulating Human-Scale Word Comprehension Using AI (ECF-2022-286)
- UK Research and Innovation
  - REVOLUPHON - Rational Evolutionary Phonology (EP/Y02429X/1)
- European Commission
  - MOLOR - Morphologically Linked Old Irish Resource (101106220)
- Leverhulme Trust
  - Enhancing comparability: modelling linguistic systems to advance typology (IF-2021-015)