Published May 27, 2022 | Version V1.1
Poster Open

Multilingual Data Challenges in Professionalizing Data Stewardship worldwide

  • 1. ERINHA (European Research Infrastructure on Highly Pathogenic Agents) AISBL
  • 2. The University of Queensland
  • 3. University of California Santa Barbara
  • 4. Australian National University
  • 5. Educopia Institute
  • 6. DataCite
  • 7. University of São Paulo
  • 8. Freelance
  • 9. Center for Environmental Data Analysis
  • 10. Embrapa Digital Agriculture
  • 11. American Geophysical Union
  • 12. University of Cape Town

Description

Profound changes in our world are exacerbating data availability challenges at the global level, in particular between scientists and other knowledge workers from regions separated not only by time and space but also by historical, financial, cultural, and political factors. Very few, if any, of our present problems, such as biodiversity decline, climate change, and viral pandemics, stop at national, disciplinary, and linguistic boundaries. Yet our most vital response to these shared problems, the information generated to analyze them and derive solutions, is still siloed in different languages and locations throughout the world. It is clear that in order to respond effectively, we need to collaborate globally and communicate information more effectively. Globalization of research requires interoperability of our observation and experimentation systems.

The use of common FAIR vocabularies that are both human and machine readable is a key criterion in the FAIR principles (Principle I2 of Wilkinson et al. 2016 specifies ‘(meta)data use vocabularies that follow FAIR principles’). Using common FAIR vocabularies will enable data interoperability and the necessary meta-analyses even when data have different origins and are based on multiple vocabularies. The objective of this poster is to offer an overview of the many multi-language challenges for effective Data Stewardship. For instance, some bottlenecks are highly dependent on community approval processes, because they are linked to data dictionary understandability and/or related training challenges.

The discrepancies between regions (cultural, data content, means and translation) are numerous and occur both at the global level and for end users. We must anticipate issues such as choosing a preferred language, polysemy (1 term, multiple meanings), confusion (multiple terms for 1 meaning, or ‘false friends’ between 2 languages), and existing and evolving nuances (no exact match between languages or over time). Furthermore, terms are often adopted from another language with different contexts and disciplinary realms, which might decrease interoperability and impedes translating all versions at the same time. Specifically regarding translation, a key point is that it occurs at the concept level, not as a simple one-to-one translation of (consecutive) words. Care must be taken to ensure that translation to indigenous languages results in datasets that can be used by the affected communities as part of projects that practice co-creation and co-evolution of knowledge.
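To make the distinction between concept-level and word-level translation concrete, the following is a minimal sketch, not the authors' method. It assumes a toy vocabulary in which each concept has a language-independent identifier and a list of labels per language. The `Concept` class, the `ex:` identifiers, and both helper functions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Concept:
    """One meaning, identified independently of any language (hypothetical model)."""
    concept_id: str
    # Language code -> all terms that express this concept in that language.
    labels: Dict[str, List[str]] = field(default_factory=dict)


def concepts_for_term(vocabulary: List[Concept], term: str, lang: str) -> List[str]:
    """Return the ids of every concept that `term` can denote in `lang`.

    More than one id means the term is polysemous in that language.
    """
    return [c.concept_id for c in vocabulary if term in c.labels.get(lang, [])]


def translate(vocabulary: List[Concept], term: str,
              source_lang: str, target_lang: str) -> Dict[str, List[str]]:
    """Translate via concepts: concept id -> target-language terms."""
    return {
        c.concept_id: c.labels.get(target_lang, [])
        for c in vocabulary
        if term in c.labels.get(source_lang, [])
    }


# Illustrative data only: the English term "bank" has two meanings (polysemy),
# and one of them has two French terms (multiple terms for one meaning).
vocabulary = [
    Concept("ex:riverbank", {"en": ["bank"], "fr": ["rive", "berge"]}),
    Concept("ex:financial-institution", {"en": ["bank"], "fr": ["banque"]}),
]

print(concepts_for_term(vocabulary, "bank", "en"))
print(translate(vocabulary, "bank", "en", "fr"))
```

In this sketch a single source term does not map to a single target word. The caller receives one candidate set per concept and has to pick the intended meaning, which is the step that a word-by-word translation skips.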

Taking these challenges into account, we have to consider the human effort involved and the level of translation (e.g. a low or minimum yet sustainable level) that is legally allowable. How would these minimal objectives be linked with FAIR principle compliance? In several case studies, translation was voluntary. Sustainability challenges include how to keep interested groups involved and the need for ongoing engagement. Finally, we bring up the need for expert translators to maintain the quality level critical to achieving effective harmonization among languages.

Files

31. Multilingual Data Challenges in Professionalizing Data Stewardship worldwide.pdf

Additional details

Funding

EOSC-Life – Providing an open collaborative space for digital biology in Europe 824087
European Commission