Published February 2, 2023 | Version v1
Presentation Open

Workshop: CESSDA Data Catalogue - What can be Done at the Catalogue End and What Needs Harmonized Metadata?

  • 1. CESSDA

Description

CESSDA is the Consortium of European Social Science Data Archives. The CESSDA Data Catalogue provides metadata on research datasets in various languages, harvested from 20+ data repositories located in different countries. All repositories use the same DDI metadata standard, but a shared standard alone is not sufficient for a functional user interface (UI); the metadata content itself sometimes needs to be harmonized. For instance, country and data collection time filters in search interfaces rely on machine-actionable codes in the metadata. The big question is: are there alternatives to this type of resource-intensive metadata harmonization? Are there open-source tools that would map all variants of country and other geographical names in different languages to standardized codes, or match the different ways of documenting date information with an ISO code? The social sciences would benefit from dialogue with other research domains on this.

Another big issue is how to design the search interface so that users can navigate confidently between the language(s) of the metadata, the actual data files and the user interface, where the languages may be the same or different. The CESSDA Catalogue team decided to reduce choice by providing the interface in English only but the metadata in different languages. According to our testing, multilingual search gave incomplete results in some languages. Are there open tools whose language analyzers are good enough to handle search across languages correctly?
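To make the date question in the description concrete, the following is a minimal Python sketch of rule-based date harmonization using only the standard library. The list of input formats, the day-first reading of ambiguous dates, and the function name `to_iso_date` are assumptions chosen for illustration; they do not describe how the CESSDA Data Catalogue processes dates.

```python
from datetime import datetime
from typing import Optional

# Assumed input formats, tried in order. Ambiguous numeric dates are
# read day-first here; a real catalogue would have to decide this per source.
ASSUMED_FORMATS = ["%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y", "%Y"]


def to_iso_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date (or bare year) for a recognised input, else None."""
    text = raw.strip()
    for fmt in ASSUMED_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format does not match, try the next one
        if fmt == "%Y":
            # Keep year-only precision instead of inventing a month and day.
            return f"{parsed.year:04d}"
        return parsed.date().isoformat()
    return None  # unrecognised: leave for manual harmonization


print(to_iso_date("02.02.2023"))
print(to_iso_date("spring 2023"))
```

Values that match none of the listed formats come back as `None`, which mirrors the point made in the description: free-text date documentation still needs manual or more sophisticated harmonization.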

The aim of the workshop is to discuss possibilities for multilingual data catalogues and their user-facing interfaces in cases where there are no resources for AI enhancements, to exchange information on useful tools and best practices, and to explore potential ideas for cooperation.

Possible discussion items are:

– Geographical location information

  • GeoNames.org: an open-source geographical database providing 27 million geographical names, matching country names in different languages to ISO codes and continents, administrative divisions, etc.

– EU Semantic Interoperability Catalogue

– Search interface functionalities and language choices

– Language analyzers
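As an illustration of the geographical location item above, the sketch below maps country-name variants in several languages to ISO 3166-1 alpha-2 codes with a normalized lookup table. The table `COUNTRY_VARIANTS` is a small hand-picked sample and the helper names are hypothetical; in practice the variants would be loaded from a gazetteer such as GeoNames rather than typed in by hand.

```python
import unicodedata
from typing import Optional

# Hypothetical sample of name variants keyed by ISO 3166-1 alpha-2 code.
# A real table would be generated from a gazetteer export.
COUNTRY_VARIANTS = {
    "FI": ["Finland", "Suomi", "Finnland", "Finlande"],
    "DE": ["Germany", "Deutschland", "Allemagne", "Saksa"],
    "SE": ["Sweden", "Sverige", "Ruotsi", "Suède"],
}


def normalize(name: str) -> str:
    """Casefold, strip diacritics and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_marks.casefold().split())


# Build the reverse index once: normalized variant -> ISO code.
LOOKUP = {
    normalize(variant): code
    for code, variants in COUNTRY_VARIANTS.items()
    for variant in variants
}


def to_iso_code(name: str) -> Optional[str]:
    """Return the ISO code for a known variant, else None."""
    return LOOKUP.get(normalize(name))


print(to_iso_code("  suède "))
print(to_iso_code("Atlantis"))
```

Exact lookup after normalization handles case, accents and stray whitespace, but not misspellings or historical names; those would need fuzzy matching or manual curation on top of this.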

Notes

The TRIPLE project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 863420.

Disclaimer: The content of this publication is the sole responsibility of the author and can in no way be taken to reflect the views of the European Commission. The European Commission is not responsible for any use that may be made of the information it contains.

Files

Taina Jaaskelainen TRIPLE workshop.pdf (540.6 kB)
md5:5b6cb354e9e63b3d9a8a4a5edd9777a4

Additional details

Related works

Is part of
Report: 10.5281/zenodo.7704572 (DOI)

Funding

European Commission
TRIPLE – Transforming Research through Innovative Practices for Linked interdisciplinary Exploration 863420