Published January 13, 2023 | Version DRAFT
Project deliverable Open

TRIPLE Deliverable: D2.4 Report on identification and creation of new vocabularies

  • 1. EKT
  • 2. IBL PAN

Description

The GoTriple platform is a discovery service for SSH publications. It can be classed as an aggregator since it harvests publication metadata records from distributed sources (namely other aggregators or repositories). During ingestion, the pipeline transforms metadata records into the Triple Data Model, performs a series of cleansing, normalisation and enrichment procedures (in order to deal with metadata heterogeneity, increase multilingualism and improve content searchability and discoverability) and, finally, stores and indexes the enriched metadata records, making them searchable via the GoTriple search engine.
Two of the most important enrichment procedures that metadata records undergo are classification and annotation. The former uses machine learning technology to automatically classify each publication using the MORESS classification scheme (D2.31). The latter searches specific metadata fields of a record (titles, descriptions/abstracts and subjects/keywords) and assigns to the record matching concepts from a multilingual LOD vocabulary of SSH concepts. The record is then updated with the respective links (concept URIs) to the concepts, as well as all available labels in the different languages. We call a concept URI, together with all the available labels that we add to a metadata record, an annotation or Triple Keyword. Triple Keywords are kept distinct from the subjects/keywords of the original metadata. Since objects are indexed with annotation labels in all available languages, they are found when a search term matches an annotation label in any of those languages. This way, both searchability and multilingualism are increased.
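The annotation step described above can be illustrated with a minimal Python sketch. It is not the GoTriple annotation service: the field names, the `triple_keywords` key, the example URIs and the labels are all assumptions made for illustration, and matching is reduced to exact single-word comparison.

```python
import re
from dataclasses import dataclass


@dataclass
class Concept:
    uri: str       # concept URI (hypothetical example.org URIs below)
    labels: dict   # language code -> label


# Illustrative vocabulary; not taken from the GoTriple Vocabulary.
VOCAB = [
    Concept("http://example.org/concept/sociology",
            {"en": "Sociology", "fr": "Sociologie",
             "de": "Soziologie", "pl": "Socjologia"}),
    Concept("http://example.org/concept/linguistics",
            {"en": "Linguistics", "fr": "Linguistique",
             "de": "Linguistik", "pl": "Językoznawstwo"}),
]

# Assumed names for the metadata fields that are searched.
FIELDS = ("title", "abstract", "keywords")


def tokens(text):
    """Lower-cased word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))


def annotate(record, vocab=VOCAB):
    """Return a copy of the record with matching concepts attached.

    A concept matches when any of its labels, in any language, occurs as
    a word in one of the searched fields. Original keywords stay untouched;
    annotations go into a separate 'triple_keywords' list.
    """
    words = set()
    for name in FIELDS:
        value = record.get(name, "")
        if isinstance(value, list):
            value = " ".join(value)
        words |= tokens(value)
    annotations = [
        {"uri": c.uri, "labels": dict(c.labels)}
        for c in vocab
        if any(label.lower() in words for label in c.labels.values())
    ]
    return {**record, "triple_keywords": annotations}


def matches(record, term):
    """True if the search term equals an annotation label in any language."""
    term = term.lower()
    return any(
        term == label.lower()
        for annotation in record.get("triple_keywords", [])
        for label in annotation["labels"].values()
    )


record = {"title": "An introduction to sociology", "keywords": ["methods"]}
annotated = annotate(record)
found_in_german = matches(annotated, "Soziologie")
```

In this sketch a record with an English title is found by a German search term because the annotation carries the labels in every language, which is the effect the paragraph above describes.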
This deliverable describes the work and presents the outcome of task T2.4 “Cartography and creation of new vocabularies”. The objective of the task was to create a vocabulary of SSH concepts with labels in the 10 languages supported by the annotation service. The outcome is the GoTriple Vocabulary, a multilingual hierarchical set of 3,375 SSH-related concepts. It is a subset of LCSH (Library of Congress Subject Headings) that covers popular SSH subject areas. The English labels are complemented with labels in Greek, French, Polish, German, Italian, Portuguese, Spanish, Croatian and Ukrainian. The vocabulary conforms to the SKOS data model and is published as Linked Open Data (LOD) under http://semantics.gr/authorities/vocabularies/SSH-LCSH in Semantics.gr, a platform developed by EKT for managing and publishing LOD vocabularies, thesauri and authority files of any schema. The vocabulary is used by the annotation service but is also a standalone product, since it is published under an open license and can be used by the SSH research communities. The biggest challenges we faced in creating the vocabulary were a) choosing a base vocabulary, b) defining a reasonable number of SSH concepts and c) adding labels in all GoTriple languages.
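As a rough illustration of what a multilingual hierarchical SKOS vocabulary involves, the sketch below models a concept with one preferred label per language (`skos:prefLabel`) and an optional parent (`skos:broader`). The class, the helper functions, the example URIs and the labels are hypothetical and are not taken from the published vocabulary; only the ten language codes follow from the languages listed above.

```python
from dataclasses import dataclass, field
from typing import Optional

# ISO 639-1 codes for the ten languages named in the description.
LANGUAGES = ("en", "el", "fr", "pl", "de", "it", "pt", "es", "hr", "uk")


@dataclass
class SkosConcept:
    uri: str
    pref_label: dict = field(default_factory=dict)  # skos:prefLabel per language
    broader: Optional[str] = None                   # URI of the skos:broader concept


def missing_languages(concept):
    """Languages for which the concept still lacks a preferred label."""
    return [lang for lang in LANGUAGES if lang not in concept.pref_label]


def broader_chain(uri, concepts):
    """Follow skos:broader links upwards; stops at the top or on a cycle."""
    chain = []
    seen = {uri}
    current = concepts[uri].broader
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = concepts[current].broader
    return chain


# Hypothetical two-level hierarchy with incomplete translations.
concepts = {
    "http://example.org/c/social-sciences": SkosConcept(
        "http://example.org/c/social-sciences",
        {"en": "Social sciences", "fr": "Sciences sociales"},
    ),
    "http://example.org/c/sociology": SkosConcept(
        "http://example.org/c/sociology",
        {"en": "Sociology", "fr": "Sociologie", "de": "Soziologie"},
        broader="http://example.org/c/social-sciences",
    ),
}

gaps = missing_languages(concepts["http://example.org/c/sociology"])
parents = broader_chain("http://example.org/c/sociology", concepts)
```

A check such as `missing_languages` corresponds to challenge c) above: a concept is only fully usable by a multilingual annotation service once it has a label in every supported language.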

Notes

The TRIPLE project (https://project.gotriple.eu/) is financed under the Horizon 2020 framework (https://cordis.europa.eu/project/id/863420), under Grant Agreement No. 863420, with approx. 5.6 million euros for a duration of 42 months (2019-2023). The content of this deliverable reflects only TRIPLE's view and the Commission is not responsible for any use that may be made of the information it contains. At the heart of the project is the development of the GoTriple platform (https://www.gotriple.eu/), an innovative multilingual and multicultural discovery solution.

Files

D2.4 Report on identification and creation of new vocabularies_DRAFT.pdf

Additional details

Funding

TRIPLE – Transforming Research through Innovative Practices for Linked interdisciplinary Exploration 863420
European Commission