Project deliverable Open Access

TRIPLE Deliverable: D2.5 - Report on Data Enrichment

De Santis, Luca

JSON-LD Export

{
  "inLanguage": {
    "alternateName": "eng",
    "@type": "Language",
    "name": "English"
  },
  "description": "<p>This deliverable presents the strategies for data enrichment in TRIPLE. Through the Core Pipeline, named SCRE, metadata regarding publications and projects for the Social Sciences and Humanities are automatically harvested, mapped to the TRIPLE data model, curated, enriched and finally saved in the GoTriple platform’s indexes. The document starts by presenting the ways SCRE imports publication metadata from OAI-PMH endpoints and from OpenAIRE and Isidore data dumps. This reflects the content integration strategies planned in the project. On the one hand, OAI-PMH is a well-known and established standard for content harvesting: many data providers, especially smaller ones, support it, which facilitates their onboarding in GoTriple. The support for OpenAIRE and Isidore, on the other hand, responds to the wish to also harvest data from large aggregators, a strategy that allowed GoTriple to quickly present a significant number of publications in its index (more than 4 million at the time of writing). Then the normalisation strategies applied to the acquired metadata are described. After analysing the first batches of acquired data, rules were defined to normalise and clean the following metadata attributes: publication date, language codes, keywords, document types, licences, access rights and authors’ names. The definition of controlled vocabularies for some of these attributes is also presented.</p>\n\n<p>Then the enrichment services are explained, including language recognition, translation, automatic classification and annotation. The services to detect duplicate publications and to disambiguate authors are also discussed, followed by the acquisition and processing of project metadata.</p>\n\n<p>Some final remarks on the data enrichment process, including the difficulties that were faced and solved, conclude the document.</p>",
  "license": "",
  "creator": [
    {
      "affiliation": "Net7",
      "@id": "",
      "@type": "Person",
      "name": "De Santis, Luca"
    }
  ],
  "url": "",
  "datePublished": "2022-09-30",
  "version": "Draft",
  "keywords": [
    "Data enrichment",
    "Open Science"
  ],
  "@context": "",
  "identifier": "",
  "@id": "",
  "@type": "CreativeWork",
  "name": "TRIPLE Deliverable: D2.5 - Report on Data Enrichment"
}
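
The description in this record names language codes and publication dates among the attributes that are normalised. As an illustration only, the Python sketch below shows what such a normalisation step might look like. It is not the SCRE implementation: the function names, the lookup table and the list of accepted date layouts are all assumptions made for this example.

```python
from datetime import datetime
from typing import Optional

# Hypothetical lookup table (assumption): a real pipeline would rely on a
# complete ISO 639 vocabulary rather than this handful of entries.
LANGUAGE_CODES = {
    "en": "en", "eng": "en", "english": "en",
    "fr": "fr", "fra": "fr", "fre": "fr", "french": "fr",
    "it": "it", "ita": "it", "italian": "it",
}

# Date layouts tried in order, from most to least specific (assumption).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y-%m", "%Y")


def normalise_language(raw: str) -> Optional[str]:
    """Map a free-form language label to a two-letter code, or None if unknown."""
    return LANGUAGE_CODES.get(raw.strip().lower())


def normalise_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known layout matches.

    Missing month or day components default to 01, which is how
    datetime.strptime fills fields absent from the format string.
    """
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

With these assumptions, `normalise_language("eng")` yields `"en"`, matching the `alternateName` and `name` pair in the `inLanguage` field of this record, and values that match no rule return `None` so that they can be flagged for curation rather than guessed.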
                   All versions   This version
Views              183            183
Downloads          120            120
Data volume        185.0 MB       185.0 MB
Unique views       171            171
Unique downloads   111            111

