Project deliverable Open Access
<?xml version='1.0' encoding='utf-8'?> <resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd"> <identifier identifierType="DOI">10.5281/zenodo.7359654</identifier> <creators> <creator> <creatorName>De Santis, Luca</creatorName> <givenName>Luca</givenName> <familyName>De Santis</familyName> <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0003-0527-840X</nameIdentifier> <affiliation>Net7</affiliation> </creator> </creators> <titles> <title>TRIPLE Deliverable: D2.5 - Report on Data Enrichment</title> </titles> <publisher>Zenodo</publisher> <publicationYear>2022</publicationYear> <subjects> <subject>SSH</subject> <subject>Data enrichment</subject> <subject>Metadata</subject> <subject>Open Science</subject> <subject>OPERAS</subject> <subject>TRIPLE</subject> </subjects> <dates> <date dateType="Issued">2022-09-30</date> </dates> <language>en</language> <resourceType resourceTypeGeneral="Text">Project deliverable</resourceType> <alternateIdentifiers> <alternateIdentifier alternateIdentifierType="url">https://zenodo.org/record/7359654</alternateIdentifier> </alternateIdentifiers> <relatedIdentifiers> <relatedIdentifier relatedIdentifierType="DOI" relationType="IsVersionOf">10.5281/zenodo.7359653</relatedIdentifier> <relatedIdentifier relatedIdentifierType="URL" relationType="IsPartOf">https://zenodo.org/communities/operaseu</relatedIdentifier> </relatedIdentifiers> <version>Draft</version> <rightsList> <rights rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights> <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights> </rightsList> <descriptions> <description descriptionType="Abstract"><p>In this deliverable, the strategies for data enrichment in TRIPLE are presented. 
Through the Core Pipeline, named SCRE, metadata regarding publications and projects for the Social Sciences and Humanities are automatically harvested, mapped to the TRIPLE data model, curated, enriched and finally saved in the GoTriple platform's indexes. The document starts by presenting the ways SCRE imports publication metadata from OAI-PMH endpoints and from OpenAIRE and Isidore data dumps. This reflects the strategies for integrating content that were planned in the project. On the one hand, OAI-PMH is a well-known and established standard for content harvesting: many data providers, especially smaller ones, support it, which facilitates their onboarding in GoTriple. The support for OpenAIRE and Isidore, on the other hand, responds to the wish to also harvest data from large aggregators, a strategy that allowed GoTriple to quickly present a significant number of publications in its index (more than 4 million at the time of writing). Then the normalisation strategies applied to the acquired metadata are described. After analysing the first batches of acquired data, it was decided to define rules to normalise and clean the attributes for the following metadata: publication date, language codes, keywords, document types, licences, access rights and authors' names. 
In the document, the definition of controlled vocabularies for some of these attributes is also presented.</p> <p>Then enrichment services are explained, including language recognition, translation, automatic classification and annotation. The services to detect duplicate publications and to disambiguate authors are also discussed, followed by the presentation of the acquisition and processing of project metadata.</p> <p>Some final remarks on the data enrichment process, including the difficulties that have been faced and solved, conclude the document.</p></description> <description descriptionType="Other">The TRIPLE project (https://project.gotriple.eu/) is financed under the Horizon 2020 framework (https://cordis.europa.eu/project/id/863420), under Grant Agreement No. 863420, with approx. 5.6 million Euros for a duration of 42 months (2019-2023). The content of this deliverable reflects only TRIPLE's view and the Commission is not responsible for any use that may be made of the information it contains. --- At the heart of the project is the development of the GoTriple platform (https://www.gotriple.eu/), an innovative multilingual and multicultural discovery solution.</description> </descriptions> <fundingReferences> <fundingReference> <funderName>European Commission</funderName> <funderIdentifier funderIdentifierType="Crossref Funder ID">10.13039/100010661</funderIdentifier> <awardNumber awardURI="info:eu-repo/grantAgreement/EC/H2020/863420/">863420</awardNumber> <awardTitle>Transforming Research through Innovative Practices for Linked interdisciplinary Exploration</awardTitle> </fundingReference> </fundingReferences> </resource>
| | All versions | This version |
|---|---|---|
| Views | 183 | 183 |
| Downloads | 120 | 120 |
| Data volume | 185.0 MB | 185.0 MB |
| Unique views | 171 | 171 |
| Unique downloads | 111 | 111 |