Project deliverable Open Access

TRIPLE Deliverable: D2.5 - Report on Data Enrichment

De Santis, Luca


DCAT Export

<?xml version='1.0' encoding='utf-8'?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:adms="http://www.w3.org/ns/adms#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dct="http://purl.org/dc/terms/" xmlns:dctype="http://purl.org/dc/dcmitype/" xmlns:dcat="http://www.w3.org/ns/dcat#" xmlns:duv="http://www.w3.org/ns/duv#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:frapo="http://purl.org/cerif/frapo/" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:gsp="http://www.opengis.net/ont/geosparql#" xmlns:locn="http://www.w3.org/ns/locn#" xmlns:org="http://www.w3.org/ns/org#" xmlns:owl="http://www.w3.org/2002/07/owl#" xmlns:prov="http://www.w3.org/ns/prov#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:schema="http://schema.org/" xmlns:skos="http://www.w3.org/2004/02/skos/core#" xmlns:vcard="http://www.w3.org/2006/vcard/ns#" xmlns:wdrs="http://www.w3.org/2007/05/powder-s#">
  <rdf:Description rdf:about="https://doi.org/10.5281/zenodo.7359654">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:type rdf:resource="http://purl.org/dc/dcmitype/Text"/>
    <dct:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">https://doi.org/10.5281/zenodo.7359654</dct:identifier>
    <foaf:page rdf:resource="https://doi.org/10.5281/zenodo.7359654"/>
    <dct:creator>
      <rdf:Description rdf:about="http://orcid.org/0000-0003-0527-840X">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Agent"/>
        <dct:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">0000-0003-0527-840X</dct:identifier>
        <foaf:name>De Santis, Luca</foaf:name>
        <foaf:givenName>Luca</foaf:givenName>
        <foaf:familyName>De Santis</foaf:familyName>
        <org:memberOf>
          <foaf:Organization>
            <foaf:name>Net7</foaf:name>
          </foaf:Organization>
        </org:memberOf>
      </rdf:Description>
    </dct:creator>
    <dct:title>TRIPLE Deliverable: D2.5 - Report on Data Enrichment</dct:title>
    <dct:publisher>
      <foaf:Agent>
        <foaf:name>Zenodo</foaf:name>
      </foaf:Agent>
    </dct:publisher>
    <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">2022</dct:issued>
    <dcat:keyword>SSH</dcat:keyword>
    <dcat:keyword>Data enrichment</dcat:keyword>
    <dcat:keyword>Metadata</dcat:keyword>
    <dcat:keyword>Open Science</dcat:keyword>
    <dcat:keyword>OPERAS</dcat:keyword>
    <dcat:keyword>TRIPLE</dcat:keyword>
    <frapo:isFundedBy rdf:resource="info:eu-repo/grantAgreement/EC/H2020/863420/"/>
    <schema:funder>
      <foaf:Organization>
        <dct:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">10.13039/100010661</dct:identifier>
        <foaf:name>European Commission</foaf:name>
      </foaf:Organization>
    </schema:funder>
    <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2022-09-30</dct:issued>
    <dct:language rdf:resource="http://publications.europa.eu/resource/authority/language/ENG"/>
    <owl:sameAs rdf:resource="https://zenodo.org/record/7359654"/>
    <adms:identifier>
      <adms:Identifier>
        <skos:notation rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">https://zenodo.org/record/7359654</skos:notation>
        <adms:schemeAgency>url</adms:schemeAgency>
      </adms:Identifier>
    </adms:identifier>
    <dct:isVersionOf rdf:resource="https://doi.org/10.5281/zenodo.7359653"/>
    <dct:isPartOf rdf:resource="https://zenodo.org/communities/operaseu"/>
    <owl:versionInfo>Draft</owl:versionInfo>
    <dct:description>&lt;p&gt;This deliverable presents the data enrichment strategies adopted in TRIPLE. Through the Core Pipeline, named SCRE, metadata about Social Sciences and Humanities publications and projects are automatically harvested, mapped to the TRIPLE data model, curated, enriched and finally saved in the GoTriple platform&amp;rsquo;s indexes.&lt;/p&gt; &lt;p&gt;The document begins by describing how SCRE imports publication metadata from OAI-PMH endpoints and from OpenAIRE and Isidore data dumps, reflecting the content-integration strategy planned in the project. On the one hand, OAI-PMH is a well-known, established standard for content harvesting: many data providers, especially smaller ones, support it, which eases their onboarding in GoTriple. On the other hand, support for OpenAIRE and Isidore reflects the aim of also harvesting data from large aggregators, a strategy that allowed GoTriple to quickly offer a significant number of publications in its index (more than 4 million at the time of writing).&lt;/p&gt; &lt;p&gt;Next, the normalisation strategies applied to the acquired metadata are described. Based on an analysis of the first batches of acquired data, rules were defined to normalise and clean the following metadata attributes: publication date, language codes, keywords, document types, licences, access rights and authors&amp;rsquo; names. The document also presents the controlled vocabularies defined for some of these attributes.&lt;/p&gt; &lt;p&gt;The enrichment services are then explained, including language recognition, translation, automatic classification and annotation. The services for detecting duplicate publications and disambiguating authors are also discussed, followed by a presentation of how project metadata are acquired and processed.&lt;/p&gt; &lt;p&gt;Some final remarks on the data enrichment process, including the difficulties that were faced and solved, conclude the document.&lt;/p&gt;</dct:description>
    <dct:description>The TRIPLE project (https://project.gotriple.eu/) is financed under the Horizon 2020 framework (https://cordis.europa.eu/project/id/863420) under Grant Agreement No. 863420, with approximately 5.6 million euros for a duration of 42 months (2019-2023). The content of this deliverable reflects only TRIPLE's view, and the Commission is not responsible for any use that may be made of the information it contains. --- At the heart of the project is the development of the GoTriple platform (https://www.gotriple.eu/), an innovative multilingual and multicultural discovery solution.</dct:description>
    <dct:accessRights rdf:resource="http://publications.europa.eu/resource/authority/access-right/PUBLIC"/>
    <dct:accessRights>
      <dct:RightsStatement rdf:about="info:eu-repo/semantics/openAccess">
        <rdfs:label>Open Access</rdfs:label>
      </dct:RightsStatement>
    </dct:accessRights>
    <dcat:distribution>
      <dcat:Distribution>
        <dct:license rdf:resource="https://creativecommons.org/licenses/by/4.0/legalcode"/>
        <dcat:accessURL rdf:resource="https://doi.org/10.5281/zenodo.7359654"/>
      </dcat:Distribution>
    </dcat:distribution>
    <dcat:distribution>
      <dcat:Distribution>
        <dcat:accessURL rdf:resource="https://doi.org/10.5281/zenodo.7359654"/>
        <dcat:byteSize>1541955</dcat:byteSize>
        <dcat:downloadURL rdf:resource="https://zenodo.org/record/7359654/files/D2.5-Report%20on%20data%20enrichment-1.0_TRIPLE.pdf"/>
        <dcat:mediaType>application/pdf</dcat:mediaType>
      </dcat:Distribution>
    </dcat:distribution>
  </rdf:Description>
  <foaf:Project rdf:about="info:eu-repo/grantAgreement/EC/H2020/863420/">
    <dct:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">863420</dct:identifier>
    <dct:title>Transforming Research through Innovative Practices for Linked interdisciplinary Exploration</dct:title>
    <frapo:isAwardedBy>
      <foaf:Organization>
        <dct:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">10.13039/100010661</dct:identifier>
        <foaf:name>European Commission</foaf:name>
      </foaf:Organization>
    </frapo:isAwardedBy>
  </foaf:Project>
</rdf:RDF>
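The export above is plain RDF/XML, so its main fields can be read without an RDF library. The following is a minimal sketch, assuming Python 3 and only the standard-library `xml.etree.ElementTree`; the function name `summarise_datasets` and the trimmed `SAMPLE` excerpt are hypothetical illustrations, not part of Zenodo's tooling. It relies on the layout of this particular export (dataset properties as direct children of a top-level `rdf:Description`), so it is not a general RDF/XML parser: the same triples can be serialised in many other shapes.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes, copied from the rdf:RDF root element of the export.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]
RDF_RESOURCE = "{%s}resource" % NS["rdf"]
DCAT_DATASET = NS["dcat"] + "Dataset"

# Hypothetical trimmed excerpt of the export; in practice, read the full file instead.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dct="http://purl.org/dc/terms/"
    xmlns:dcat="http://www.w3.org/ns/dcat#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="https://doi.org/10.5281/zenodo.7359654">
    <rdf:type rdf:resource="http://www.w3.org/ns/dcat#Dataset"/>
    <dct:creator>
      <rdf:Description rdf:about="http://orcid.org/0000-0003-0527-840X">
        <foaf:name>De Santis, Luca</foaf:name>
      </rdf:Description>
    </dct:creator>
    <dct:title>TRIPLE Deliverable: D2.5 - Report on Data Enrichment</dct:title>
    <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear">2022</dct:issued>
    <dcat:keyword>SSH</dcat:keyword>
    <dcat:keyword>Data enrichment</dcat:keyword>
    <dct:issued rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2022-09-30</dct:issued>
    <dcat:distribution>
      <dcat:Distribution>
        <dcat:accessURL rdf:resource="https://doi.org/10.5281/zenodo.7359654"/>
      </dcat:Distribution>
    </dcat:distribution>
    <dcat:distribution>
      <dcat:Distribution>
        <dcat:byteSize>1541955</dcat:byteSize>
        <dcat:downloadURL rdf:resource="https://zenodo.org/record/7359654/files/D2.5-Report%20on%20data%20enrichment-1.0_TRIPLE.pdf"/>
        <dcat:mediaType>application/pdf</dcat:mediaType>
      </dcat:Distribution>
    </dcat:distribution>
  </rdf:Description>
</rdf:RDF>"""


def summarise_datasets(xml_text):
    """Return selected fields for every top-level dcat:Dataset in an RDF/XML string."""
    root = ET.fromstring(xml_text)
    datasets = []
    # Direct children only: nested rdf:Description nodes (e.g. the creator) are not datasets.
    for desc in root.findall("rdf:Description", NS):
        types = {t.get(RDF_RESOURCE) for t in desc.findall("rdf:type", NS)}
        if DCAT_DATASET not in types:
            continue
        downloads = []
        for dist in desc.findall("dcat:distribution/dcat:Distribution", NS):
            url = dist.find("dcat:downloadURL", NS)
            if url is None:
                continue  # access-only distributions (the DOI landing page) carry no file
            downloads.append({
                "url": url.get(RDF_RESOURCE),
                "bytes": int(dist.findtext("dcat:byteSize", default="0", namespaces=NS)),
                "media_type": dist.findtext("dcat:mediaType", namespaces=NS),
            })
        issued = [i.text for i in desc.findall("dct:issued", NS)]
        datasets.append({
            "id": desc.get(RDF_ABOUT),
            "title": desc.findtext("dct:title", namespaces=NS),
            "creators": [n.text for n in desc.findall("dct:creator/rdf:Description/foaf:name", NS)],
            "keywords": [k.text for k in desc.findall("dcat:keyword", NS)],
            # dct:issued appears twice (xsd:gYear and xsd:date); keep the longest, most precise value.
            "issued": max(issued, key=len, default=None),
            "downloads": downloads,
        })
    return datasets


if __name__ == "__main__":
    for ds in summarise_datasets(SAMPLE):
        print(ds["title"], ds["issued"], ds["downloads"][0]["bytes"])
```

Keeping the longest `dct:issued` value is an assumption made for this sketch, since the export lists both a year and a full date; for records whose layout differs from this one, a proper RDF parser such as rdflib would be the safer choice.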
Usage statistics      All versions   This version
Views                 183            183
Downloads             120            120
Data volume           185.0 MB       185.0 MB
Unique views          171            171
Unique downloads      111            111
