Published February 20, 2017 | Version v1
Conference paper Open

Mapping Large Scale Research Metadata to Linked Data: A Performance Comparison of HBase, CSV and XML

  • 1. University of Bonn, Germany
  • 2. University of Bonn, Germany
  • 3. National Chung Hsing University, Taiwan
  • 4. University of Bonn & Fraunhofer IAIS, Germany

Description

OpenAIRE, the Open Access Infrastructure for Research in Europe, comprises a database of all EC FP7 and H2020 funded research projects, including metadata of their results (publications and datasets).
These data are stored in an HBase NoSQL database, post-processed, and exposed as HTML for human consumption and as XML through a web service interface. As an intermediate format to facilitate statistical computations, CSV is generated internally. To interlink the OpenAIRE data with related data on the Web, we aim to export them as Linked Open Data (LOD). The LOD export must integrate into the overall data processing workflow, in which derived data are regenerated from the base data every day. We thus faced the challenge of identifying the best-performing conversion approach. We evaluated the performance of creating LOD by a MapReduce job on top of HBase, by mapping the intermediate CSV files, and by mapping the XML output.
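
To make the CSV route more concrete, the following is a minimal sketch of mapping CSV rows to RDF triples in N-Triples syntax. It is not the mapping used in the paper: the namespace `http://example.org/openaire/`, the column names `id`, `title` and `year`, and the choice of Dublin Core Terms properties are assumptions made purely for illustration.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace and column names, chosen only for illustration;
# they are not taken from the OpenAIRE data model or the paper's mapping.
BASE = "http://example.org/openaire/"
TITLE = "http://purl.org/dc/terms/title"
DATE = "http://purl.org/dc/terms/date"


def escape_literal(value):
    """Escape characters that may not appear raw in an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(row):
    """Turn one CSV row (a dict) into a list of N-Triples lines."""
    # Percent-encode the identifier so the subject is a valid IRI.
    subject = f"<{BASE}publication/{quote(row['id'], safe='')}>"
    return [
        f'{subject} <{TITLE}> "{escape_literal(row["title"])}" .',
        f'{subject} <{DATE}> "{escape_literal(row["year"])}" .',
    ]


def csv_to_ntriples(text):
    """Map every row of a CSV document (with a header line) to triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        triples.extend(row_to_ntriples(row))
    return triples


if __name__ == "__main__":
    sample = 'id,title,year\np1,"A ""quoted"" title",2015\n'
    for line in csv_to_ntriples(sample):
        print(line)
```

Each row is handled independently, so a mapping of this shape can be run over the daily regenerated CSV files in a streaming or parallel fashion; whether that outperforms the HBase or XML routes is the question the paper evaluates.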

Files

1506.04006.pdf

Files (534.7 kB)

md5:0579bc44786049a9113cb9672c42a51c

Additional details

Funding

European Commission
OpenAIRE2020 - Open Access Infrastructure for Research in Europe 2020 643410