Published August 31, 2021 | Version v1
Dataset
Open
FDup deduplication software data benchmark: 10Mi OpenAIRE Publications Dump
Description
This dataset is a random subset of publications extracted from the OpenAIRE Research Graph (http://doi.org/10.5281/zenodo.4707307). It contains approximately 10Mi JSON publication records.
The file is a zip archive containing gz files, each with one JSON record per line. Each record conforms to the schema available at http://doi.org/10.5281/zenodo.4723403.
Learn more about the OpenAIRE Research Graph at https://graph.openaire.eu.
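The archive layout described above (a zip of gz files, one JSON record per line) can be read without unpacking everything to disk. The following is a minimal sketch, not part of the dataset's own tooling: the function name `iter_records` is hypothetical, and it assumes that the record files inside the archive have names ending in `.gz` and are UTF-8 encoded.

```python
import gzip
import io
import json
import zipfile
from typing import Iterator


def iter_records(zip_path) -> Iterator[dict]:
    """Yield one dict per JSON line from every .gz member of the zip archive.

    Assumption: record files are the members whose names end in ".gz";
    anything else (directories, notes) is skipped.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".gz"):
                continue
            # Stream the member: decompress gzip on the fly, decode as text.
            with archive.open(name) as member:
                with gzip.open(member, mode="rt", encoding="utf-8") as lines:
                    for line in lines:
                        line = line.strip()
                        if line:  # tolerate blank lines
                            yield json.loads(line)
```

Because the function is a generator, records are parsed one at a time, which matters for an archive of this size. For example, `next(iter_records("publications_dump_10Mi.zip"))` would return the first record as a dictionary whose fields follow the schema referenced above.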
Files
| Name | Size | MD5 |
|---|---|---|
| publications_dump_10Mi.zip | 10.5 GB | 48a6bf9959b859828d3a0377affc4b9f |
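Since an MD5 checksum is listed for the archive, a download can be checked against it before use. The sketch below is one way to do that with the Python standard library; the helper name `md5_of_file` and the local file path are assumptions for illustration.

```python
import hashlib

# Checksum listed for publications_dump_10Mi.zip on this page.
EXPECTED_MD5 = "48a6bf9959b859828d3a0377affc4b9f"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str) -> bool:
    """Compare the file's digest with the published checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

Reading in chunks keeps memory use constant regardless of the 10.5 GB file size. Assuming the archive was saved under its original name, `is_intact("publications_dump_10Mi.zip")` should return `True` for an uncorrupted download.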
Additional details
References
- FDup: a framework for general-purpose and efficient entity deduplication of record collections: https://peerj.com/articles/cs-1058/