Published September 28, 2018 | Version 2018.09.15
Dataset Open

Resource Metadata Harvested from Government and Research Open Data Portals

Authors/Creators

Description

This dataset consists of resource metadata harvested from the APIs of hundreds of government and research data portals worldwide. The harvest ran from 13 to 15 September 2018. The harvested metadata was translated into a single metadata format (see metadata_format.odt). An overview of all harvested domains is given in portal_list.txt.

The harvested data is divided into five gzipped JSON Lines files, based on the ‘type’ of each resource as derived from the data returned by the APIs:

  • dataset_metadata.jsonl.gz: Resources classified as a Dataset, or as a subtype of Dataset (e.g. Dataset:Image and Dataset:Audio) [6 246 250 resources]
  • document_metadata.jsonl.gz: Resources classified as a Document, or as a subtype of Document (e.g. Document:Paper:Conference and Document:Book) [15 626 541 resources]
  • software_metadata.jsonl.gz: Resources classified as Software (including Software:Model) [42 036 resources]
  • service_metadata.jsonl.gz: Resources classified as a Service (e.g. WMS, APIs) [1 257 resources]
  • other_metadata.jsonl.gz: Resources whose ‘type’ could not be determined from the data the API returned. This set still contains many datasets [1 502 979 resources]
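The split above can be illustrated with a small sketch that maps a colon-separated type string to one of the five files. It is not the harvester's actual logic: the function name `file_for_type`, the case-insensitive match on the first segment, and the literal top-level labels are assumptions made for illustration. In particular, the real data may label services differently (for example as WMS), so check metadata_format.odt before relying on it.

```python
from typing import Optional

# Hypothetical mapping from the first segment of a type string to a file name.
# The file names come from the list above; the lookup keys are assumed labels.
FILE_BY_TOP_LEVEL_TYPE = {
    "dataset": "dataset_metadata.jsonl.gz",
    "document": "document_metadata.jsonl.gz",
    "software": "software_metadata.jsonl.gz",
    "service": "service_metadata.jsonl.gz",
}
FALLBACK_FILE = "other_metadata.jsonl.gz"


def file_for_type(resource_type: Optional[str]) -> str:
    """Return the file a resource of this type would be expected in."""
    if not resource_type:
        # No usable type: the resource falls into the "other" file.
        return FALLBACK_FILE
    # "Document:Paper:Conference" -> "document"
    top_level = resource_type.split(":", 1)[0].strip().lower()
    return FILE_BY_TOP_LEVEL_TYPE.get(top_level, FALLBACK_FILE)
```

Under these assumptions, `file_for_type("Dataset:Image")` resolves to dataset_metadata.jsonl.gz, and any missing or unrecognised type resolves to other_metadata.jsonl.gz.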

Notes

The data is provided as gzipped JSON Lines files (one JSON value per line); the format is defined at http://jsonlines.org/
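Because the largest files are several gigabytes, it is usually better to stream them than to load them whole. The following is a minimal sketch using only the Python standard library. The helper name `iter_records` is hypothetical, and the sketch assumes UTF-8 text with one JSON object per line.

```python
import gzip
import json
from typing import Any, Dict, Iterator


def iter_records(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a gzipped JSON Lines file."""
    # "rt" opens the gzip stream in text mode, so lines are decoded on the fly
    # and the file is never fully decompressed into memory.
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)
```

For example, `sum(1 for _ in iter_records("service_metadata.jsonl.gz"))` would count the records in the smallest file without holding them all in memory.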

Files

Files (10.4 GB)

Checksum                               Size
md5:31caa4dd5fd01091d5a3a60198a8bcb9   1.7 GB
md5:b3d199ca5b271850bdf6b370a800ec3e   8.0 GB
md5:11e4e41c38119f349db18ebde78f963b   26.8 kB
md5:e09a02f822965a035ef72b896c84189a   643.3 MB
md5:c775e5d4dcf7e8aa57b2e912d8febf61   7.4 kB
md5:43c38775eecde9032d9590352941b85e   273.4 kB
md5:eddb0474511e6c6af447f37d721c9732   8.6 MB