Published September 28, 2018 | Version 2018.09.15
Dataset
Open
Resource Metadata Harvested from Government and Research Open Data Portals
Authors/Creators
Description
This dataset consists of resource metadata harvested from the APIs of hundreds of government and research data portals worldwide. The metadata was harvested between 13 and 15 September 2018 and translated into a single metadata format (see metadata_format.odt). An overview of all harvested domains is given in portal_list.txt.
The harvested data is divided into five gzipped JSON Lines files, based on the ‘type’ of each resource as derived from the data returned by the APIs:
- dataset_metadata.jsonl.gz: Resources classified as a Dataset, or as a subtype of Dataset (e.g. Dataset:Image and Dataset:Audio) [6 246 250 resources]
- document_metadata.jsonl.gz: Resources classified as a Document, or as a subtype of Document (e.g. Document:Paper:Conference and Document:Book) [15 626 541 resources]
- software_metadata.jsonl.gz: Resources classified as Software (including Software:Model) [42 036 resources]
- service_metadata.jsonl.gz: Resources classified as a Service (e.g. WMS, APIs) [1 257 resources]
- other_metadata.jsonl.gz: Resources whose ‘type’ could not be determined from the data the API returned. This set still contains many datasets. [1 502 979 resources]
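Because each file is gzip-compressed JSON Lines (one JSON object per line), the files can be streamed record by record without decompressing them to disk first. The Python sketch below shows one way to do this and to tally records by top-level type, i.e. the part before the first colon in labels such as Dataset:Image. The field name `type` is an assumption for illustration only; the actual key should be checked against metadata_format.odt.

```python
import gzip
import json
from collections import Counter
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one metadata record per line from a gzipped JSON Lines file.

    The file is streamed, so multi-GB files never have to fit in memory.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)


def top_level_type(type_label: str) -> str:
    """Return the top-level class of a colon-separated type label.

    'Document:Paper:Conference' -> 'Document'; 'Software' -> 'Software'.
    """
    return type_label.split(":", 1)[0]


def count_top_level_types(path: str, type_key: str = "type") -> Counter:
    """Count records per top-level type.

    ASSUMPTION: type_key="type" is a hypothetical field name; the real
    key is defined in metadata_format.odt. Records without a string
    value under that key are counted as '<missing>'.
    """
    counts: Counter = Counter()
    for record in iter_records(path):
        label = record.get(type_key)
        key = top_level_type(label) if isinstance(label, str) else "<missing>"
        counts[key] += 1
    return counts
```

To inspect the actual record layout before relying on any field name, `next(iter_records("service_metadata.jsonl.gz"))` returns the first record of the smallest file (1 257 resources).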
Files (10.4 GB)

| Name | Size | MD5 checksum |
|---|---|---|
|  | 1.7 GB | 31caa4dd5fd01091d5a3a60198a8bcb9 |
|  | 8.0 GB | b3d199ca5b271850bdf6b370a800ec3e |
|  | 26.8 kB | 11e4e41c38119f349db18ebde78f963b |
|  | 643.3 MB | e09a02f822965a035ef72b896c84189a |
|  | 7.4 kB | c775e5d4dcf7e8aa57b2e912d8febf61 |
|  | 273.4 kB | 43c38775eecde9032d9590352941b85e |
|  | 8.6 MB | eddb0474511e6c6af447f37d721c9732 |
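Given that several files are multiple gigabytes, it is worth checking each download against its listed MD5 checksum. A minimal Python sketch using the standard library's hashlib, reading the file in chunks so large files need not fit in memory; the file path in the usage line below is hypothetical.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: str, expected: str) -> bool:
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, `verify_md5("downloaded_file.jsonl.gz", "31caa4dd5fd01091d5a3a60198a8bcb9")` should return True for an intact copy of the 1.7 GB file; any mismatch suggests a truncated or corrupted download.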