
Published November 19, 2020 | Version 1.0
Dataset Open

Dataset relating a study on Geospatial Open Data usage and metadata quality

  • 1. Institute for Applied Mathematics and Information Technologies National Research Council, Genoa, Italy

Contributors

Contact person:

  • 1. Institute for Applied Mathematics and Information Technologies National Research Council, Genoa, Italy

Description

Open Government Data (OGD) portals host thousands of geo-referenced datasets containing spatial information, which makes them of great interest for any analysis or process relating to the territory. To realize this potential, users must be able to access these datasets and reuse them. The quality of metadata is often regarded as an obstacle to the full dissemination of OGD data. Based on an experimental investigation of over 160,000 geospatial datasets from six national and international OGD portals, this work first provides an overview of the usage of these portals, measured in terms of dataset views and downloads. Then, to assess the possible influence of metadata quality on the use of geospatial datasets, the metadata of each dataset was assessed and the correlation between the two variables was measured. The results show a significant underutilization of geospatial datasets and a generally poor quality of their metadata. In addition, only a weak correlation was found between usage and metadata quality, not strong enough to assert with certainty that the latter is a determining factor of the former.
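To illustrate the kind of usage/quality correlation described above, the following is a minimal sketch of a rank correlation between two numeric series. The choice of Spearman's coefficient is an assumption made for illustration; this description does not state which coefficient the study used. The function names are hypothetical, and the inputs in the usage note are made-up toy values, not figures drawn from the dataset.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

With toy inputs, `spearman([0.2, 0.4, 0.9], [5, 7, 30])` describes a perfectly monotonic relation, while real usage and quality values would be expected to give a coefficient much closer to zero if the correlation is weak.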

 

The dataset consists of six zipped CSV files containing usage data, full metadata, and computed quality values for about 160,000 geospatial datasets from the three national and three international portals considered in the study: US (catalog.data.gov), Colombia (datos.gov.co), Ireland (data.gov.ie), HDX (data.humdata.org), EUODP (data.europa.eu), and NASA (data.nasa.gov).

Data collection took place from 2019-12-19 to 2019-12-23.

 

The header for each CSV file is:

[ ,portalid,id,downloaddate,metadata,overallq,qvalues,assessdate,dviews,downloads,engine,admindomain]

where each row describes one dataset of a portal and the fields are defined as follows:

  • portalid: portal identifier
  • id: dataset identifier 
  • downloaddate: date of data collection
  • overallq: overall quality value computed by applying the methodology presented in [1]
  • qvalues: JSON object containing the quality values computed for the 17 metrics presented in [1]
  • assessdate: date of quality assessment
  • dviews: number of total views for the dataset
  • downloads: number of total downloads for the dataset (made available only by the Colombia, HDX, and NASA portals)
  • engine: identifier of the supporting portal platform: 1 (CKAN), 2 (Socrata)
  • admindomain: 1 (national), 3 (international)
  • metadata: the overall dataset's metadata downloaded via API from the portal according to the supporting platform schema
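The field list above can be turned into a small reader. The following is a minimal sketch, assuming that the unnamed first column is a row index, that `qvalues` is stored as strict JSON text, and that `downloads` is empty for portals that do not publish it; none of this is confirmed by the description. The function names, the sample row, and the metric key `hypothetical_metric` are invented for illustration and do not come from the actual files.

```python
import csv
import io
import json

# Header as given in the description; the first column has no name.
HEADER = ["", "portalid", "id", "downloaddate", "metadata", "overallq",
          "qvalues", "assessdate", "dviews", "downloads", "engine",
          "admindomain"]

# Code tables taken from the field descriptions.
ENGINES = {"1": "CKAN", "2": "Socrata"}
DOMAINS = {"1": "national", "3": "international"}


def to_number(text):
    """Convert a CSV cell to float, treating an empty cell as missing."""
    return float(text) if text not in ("", None) else None


def parse_row(row):
    """Decode one CSV row (a dict keyed by header name) into Python values."""
    return {
        "portalid": row["portalid"],
        "id": row["id"],
        "overallq": to_number(row["overallq"]),
        # Assumption: qvalues holds valid JSON text.
        "qvalues": json.loads(row["qvalues"]) if row["qvalues"] else {},
        "dviews": to_number(row["dviews"]),
        "downloads": to_number(row["downloads"]),
        # Unknown codes are passed through unchanged.
        "engine": ENGINES.get(row["engine"], row["engine"]),
        "admindomain": DOMAINS.get(row["admindomain"], row["admindomain"]),
    }


def read_rows(handle):
    """Yield parsed rows from an open text file or file-like object."""
    # Metadata cells may be long; raise the csv module's per-field limit.
    csv.field_size_limit(2**31 - 1)
    for row in csv.DictReader(handle):
        yield parse_row(row)


def make_sample():
    """Build an in-memory CSV with one invented row, for demonstration only."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(HEADER)
    writer.writerow(["0", "hypothetical-portal", "hypothetical-id",
                     "2019-12-19", json.dumps({"title": "placeholder"}),
                     "0.5", json.dumps({"hypothetical_metric": 0.5}),
                     "2019-12-20", "12", "", "1", "1"])
    buffer.seek(0)
    return buffer
```

For a real file, `read_rows(open(path, newline="", encoding="utf-8"))` would stream rows one at a time, which matters for the largest file; the encoding is also an assumption.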

 

[1] Neumaier, S.; Umbrich, J.; Polleres, A. Automated Quality Assessment of Metadata Across Open Data Portals. J. Data and Information Quality 2016, 8, 2:1–2:29. doi:10.1145/2964909

Notes

This dataset was created to support the analysis presented in: Quarati, Alfonso; De Martino, Monica; Rosim, Sergio. 2021. "Geospatial Open Data Usage and Metadata Quality" ISPRS Int. J. Geo-Inf. 10, no. 1: 30. https://doi.org/10.3390/ijgi10010030

Files

Colombia-Portal-Geo-data.csv

Files (2.2 GB)

Checksum Size
md5:f44d04c18838de5b1fbd9e00ea20bcad 2.0 MB
md5:3724c925b8d1b506146c0a8b221ca47d 15.4 MB
md5:4ca742a73e9a9207fde8b9a5360d91c0 60.7 MB
md5:29b945a84dc309f02c13d39ff73ec6aa 8.4 MB
md5:e9f2223269bdc6028d9fe8b3272e8eaf 3.6 MB
md5:e46e9b5a24c2bda0e1efb8e63d632618 2.1 GB
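The md5 values listed above can be used to check a download for corruption. The following is a minimal sketch using Python's standard `hashlib`; the function names are hypothetical, and which checksum belongs to which file has to be read from the record itself, since the file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:0123...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

Reading in chunks keeps memory use flat, which is relevant for the 2.1 GB file. MD5 is adequate here for detecting transfer errors, not as a security guarantee.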

Additional details

References

  • Neumaier, S.; Umbrich, J.; Polleres, A. Automated Quality Assessment of Metadata Across Open Data Portals. J. Data and Information Quality 2016, 8, 2:1–2:29. doi:10.1145/2964909