Dataset relating to the study "Open government data: usage trends and metadata quality"
Creators
- Institute for Applied Mathematics and Information Technologies, National Research Council, Italy
Contributors
Contact person:
- Institute for Applied Mathematics and Information Technologies, National Research Council, Italy
Description
Open Government Data (OGD) has the potential to support social and economic progress. However, this potential goes unrealized if the data remains unused. Although the literature suggests that the metadata quality of OGD datasets is one of the main factors affecting their use, to the best of our knowledge no quantitative study has provided evidence of this relationship. Considering about 400,000 datasets from 28 national, municipal, and international OGD portals, we have programmatically analyzed their usage, their metadata quality, and the relationship between the two. Our analysis has highlighted three main findings. First, regardless of the portals' size, software platform, and administrative and territorial coverage, most OGD datasets are underutilized. Second, OGD portals pay varying attention to the quality of their datasets' metadata. Third, we did not find clear evidence that dataset usage is positively correlated with better metadata publishing practices. Finally, we have considered other factors, such as the datasets' category and some demographic characteristics of the OGD portals, and analyzed their relationship with dataset usage, obtaining partially affirmative answers.
The dataset consists of three zipped CSV files containing the collected usage data, full metadata, and computed quality values for about 400,000 datasets belonging to the 8 national, 4 international, and 16 US municipal OGD portals considered in the study.
Data collection occurred in the period: 2019-12-19 -- 2019-12-23.
________________________________________
Portal               #Datasets  Platform
________________________________________
US                     261,514  CKAN
France                  39,412  Other
Colombia                 9,795  Socrata
IE                       9,598  CKAN
Slovenia                 4,892  CKAN
Poland                   1,032  Other
Latvia                     336  CKAN
Puerto Rico                178  Socrata
New York, NY             2,771  Socrata
Baltimore, MD            2,617  Socrata
Austin, TX               2,353  Socrata
Chicago, IL              1,368  Socrata
San Francisco, CA        1,001  Socrata
Dallas, TX               1,001  Socrata
Los Angeles, CA            943  Socrata
Seattle, WA                718  Socrata
Providence, RI             288  Socrata
Honolulu, HI               244  Socrata
New Orleans, LA            215  Socrata
Buffalo, NY                213  Socrata
Nashville, TN              172  Socrata
Boston, MA                 170  CKAN
Albuquerque, NM             60  CKAN
Albany, NY                  50  Socrata
HDX                     17,325  CKAN
EUODP                   14,058  CKAN
NASA                     9,664  Socrata
World Bank Finances      2,177  Socrata
________________________________________
The three CSV files share the same table structure:
Table Fields
- portalid: portal identifier
- id: dataset identifier
- engine: identifier of the supporting portal platform: 1 (CKAN), 2 (Socrata)
- admindomain: administrative domain of the portal: 1 (National), 2 (US municipal), 3 (International)
- downloaddate: date of data collection
- views: number of total views for the dataset
- downloads: number of total downloads for the dataset
- overallq: overall quality value computed by applying the methodology presented by Neumaier et al. [1]
- qvalues: JSON object containing the quality values computed for the 17 metrics presented by Neumaier et al. [1]
- assessdate: date of quality assessment
- metadata: the overall dataset's metadata downloaded via API from the portal according to the supporting platform schema
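As an illustration of the field list above, the following minimal Python sketch parses rows with this structure. It is not taken from the study: the sample row, the identifiers "portal-a" and "ds-1", and the qvalues keys "metric_a" and "metric_b" are hypothetical, and it assumes a comma-delimited file with a header row and no missing numeric values, which the real files may not satisfy.

```python
import csv
import io
import json

# Column names as listed in "Table Fields"; the header spelling in the
# real CSV files is an assumption.
FIELDS = ["portalid", "id", "engine", "admindomain", "downloaddate", "views",
          "downloads", "overallq", "qvalues", "assessdate", "metadata"]

# Code tables copied from the field descriptions.
ENGINES = {1: "CKAN", 2: "Socrata"}
DOMAINS = {1: "National", 2: "US", 3: "International"}


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed Python values."""
    return {
        "portalid": row["portalid"],
        "id": row["id"],
        "engine": ENGINES.get(int(row["engine"]), "unknown"),
        "admindomain": DOMAINS.get(int(row["admindomain"]), "unknown"),
        "views": int(row["views"]),
        "downloads": int(row["downloads"]),
        "overallq": float(row["overallq"]),
        # qvalues is stored as a JSON object inside a single CSV cell.
        "qvalues": json.loads(row["qvalues"]),
    }


def build_sample():
    """Build an in-memory CSV with one hypothetical row (not real data)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(FIELDS)
    writer.writerow([
        "portal-a", "ds-1", 1, 1, "2019-12-19", 120, 15, 0.62,
        json.dumps({"metric_a": 1.0, "metric_b": 0.5}),  # hypothetical keys
        "2019-12-20",
        json.dumps({"title": "Example"}),
    ])
    buf.seek(0)
    return buf


records = [parse_row(r) for r in csv.DictReader(build_sample())]
print(records[0]["engine"], records[0]["views"], records[0]["qvalues"])
```

To read one of the actual files, the in-memory buffer would be replaced by an opened CSV file extracted from the corresponding archive. If the metadata cells turn out to be very large, the csv module's default field size limit may need to be raised with csv.field_size_limit.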
[1] Neumaier, S.; Umbrich, J.; Polleres, A. Automated Quality Assessment of Metadata Across Open Data Portals. J. Data and Information Quality 2016, 8, 2:1–2:29. doi:10.1145/2964909