Published March 5, 2021 | Version 1.0
Dataset Open

Dataset relating to the study "Open government data: usage trends and metadata quality"

  • 1. Institute for Applied Mathematics and Information Technologies - National Research Council, Italy

Contributors

Contact person:

  • 1. Institute for Applied Mathematics and Information Technologies, National Research Council, Italy

Description

Open Government Data (OGD) has the potential to support social and economic progress. However, this potential is frustrated if the data remains unused. Although the literature suggests that the metadata quality of OGD datasets is one of the main factors affecting their use, to the best of our knowledge, no quantitative study has provided evidence of this relationship. Considering about 400,000 datasets from 28 national, municipal, and international OGD portals, we have programmatically analyzed their usage, their metadata quality, and the relationship between the two. Our analysis has highlighted three main findings. First, regardless of portal size, the software platform adopted, and administrative and territorial coverage, most OGD datasets are underutilized. Second, OGD portals pay varying attention to the quality of their datasets' metadata. Third, we did not find clear evidence that dataset usage is positively correlated with better metadata publishing practices. Finally, we have considered other factors, such as dataset category and some demographic characteristics of the OGD portals, and analyzed their relationship with dataset usage, obtaining partially affirmative answers.

The dataset consists of three zipped CSV files containing the collected usage data, full metadata, and computed quality values for about 400,000 datasets belonging to the 8 national, 4 international, and 16 US municipal OGD portals considered in the study.
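Because the largest archive is several gigabytes, it may be more practical to stream rows than to load a whole file into memory. The following is a minimal sketch, assuming each archive contains a single UTF-8, comma-separated CSV with a header row; none of these details are confirmed by this page, and the helper name `iter_rows` is hypothetical.

```python
import csv
import io
import zipfile


def iter_rows(zip_path, member=None):
    """Yield each CSV row of a zipped file as a dict, without unpacking to disk."""
    # The metadata column may hold very large strings, so raise the
    # csv module's default per-field size limit.
    csv.field_size_limit(2**31 - 1)
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: one CSV per archive; use the first member if none is named.
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            # Assumption: the CSV is UTF-8 encoded and has a header row.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            yield from csv.DictReader(text)
```

Rows are produced lazily, so a loop such as `for row in iter_rows(path): ...` keeps memory use roughly proportional to a single row.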

Data collection occurred in the period 2019-12-19 to 2019-12-23.

________________________________________

Portal                 #Datasets   Platform
________________________________________

US                       261,514   CKAN
France                    39,412   Other
Colombia                   9,795   Socrata
IE                         9,598   CKAN
Slovenia                   4,892   CKAN
Poland                     1,032   Other
Latvia                       336   CKAN
Puerto Rico                  178   Socrata

New York, NY               2,771   Socrata
Baltimore, MD              2,617   Socrata
Austin, TX                 2,353   Socrata
Chicago, IL                1,368   Socrata
San Francisco, CA          1,001   Socrata
Dallas, TX                 1,001   Socrata
Los Angeles, CA              943   Socrata
Seattle, WA                  718   Socrata
Providence, RI               288   Socrata
Honolulu, HI                 244   Socrata
New Orleans, LA              215   Socrata
Buffalo, NY                  213   Socrata
Nashville, TN                172   Socrata
Boston, MA                   170   CKAN
Albuquerque, NM               60   CKAN
Albany, NY                    50   Socrata

HDX                       17,325   CKAN
EUODP                     14,058   CKAN
NASA                       9,664   Socrata
World Bank Finances        2,177   Socrata
________________________________________

 

The three datasets share the same table structure:

Table Fields

  • portalid: portal identifier
  • id: dataset identifier
  • engine: identifier of the supporting portal platform: 1 (CKAN), 2 (Socrata)
  • admindomain: 1 (National), 2 (US), 3 (International)
  • downloaddate: date of data collection
  • views: number of total views for the dataset
  • downloads: number of total downloads for the dataset 
  • overallq: overall quality value computed by applying the methodology presented by Neumaier et al. [1]
  • qvalues: JSON object containing the quality values computed for the 17 metrics presented by Neumaier et al. [1]
  • assessdate: date of quality assessment
  • metadata: the overall dataset's metadata downloaded via API from the portal according to the supporting platform schema
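As an illustration of the field list above, one row could be converted into typed values as in the sketch below. It assumes that rows arrive as dictionaries of strings keyed by the field names (for example from `csv.DictReader`) and that numeric fields may be empty; both are assumptions, not documented behaviour. The function name `decode_row` and the metric key in the sample row are hypothetical.

```python
import json

# Code-to-label mappings copied from the field list above.
ENGINES = {1: "CKAN", 2: "Socrata"}
ADMIN_DOMAINS = {1: "National", 2: "US", 3: "International"}


def _to_int(value):
    """Return an int, or None for an empty field (assumed to mean 'missing')."""
    return int(value) if value not in (None, "") else None


def decode_row(row):
    """Convert one raw CSV row (a dict of strings) into typed Python values."""
    engine = _to_int(row["engine"])
    domain = _to_int(row["admindomain"])
    return {
        "portalid": row["portalid"],
        "id": row["id"],
        # Codes not listed in the field description are kept as "unknown".
        "engine": ENGINES.get(engine, "unknown"),
        "admindomain": ADMIN_DOMAINS.get(domain, "unknown"),
        "views": _to_int(row["views"]),
        "downloads": _to_int(row["downloads"]),
        "overallq": float(row["overallq"]),
        # qvalues is stored as a JSON object, one entry per quality metric.
        "qvalues": json.loads(row["qvalues"]),
    }


# Hypothetical sample row; the metric name "metric_a" is invented for illustration.
sample = {
    "portalid": "7", "id": "abc-123", "engine": "2", "admindomain": "3",
    "views": "42", "downloads": "", "overallq": "0.5",
    "qvalues": '{"metric_a": 0.25}',
}
decoded = decode_row(sample)
```

The large `metadata` field is left untouched here, since its structure depends on the schema of the platform that served it.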

[1] Neumaier, S.; Umbrich, J.; Polleres, A. Automated Quality Assessment of Metadata Across Open Data Portals. J. Data and Information Quality 2016, 8, 2:1–2:29. doi:10.1145/2964909

 

Notes

The dataset was created to support the analysis presented in: Quarati, Alfonso; "Open government data: usage trends and metadata quality", Journal of Information Science, 2021, DOI:10.1177/01655515211027775

Files

International-datasets.csv

Files (4.2 GB)

Checksum                                Size
md5:35a768ed4d28c86acd3b2c7e19384f09    406.8 MB
md5:a4cad32b9429c127c104fdd524494da3    3.8 GB
md5:81e330ff16f8618e67ff062dbdcd5166    53.4 MB
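After downloading, a file can be checked against its listed MD5 value. The sketch below uses Python's standard `hashlib` and reads in chunks so that multi-gigabyte files do not need to fit in memory. The function names and the local file path are hypothetical, and which checksum belongs to which file name is not stated in the listing.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's MD5 with a listed value such as 'md5:35a7...'."""
    # Drop the optional 'md5:' prefix used in the file listing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected
```

For example, `matches_listing("downloaded-file.zip", "md5:35a768ed4d28c86acd3b2c7e19384f09")` would return `True` only if the local copy is byte-identical to the published file with that checksum.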