Published April 25, 2023 | Version v2
Dataset Open

A dataset from a survey investigating disciplinary differences in data citation

  • 1. Université de Montréal
  • 2. University of Ottawa
  • 3. ZBW Leibniz Information Center for Economics



Title of Dataset:  A dataset from a survey investigating disciplinary differences in data citation

Date of data collection: January to March 2022

Collection instrument: SurveyMonkey

Funding: Alfred P. Sloan Foundation


Licenses/restrictions placed on the data:  These data are available under a CC BY 4.0 license 

Links to publications that cite or use the data: 

Gregory, K., Ninkov, A., Ripp, C., Peters, I., & Haustein, S. (2022). Surveying practices of data citation and reuse across disciplines. Proceedings of the 26th International Conference on Science and Technology Indicators. International Conference on Science and Technology Indicators, Granada, Spain.

Gregory, K., Ninkov, A., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation.


File List

  • Filename: MDCDatacitationReuse2021Codebookv2.pdf
  • Filename: MDCDataCitationReuse2021surveydatav2.csv
    Dataset in CSV format
  • Filename: MDCDataCitationReuse2021surveydatav2.sav
    Dataset in SPSS format
  • Filename: MDCDataCitationReuseSurvey2021QNR.pdf

Additional related data collected but not included in the current data package: Responses to open-ended questions


Description of methods used for collection/generation of data: 

The development of the questionnaire (Gregory et al., 2022) was centered around the creation of two main branches of questions for the primary groups of interest in our study: researchers that reuse data (33 questions in total) and researchers that do not reuse data (16 questions in total). The population of interest for this survey consists of researchers from all disciplines and countries, sampled from the corresponding authors of papers indexed in the Web of Science (WoS) between 2016 and 2020. 

We received 3,632 responses, 2,509 of which were completed, representing a completion rate of 68.6%. Incomplete responses were excluded from the dataset. The final dataset contains 2,492 complete responses, giving an uncorrected response rate of 1.57%. Controlling for invalid emails, bounced emails and opt-outs (n=5,201) produced a response rate of 1.62%, similar to surveys using comparable recruitment methods (Gregory et al., 2020).
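The rates quoted above are simple ratios, and the following minimal Python sketch shows how they can be recomputed. The function names `percent` and `corrected_response_rate` are illustrative and not part of the data package. The number of survey invitations sent is not stated in this README, so the response-rate function is defined but no value is plugged into it.

```python
def percent(part, whole):
    """Return part / whole expressed as a percentage."""
    return 100.0 * part / whole


def corrected_response_rate(responses, invited, excluded):
    """Response rate after removing invalid, bounced and opted-out invitations.

    `invited` is a hypothetical parameter: the number of invitations sent
    is not given in this README.
    """
    return percent(responses, invited - excluded)


received = 3632        # responses received (from the text above)
final_complete = 2492  # complete responses in the final dataset (from the text above)

# Share of received responses that are in the final dataset.
print(f"{percent(final_complete, received):.1f}%")  # 68.6%
```

With these figures, the reported 68.6% corresponds to the 2,492 responses retained in the final dataset divided by the 3,632 responses received.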

Methods for processing the data: 

Results were downloaded from SurveyMonkey in CSV format and prepared for analysis in Excel and SPSS by recoding ordinal and multiple-choice questions and by removing missing values.

Instrument- or software-specific information needed to interpret the data: 

The dataset is provided in SPSS (.sav) format, the native format of IBM SPSS Statistics. The dataset is also available in a coded format in CSV. The Codebook is required to interpret the values.

DATA-SPECIFIC INFORMATION FOR: MDCDataCitationReuse2021surveydata

Number of variables: 95

Number of cases/rows: 2,492

Missing data codes: 999 = Not asked

Refer to MDCDatacitationReuse2021Codebookv2.pdf for detailed variable information.
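For users working outside SPSS, the coded CSV can be read with Python's standard library. The sketch below is one possible approach, not a procedure used by the dataset authors: `load_survey` and `matches_readme` are hypothetical helper names, the `utf-8-sig` encoding is an assumption about the export, and it is assumed that the 95 variables correspond one-to-one to CSV columns.

```python
import csv

NOT_ASKED = "999"  # "Not asked" code listed under "Missing data codes"


def load_survey(path):
    """Return (column_names, rows), with the 'Not asked' code mapped to None."""
    # Assumption: utf-8-sig tolerates a byte-order mark if the export has one.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        rows = [
            {
                name: None if isinstance(value, str) and value.strip() == NOT_ASKED else value
                for name, value in row.items()
            }
            for row in reader
        ]
        return list(reader.fieldnames or []), rows


def matches_readme(columns, rows, n_variables=95, n_cases=2492):
    """Check the loaded table against the counts stated in this README."""
    return len(columns) == n_variables and len(rows) == n_cases


# Example use (file name taken from the File List above):
# columns, rows = load_survey("MDCDataCitationReuse2021surveydatav2.csv")
# print(matches_readme(columns, rows))
```

All values are kept as strings, so the Codebook is still needed to map codes to labels. Because the comparison is applied to every column, any variable in which 999 is a legitimate value would also be blanked; check the Codebook before relying on this.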




Additional details

Related works

Is supplement to
Preprint: 10.5281/zenodo.7555266 (DOI)
Conference paper: 10.5281/ZENODO.6951437 (DOI)