*** Dataset: High_Value_Datasets_SLR_2023 ***
Authors: Anastasija Nikiforova, Nina Rizun, Magdalena Ciesielska, Charalampos Alexopoulos, Andrea Miletič
University of Tartu, Faculty of Science and Technology, Institute of Computer Science, Chair of Software Engineering

Corresponding author: Anastasija Nikiforova
Contact Information: nikiforova.anastasija@ut.ee

***General Introduction***
This dataset contains data collected during the study "Towards High-Value Datasets determination for data-driven development: a systematic literature review",
conducted by Anastasija Nikiforova (University of Tartu).
It is being made public both as supplementary data for the paper of the same title and so that other researchers can use these data in their own work.

***Purpose of the protocol***
The protocol is intended for the systematic literature review on the topic of high-value datasets (HVD), with the aim of gathering information on how HVD and their determination have been reflected in the literature over the years and what these studies have found to date, including the indicators used, the stakeholders involved, data-related aspects, and frameworks.
This is done by conducting a Systematic Literature Review (SLR). 
To achieve the research objective, the following research questions (RQ) were established: 
•	(RQ1) How is the value of open government data perceived / defined? In which contexts has the topic of HVD been investigated by previous research (e.g., research disciplines, countries)? Are local efforts being made at the country level to identify the datasets that provide the most value to stakeholders of the local open data ecosystem?
	▪	(RQ1.1) How are high-value data defined, and does this definition differ from the one introduced in the PSI / OD Directive?
	▪	(RQ1.2) Which datasets are considered to be of higher value in terms of data nature, data type, data format, and data dynamism?
•	(RQ2) What indicators are used to determine high-value datasets? How can these indicators be classified? Can they be measured, and can this be done (semi-)automatically?
•	(RQ3) Is there a framework for determining country-specific HVD? In other words, is it possible to determine which datasets are of particular value and interest for further reuse and value creation, taking into account the specificities of the country under consideration, e.g., culture, geography, ethnicity, likelihood of crises and/or catastrophes?

The data in this dataset were collected as a result of the SLR conducted over Scopus, Web of Science, and the Digital Government Research Library (DGRL) in 2023.
These databases were queried for the keywords ("open data" OR "open government data") AND ("high-value data*" OR "high value data*"), applied to the article title, keywords, and abstract to limit the results to papers in which these were primary research objects rather than merely mentioned in the body, e.g., as future work. Only articles in English were considered; in terms of scope, journal articles, conference papers, and book chapters were all studied.
As a result, a total of 9 articles were further examined in accordance with the developed protocol (see Protocol_HVD_SLR.docx). Each study was independently examined by at least two authors.
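
The keyword query above can be approximated locally, for example to re-check the records listed in the second spreadsheet. The following is a minimal sketch, not the search syntax of any of the three databases: the function name matches_query and the regular expressions are illustrative assumptions, and the databases' own matching rules (stemming, field handling) may differ.

```python
import re

# Boolean query from the text:
# ("open data" OR "open government data") AND ("high-value data*" OR "high value data*")
OPEN_DATA = re.compile(r"\bopen (?:government )?data\b", re.IGNORECASE)
# "data*" is a trailing wildcard, so "dataset" and "datasets" also match;
# [- ] covers both the hyphenated and the spaced spelling.
HIGH_VALUE = re.compile(r"\bhigh[- ]value data\w*", re.IGNORECASE)


def matches_query(title: str, abstract: str, keywords: str) -> bool:
    """Return True if title, abstract and keywords together satisfy both query groups."""
    text = " ".join([title, abstract, keywords])
    return bool(OPEN_DATA.search(text)) and bool(HIGH_VALUE.search(text))
```

A record is kept only when both groups of terms occur somewhere in its title, abstract, or keywords; a term that appears only in the body of the paper is not seen by this check, which mirrors the restriction described above.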

***Test procedure***
Each study was independently examined by at least two authors. After an in-depth examination of the full text of the article, the structured protocol was filled in for each study.
The structure of the protocol is available in the supplementary files (see Protocol_HVD_SLR.odt, Protocol_HVD_SLR.docx).
The data collected for each study by two researchers were then synthesized into one final version by a third researcher.
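
The source does not describe how the third researcher merged the two filled protocols. As a purely hypothetical aid, the sketch below compares two reviewers' entries for one study field by field and lists the fields on which they disagree. The function name compare_protocols and the representation of a filled protocol as a dictionary are assumptions made for illustration.

```python
def compare_protocols(
    first: dict[str, str], second: dict[str, str]
) -> tuple[dict[str, str], list[str]]:
    """Split protocol fields into agreed values and fields needing resolution."""
    agreed: dict[str, str] = {}
    to_resolve: list[str] = []
    # Walk over every field either reviewer filled in, in a stable order.
    for field in sorted(set(first) | set(second)):
        a = first.get(field, "").strip()
        b = second.get(field, "").strip()
        # Ignore differences in case and surrounding whitespace only.
        if a.casefold() == b.casefold():
            agreed[field] = a
        else:
            to_resolve.append(field)
    return agreed, to_resolve
```

Free-text fields such as contributions or definitions will rarely match verbatim, so in practice such a comparison would mainly help with closed fields (year, type of paper, relevance level) and leave the rest to the synthesizing researcher.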

***Description of the data in this data set*** 
Spreadsheet #1 provides the filled protocol for the relevant studies.
Spreadsheet #2 provides the list of results after the search over the three indexing databases, i.e., before filtering out irrelevant studies.

The information on each selected study was collected in four categories: 
(1) descriptive information, 
(2) approach- and research design-related information, 
(3) quality-related information, 
(4) HVD determination-related information 

Descriptive information	
1) Article number - a study number, corresponding to the study number assigned in an Excel worksheet 
2) Complete reference - the complete source information to refer to the study
3) Year of publication - the year in which the study was published
4) Journal article / conference paper / book chapter - the type of the paper {journal article, conference paper, book chapter}
5) DOI / Website - a link to the website where the study can be found
6) Number of citations - the number of citations of the article in Google Scholar, Scopus, Web of Science 
7) Availability in OA - availability of the article in Open Access
8) Keywords - keywords of the paper as indicated by the authors 
9) Relevance for this study - what is the relevance level of the article for this study? {high / medium / low}

Approach- and research design-related information
10) Objective / RQ - the research objective / aim, established research questions
11) Research method (including unit of analysis) - the methods used to collect data, including the unit of analysis (country, organisation, specific unit that has been analysed, e.g., the number of use-cases, scope of the SLR etc.)
12) Contributions - the contributions of the study
13) Method - whether the study uses a qualitative, quantitative, or mixed methods approach
14) Availability of the underlying research data - whether there is a reference to the publicly available underlying research data (e.g., transcriptions of interviews, collected data) or an explanation of why these data are not shared
15) Period under investigation - period (or moment) in which the study was conducted
16) Use of theory / theoretical concepts / approaches - does the study mention any theory / theoretical concepts / approaches? If any theory is mentioned, how is theory used in the study? 

Quality- and relevance-related information
17) Quality concerns - whether there are any quality concerns (e.g., limited information about the research methods used)
18) Primary research object - is the HVD a primary research object in the study? (primary - the paper is focused around the HVD determination, secondary - mentioned but not studied (e.g., as part of discussion, future work etc.))

HVD determination-related information	
19) HVD definition and type of value - how is the HVD defined in the article and / or any other equivalent term? 
20) HVD indicators - what are the indicators to identify HVD? How were they identified? (components & relationships, "input -> output")
21) A framework for HVD determination - is there a framework presented for HVD identification? What components does it consist of and what are the relationships between these components? (detailed description)
22) Stakeholders and their roles - what stakeholders or actors does HVD determination involve? What are their roles?
23) Data - what data do HVD cover? 
24) Level (if relevant) - what is the level of the HVD determination covered in the article? (e.g., city, regional, national, international)
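
For programmatic work with the filled protocol, the 24 fields listed above can be grouped by category in a small lookup structure. This is a minimal sketch: the names PROTOCOL_FIELDS and field_by_number are illustrative, and the actual column headers in the spreadsheet are assumed, not verified, to correspond to these field names.

```python
# Field names copied from the list above, grouped by the four categories.
PROTOCOL_FIELDS = {
    "Descriptive information": [
        "Article number", "Complete reference", "Year of publication",
        "Journal article / conference paper / book chapter", "DOI / Website",
        "Number of citations", "Availability in OA", "Keywords",
        "Relevance for this study",
    ],
    "Approach- and research design-related information": [
        "Objective / RQ", "Research method (including unit of analysis)",
        "Contributions", "Method",
        "Availability of the underlying research data",
        "Period under investigation",
        "Use of theory / theoretical concepts / approaches",
    ],
    "Quality- and relevance-related information": [
        "Quality concerns", "Primary research object",
    ],
    "HVD determination-related information": [
        "HVD definition and type of value", "HVD indicators",
        "A framework for HVD determination", "Stakeholders and their roles",
        "Data", "Level (if relevant)",
    ],
}


def field_by_number(number: int) -> tuple[str, str]:
    """Return (category, field name) for a 1-based protocol field number."""
    if number < 1:
        raise ValueError("field numbers start at 1")
    position = number
    for category, fields in PROTOCOL_FIELDS.items():
        if position <= len(fields):
            return category, fields[position - 1]
        position -= len(fields)  # skip past this category's fields
    raise ValueError(f"no protocol field with number {number}")
```

For example, field_by_number(18) resolves to the "Primary research object" field in the quality- and relevance-related category.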


***Format of the file***
.xls, .csv (for the second spreadsheet only), .odt, .docx

***Licenses or restrictions***
CC-BY