Delle Donne, Roberto;
Salvadó Estivill, Ignasi;
González Ugarte, José Luis;
Grijp, Nicolien van der;
Beukering, Pieter van de;
Besselaar, Peter van den;
Work package leader(s)
This dataset contains information on which papers and concepts researchers find relevant for mapping domain-specific research output to the 17 Sustainable Development Goals (SDGs).
The Sustainable Development Goals are the 17 global challenges set by the United Nations. Within each goal, specific targets and indicators are defined to monitor progress towards reaching that goal by 2030. To capture how research contributes to moving the needle on those challenges, we previously built an initial classification model that makes it possible to quickly identify which research output is related to which SDG. (This Aurora SDG dashboard is the initial outcome as proof of practice.)
To validate our current classification model (on soundness/precision and completeness/recall) and to receive input for improvement, a survey was conducted to capture expert knowledge from senior researchers in the research domain related to each SDG. The survey was open to the world, but was mainly distributed to researchers from the Aurora Universities Network. It ran from October 2019 until January 2020 and captured data from 244 respondents in Europe and North America.
17 surveys were created from a single template, with the content made specific to each SDG. The content of each survey, such as a random set of publications, was injected by a data provisioning server that had collected research output metadata for each SDG at an earlier stage. On average, it took a respondent 1 hour to complete the survey. The survey data can be used to validate current SDG classification models and to optimize future ones for mapping research output to the SDGs.
The survey contains the following questions (see inside dataset for exact wording):
Are you familiar with this SDG?
Respondents could only proceed if they were familiar with the targets and indicators of this SDG. The goal of this question was to filter out respondents without the required knowledge and thereby increase the quality of the survey data.
Suggest research papers that are relevant for this SDG (upload list)
This question, which asks for a list, was put first to reduce influence from the other questions. The goal of this question was to measure the completeness/recall of the papers in the result set of our current classification model. (To lower the bar, these lists could be provided either by uploading a file from a reference manager (preferred) in .ris or BibTeX format, or as a list of titles. This heterogeneous input was later processed by hand into a uniform format.)
Select research papers that are relevant for this SDG (radio buttons: accept, reject)
A randomly selected set of 100 papers was injected into the survey, drawn from the full list of thousands of papers in the result set of our current classification model. The goal of this question was to measure the soundness/precision of our current classification model.
Select and Suggest Keywords related to SDG (checkboxes: accept | text field: suggestions)
The survey was injected with the top 100 most frequent keywords that appeared in the metadata of the papers in the result set of the current classification model. Respondents could select the relevant keywords among those we found, and add their own in a blank text field. The goal of this question was to get suggestions for keywords we can use to increase the recall of relevant papers in a new classification model.
Suggest SDG related glossaries with relevant keywords (text fields: url)
Open text fields to add URLs of lists with hundreds of relevant keywords related to this SDG. The goal of this question was to get suggestions for keywords we can use to increase the recall of relevant papers in a new classification model.
Select and Suggest Journals fully related to SDG (checkboxes: accept | text field: suggestions)
The survey was injected with the top 100 most frequent journals that appeared in the metadata of the papers in the result set of the current classification model. Respondents could select the relevant journals among those we found, and add their own in a blank text field. The goal of this question was to get suggestions for complete journals we can use to increase the recall of relevant papers in a new classification model.
Suggest improvements for the current queries (text field: suggestions per target)
We showed respondents the queries used in our current classification model next to each of the targets within the goal. Open text fields let them change, add, re-order, or delete anything in the query (keywords, boolean operators, etc.) to improve it in their opinion. The goal of this question was to get suggestions we can use to increase the recall and precision of relevant papers in a new classification model.
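The sampling and scoring ideas behind the questions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the survey: the function names (`sample_papers`, `top_keywords`, `precision`, `recall`), the identifiers, and the toy data are all hypothetical, and keyword normalisation by lower-casing is our own simplification.

```python
import random
from collections import Counter


def sample_papers(result_set, k=100, seed=None):
    """Draw up to k distinct papers at random from the model's result set."""
    rng = random.Random(seed)
    papers = list(result_set)
    return rng.sample(papers, min(k, len(papers)))


def top_keywords(papers_keywords, n=100):
    """Return the n most frequent keywords across the papers' metadata."""
    # Assumption: keywords are compared case-insensitively.
    counts = Counter(kw.strip().lower() for kws in papers_keywords for kw in kws)
    return [kw for kw, _ in counts.most_common(n)]


def precision(votes):
    """Share of sampled papers that respondents accepted (True = accepted)."""
    votes = list(votes)
    return sum(votes) / len(votes) if votes else float("nan")


def recall(suggested_ids, result_set_ids):
    """Share of expert-suggested papers already present in the model's result set."""
    suggested = set(suggested_ids)
    if not suggested:
        return float("nan")
    return len(suggested & set(result_set_ids)) / len(suggested)


# Toy data, invented for illustration only.
result_set = [f"eid-{i}" for i in range(1000)]
sample = sample_papers(result_set, k=100, seed=42)
keywords = top_keywords([["Poverty", "income"], ["poverty", "health"]], n=2)
p = precision([True, True, False, True])
r = recall(["eid-1", "eid-2", "eid-x", "eid-y"], result_set)
```

Here precision is estimated from the accept/reject answers on the random sample, and recall from the overlap between the papers experts suggested and the papers the model already found. Recall can only be computed for suggested papers that could be matched to an identifier.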
In the dataset root you'll find the following folders and files:
This contains the survey questions for all the individual SDGs. It also contains lists of EIDs, categorised by SDG, from which we drew the randomized selections presented to the respondents.
This contains the raw survey output. (Excluding privacy sensitive information for public release.) This data needs to be combined with the data on the provisioning server to make sense.
This contains the aggregated individual responses. The survey data of all SDG surveys is combined with the data from the provisioning server, aggregated, and split per question type.
This contains scripts to split data, and to add descriptive metadata for text analysis in a later stage.
This is the main final result that can be used for further analysis. Data is split by SDG into subdirectories; in each you'll find files per question type containing the aggregated data of the respondents.
Images of the results used in this README.md.
Terms and conditions for reusing this data.
Description of the dataset; each sub-folder contains a README.md file that further describes its content.
In /04-processed-data/ you'll find the following files in each SDG sub-folder:
This file contains the survey questions
This file contains the survey questions
Basic information about the survey and responses
Origin of the respondents per SDG survey
Formatted list of research papers that researchers uploaded or listed because they want to see them in the result set for this SDG.
Same as above, only matched with an EID. EIDs are matched by Elsevier's internal fuzzy matching algorithm. Only papers matched with high confidence are shown with an EID, referring to a record in Scopus.
Based on our previous result set of papers, researchers were presented with random samples and selected the papers they believe represent this SDG. (TRUE=accepted)
Based on our previous result set of papers, researchers were presented with random samples and marked the papers they believe do not represent this SDG. (FALSE=rejected)
Based on our previous result set of papers, we presented researchers with the keywords found in the metadata of those papers; they selected the keywords they believe represent this SDG.
As "selected-keywords", this is the list of keywords that respondents have not selected to represent this SDG.
List of keywords researchers suggest to use to find papers related to this SDG
List of glossaries, containing keywords, researchers suggest to use to find papers related to this SDG
Based on our previous result set of papers, we presented researchers with the journals found in the metadata of those papers; they selected the journals they believe represent this SDG.
As "selected-journals", this is the list of journals that respondents have not selected to represent this SDG.
List of journals researchers suggest to use to find papers related to this SDG
List of query improvements researchers suggest to use to find papers related to this SDG
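To get an overview of what is available per SDG, the processed data can be walked with a short script. The sketch below assumes only the layout described above (a `04-processed-data` folder with one sub-folder per SDG); the helper name `files_per_sdg` is hypothetical, and no specific file names are assumed.

```python
from pathlib import Path


def files_per_sdg(root):
    """Map each SDG sub-folder name to the sorted names of the files inside it."""
    root = Path(root)
    return {
        sub.name: sorted(f.name for f in sub.iterdir() if f.is_file())
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }


if __name__ == "__main__":
    processed = Path("04-processed-data")  # folder name taken from this README
    if processed.is_dir():
        for sdg, names in files_per_sdg(processed).items():
            print(sdg, len(names), "files")
```

Run from the dataset root, this prints one line per SDG sub-folder with the number of files it contains, which is a quick way to check that a download is complete before further analysis.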
Survey data of "Mapping Research output to the SDGs" by Aurora Universities Network (AUR); Alessandro Arienzo (UNA); Roberto Delle Donne (UNA); Ignasi Salvadó Estivill (URV); José Luis González Ugarte (URV); Didier Vercueil (UGA); Nykohla Strong (UAB); Eike Spielberg (UDE); Felix Schmidt (UDE); Linda Hasse (UDE); Ane Sesma (UEA); Baldvin Zarioh (UIC); Friedrich Gaigg (UIN); René Otten (VUA); Nicolien van der Grijp (VUA); Yasin Gunes (VUA); Peter van den Besselaar (VUA); Joeri Both (VUA); Maurice Vanderfeesten (VUA) is licensed under a Creative Commons Attribution 4.0 International License. https://aurora-network.global/project/sdg-analysis-bibliometrics-relevance/
Version 1.0.1 contains minor changes in the README files to match the description in the data repository.