Text Analyses of Survey Data on "Mapping Research Output to the Sustainable Development Goals (SDGs)"
Besselaar, Peter van den
This package contains data on five text analysis types (term extraction, contrast analysis, topic modeling, network mapping, contingency matrix), based on the survey data in which researchers selected research output that is related to the 17 Sustainable Development Goals (SDGs). This is used as input to improve the current SDG classification model from v4.0 to v5.0.
Sustainable Development Goals are the 17 global challenges set by the United Nations. Within each of the goals, specific targets and indicators are defined to monitor progress towards reaching those goals by 2030. In an effort to capture how research contributes to moving the needle on those challenges, we earlier made an initial classification model that makes it possible to quickly identify which research output is related to which SDG. (This Aurora SDG dashboard is the initial outcome as proof of practice.)
The initiative started from the Aurora Universities Network in 2017, in the working group "Societal Impact and Relevance of Research", to investigate and to make visible 1. what research is done that is relevant to topics or challenges that live in society (for the proof of practice this has been scoped down to the SDGs), and 2. what the effect or impact is of applying those research outcomes to those societal challenges (this has also been scoped down, to research output being cited in policy documents from national and local governments and NGOs).
Context of this dataset | classification model improvement workflow
The classification model we have used consists of 17 different search queries on the Scopus database.
SDG search queries version 4.0 (SQv4) have been created and published here:
Term extraction: after text normalisation (stemming, etc.) we extracted the bigram and trigram terms that co-occurred the most per document, in the title, abstract and keywords.
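As an illustration of the term extraction step, the following is a minimal Python sketch of counting bigrams and trigrams in one document. It is not the CorTexT implementation: the function names (`ngrams`, `extract_terms`), the `top_k` parameter and the sample text are hypothetical, and stemming is left out for brevity.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_terms(text, top_k=5):
    """Count bigrams and trigrams in one document and return the top_k."""
    # Crude normalisation (assumption): lowercase, alphabetic tokens only.
    # Stemming is omitted, and n-grams may span sentence boundaries.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(ngrams(tokens, 2) + ngrams(tokens, 3))
    return counts.most_common(top_k)

# Hypothetical title/abstract text, for illustration only.
doc = "Clean water access improves health. Clean water access reduces poverty."
top_terms = extract_terms(doc, top_k=3)
print(top_terms)
```

In practice the title, abstract and keywords of each publication would be concatenated before being passed to such a function.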
Contrast analysis: the co-occurring terms in publications (title, abstract, keywords) of the papers that respondents have indicated relate to this SDG (y-axis: True), and of those that have been rejected (x-axis: False). In the top-left you'll see term co-occurrences that clearly relate to this SDG. The bottom-right are terms that appear in papers that have been rejected for this SDG. The top-right terms appear frequently in both and cannot be used to discriminate between the two groups.
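The quadrant reading of the contrast analysis can be sketched in Python as follows. This is only an illustration under stated assumptions: the `min_share` threshold, the function names and the sample term lists are hypothetical, and the actual diagrams may use a different frequency measure.

```python
from collections import Counter

def doc_frequency(docs):
    """Number of documents in which each term appears at least once."""
    counts = Counter()
    for terms in docs:
        counts.update(set(terms))
    return counts

def contrast(accepted, rejected, min_share=0.5):
    """Assign each term to a quadrant by its share of accepted vs rejected papers."""
    acc, rej = doc_frequency(accepted), doc_frequency(rejected)
    quadrants = {}
    for term in set(acc) | set(rej):
        y = acc[term] / len(accepted)  # share of accepted papers (y-axis: True)
        x = rej[term] / len(rejected)  # share of rejected papers (x-axis: False)
        if y >= min_share and x < min_share:
            quadrants[term] = "top-left"      # clearly related to the SDG
        elif y < min_share and x >= min_share:
            quadrants[term] = "bottom-right"  # typical of rejected papers
        elif y >= min_share and x >= min_share:
            quadrants[term] = "top-right"     # frequent in both groups
        else:
            quadrants[term] = "bottom-left"   # rare in both groups
    return quadrants

# Hypothetical term lists per paper, for illustration only.
accepted = [["clean water", "public health"], ["clean water", "sanitation"]]
rejected = [["public health", "machine learning"], ["machine learning", "public health"]]
quadrants = contrast(accepted, rejected)
print(quadrants)
```

Terms landing in the top-left quadrant are candidates for inclusion in a search query, and bottom-right terms are candidates for exclusion.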
Network map: This diagram shows the cluster-network of terms co-occurring in the publications related to this SDG, selected by the respondents (accepted publications only).
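The data behind such a network map is a weighted list of term pairs. A minimal Python sketch of building it, assuming each accepted publication is already reduced to a list of terms, could look like this. The function name `cooccurrence_edges`, the `min_weight` cut-off and the sample data are hypothetical, and the clustering and layout steps are not shown.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(docs, min_weight=2):
    """Count how many documents contain each pair of terms.

    Returns {(term_a, term_b): weight}, keeping only pairs with
    weight >= min_weight. Pairs are stored in alphabetical order.
    """
    edges = Counter()
    for terms in docs:
        # sorted(set(...)) removes duplicates and fixes the pair order.
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return {pair: w for pair, w in edges.items() if w >= min_weight}

# Hypothetical accepted publications, for illustration only.
docs = [
    ["poverty", "income", "health"],
    ["poverty", "income"],
    ["health", "water"],
]
edges = cooccurrence_edges(docs)
print(edges)
```

The resulting edge list can be loaded into any network tool to cluster and draw the terms.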
Topic model: This diagram shows the topics, and the related terms that make up each topic. The number of topics is related to the number of targets of this SDG.
Contingency matrix: This diagram shows the top 10 co-occurring terms that correlate the most.
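One common way to express how strongly two terms correlate is to compare their observed co-occurrence count with the count expected if they were independent. The Python sketch below uses that observed/expected ratio as an assumption. It is not necessarily the measure used for the diagrams in this dataset, and the function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def association(docs):
    """Observed/expected co-occurrence ratio for every term pair.

    A ratio above 1 means the pair co-occurs more often than expected
    under independence; below 1 means less often.
    """
    n = len(docs)
    doc_freq = Counter()
    pair_freq = Counter()
    for terms in docs:
        uniq = sorted(set(terms))
        doc_freq.update(uniq)
        pair_freq.update(combinations(uniq, 2))
    scores = {}
    for (a, b), observed in pair_freq.items():
        expected = doc_freq[a] * doc_freq[b] / n
        scores[(a, b)] = observed / expected
    return scores

# Hypothetical term lists per paper, for illustration only.
docs = [["solar", "energy"], ["solar", "energy"], ["coal"], ["coal", "energy"]]
scores = association(docs)
# Rank pairs from most to least associated; a top-10 would take ranked[:10].
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```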
Software used to do the text analyses
CorTexT: The CorTexT Platform is the digital platform of LISIS Unit and a project launched and sustained by IFRIS and INRAE. This platform aims at empowering open research and studies in humanities about the dynamic of science, technology, innovation and knowledge production.