
Poster Open Access

Mapping STI ecosystems via Open Data: overcoming the limitations of conflicting taxonomies. A case study for Climate Change Research in Denmark

Nicandro Bovenzi; Nicolau Duran-Silva; Francesco Alessandro Massucci; Francesco Multari; César Parra-Rojas; Josep Pujol-Llatse

This is a poster presented at the International Conference on Theory and Practice of Digital Libraries (TPDL) 2022

The paper as it appears in the conference proceedings is available in pre-print format at this link, while a longer version, which delves more into the methodological details, is available in pre-print format at this link.


To inform their decisions, policy-makers in the Science, Technology and Innovation (STI) sector typically need “maps” to understand which research domains and key actors are relevant within their territorial or institutional boundaries of interest. To enable effective policy actions, those maps should be comprehensive enough to cover (i) the whole STI value chain (from basic research up to industrial innovation), (ii) the different scientific domains of relevance and (iii) all pertinent actors. As such, these maps should rely on different data sources that together offer the broadest possible view of STI inputs and outputs.

Some major challenges faced at the policy level arise because many of those data sources are not openly available (which undermines possible participatory processes), are not interoperable in terms of data classification schemes and institutional identification (which limits transversal analyses), and are hard for non-expert users to manage.

In this paper, we present a proof of concept of a hypothetical analytical exercise to support STI policy-making that uses only open data to overcome the above challenges. To do so, we merge different open datasets and analyse them with a common classification scheme.
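As a rough illustration of what merging heterogeneous open datasets into one shape can look like, the sketch below maps records from two sources onto a shared structure. The source names, field names (`project_title`, `objective`, `participants`, `affiliations`) and the `Record` class are hypothetical and chosen for illustration only; they do not describe the datasets or pipeline used in the poster.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    """Common shape shared by all sources (hypothetical schema)."""
    source: str
    title: str
    abstract: str
    organisations: List[str] = field(default_factory=list)


def from_publication(row: Dict) -> Record:
    # Assumed field names for a publication-like source.
    return Record("publications", row["title"],
                  row.get("abstract", ""), list(row.get("affiliations", [])))


def from_project(row: Dict) -> Record:
    # Assumed field names for a funded-project-like source.
    return Record("projects", row["project_title"],
                  row.get("objective", ""), list(row.get("participants", [])))


def merge_sources(publications: List[Dict], projects: List[Dict]) -> List[Record]:
    """Normalise each source and concatenate the results."""
    return ([from_publication(r) for r in publications]
            + [from_project(r) for r in projects])
```

Once every record has the same fields, a single classification step can be applied to all of them regardless of origin.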

After gathering the records from their respective data sources, we use open knowledge-bases and text mining to:

  • Identify STI documents linked with Sustainable Development Goal (SDG) 13 - Climate Action;

  • Categorise documents within the 25 panels of the European Research Council (ERC);

  • Automatically identify thematic clusters by topic modelling.
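The first step in the list above can be approximated, in its simplest form, by matching documents against a list of topic-specific terms. The sketch below is only a keyword-based stand-in: the term list `SDG13_TERMS` and the function names are assumptions made for illustration, and the poster's actual approach relies on open knowledge-bases and text mining rather than this hand-written list.

```python
import re

# Hypothetical, deliberately short term list; a real one would be
# derived from a curated knowledge base.
SDG13_TERMS = ("climate change", "global warming",
               "greenhouse gas", "carbon emission")


def is_sdg13(text, terms=SDG13_TERMS):
    """Return True if any term starts at a word boundary in the text."""
    lowered = text.lower()
    return any(re.search(r"\b" + re.escape(term), lowered) for term in terms)


def filter_sdg13(texts):
    """Keep only the texts that match at least one term."""
    return [t for t in texts if is_sdg13(t)]
```

A filter of this kind favours simplicity over recall: documents that discuss climate action without using any listed term are missed, which is one reason to prefer richer text-mining models in practice.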

In this way, we aim to showcase how research in emerging fields (such as the SDGs) can be gathered from open data sources and identified by means of modern, openly available AI models. Finally, we demonstrate how gaps in taxonomic classifications across datasets may be filled by means of Deep Learning textual classifiers, using the ERC panels as a paradigmatic example.
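The gap-filling idea can be sketched with a toy text classifier: records that already carry a label from the target taxonomy are used to label those that do not. The example below uses a bag-of-words nearest-centroid classifier, which is a much simpler technique than the Deep Learning classifiers mentioned above and is swapped in here only to keep the sketch self-contained. The function names are hypothetical, and any panel codes passed in are treated as opaque labels.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenisation, keeping alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


def build_centroids(labelled):
    """Sum the word counts of all labelled texts, per label."""
    centroids = defaultdict(Counter)
    for text, label in labelled:
        centroids[label].update(tokenize(text))
    return centroids


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def predict_label(text, centroids):
    """Assign the label whose centroid is most similar to the text."""
    vec = Counter(tokenize(text))
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))
```

Here `build_centroids` would be fed records from a dataset that already carries the target labels, and `predict_label` applied to records from datasets that lack them.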
