Aurora SDG Research Dashboard - including Build-Your-Own guide

Vanderfeesten, Maurice

doi:10.5281/zenodo.7274401

Published November 2, 2022 | Version 8.0

Software Open

Aurora SDG Research Dashboard - including Build-Your-Own guide

Vanderfeesten, Maurice¹

1. Vrije Universiteit Amsterdam

[go to latest version]

This data set contains a guide how to build your own SDG Research Dashboard. It contains all the files required in a zip file.

Software

PowerBi (dashboarding)

We use Microsoft PowerBi Desktop to create the dashboard. This can be downloaded, installed and used for free. https://powerbi.microsoft.com/en-us/downloads/

First thing you need to do is to download and install PowerBi.

Publishing the PowerBi Dashboard for public use online, we use a paid license that is part of the the campus Office365 agreement. https://learn.microsoft.com/en-us/power-bi/collaborate-share/service-publish-to-web

Knime (data processing)

We use Knime Data Analytics platform for wrangling the data without coding, to get it from the input to the output data formats that can be used in the dashboard. More about that data processing in section “02 temp”. Knime is open source and can be downloaded and installed for free. https://www.knime.com/downloads/download-knime

Dashboard File

Aurora-SDG-Research-Dashboard-v8.pbix

Data files

/01 input/

All universities have been asked to deliver two files. 1. A file with all their publications, and 2. Optionally a file with their authors and internal organisational structure.

publication table

Format: CSV (comma separated!), File name convention: [University Name]-publications.csv

requires the following columns:

Title
Abstract (required for text classifier. If not available, title will be used, but has less good results.)
Publication type (book / article / conference proceeding / etc)
DOI (required to add societal impact metrics data; open access status, policy and patent citations, etc.)
ISBN (required to add societal impact metrics data; open access status, policy and patent citations, etc.)
other identifier (local l/ internal / system identifier. fall back to uniquely identify the record)
Publication year (YYYY)
author names (semi-colon separated ; )
ORCID authors identifiers (semi-colon separated ; )
Local Author identifiers (local/ internal / system identifier) (semi-colon separated ; )

authors table

optional: If you want the SDG dashboard to be able to "zoom in" to the level of the faculties and departments of your university, we require an additional table.

Format: CSV (comma separated!), File name convention: [University Name]-authors.csv

requires the following columns:

Author full name
Author first name
Author last name
Local author identifier (use same as in publications table; for linking)
ORCiD author identifier (if available)
ISNI author identifier (if available)
University name (org level 0)
Faculty name (org level 1)
Department name (org level 2)
Other Organisational Unit name (org level 3)

/02 temp/

From input to enrichment to output.

When we have all the separate files, now we need to put them all together.

Then we need to enrich the publications with external data. The following external sources have been used.

SDG classfication probabilities.
- Service: Aurora SDG multi-lingual classification AI
  - https://aurora-sdg-classifier.uni-due.de
  - https://aurora-sdg-classifier.uni-due.de/classifier/classify
- Input: abstract of the publications (title is used if abstract was not available)
- Output: probabilities of classification per SDG
Citation metrics
- Service: www.Scival.com (based on Scopus data from Elsevier)
- Input: DOI’s of the publications. (split and imported in batches)
- Output: Field Weighted Citation Impact scoresand Top percentiles, Topics, Topic Clusters, All Science Journal Classification (ASJC), Affiliated Institutions and Countries. (exported and appended from batches )
Open Access metrics
- Service: www.unpaywall.org (from OurResearch)
- Input: DOI’s of the publications
- Code: https://github.com/Aurora-Network-Global/data-transform-scripts
- Output: Open Access colors, status, url
Policy mentions (separate table; due to one-to-many relations)
- Service: www.overton.io (from OpenPolicy Ltd.)
- Input: DOI’s of the publications.
- Output: detailed information of policy documents mentioning publications.
- Remark: This file we got in bulk from Overton, and did not process this way, but inserted it in the dashboard directly.
News and Patents (separate table; due to one-to-many relations)
- Service. www.altmetric.com (from DigitalScience)
- Input: DOI’s of the publications.
  Code: https://github.com/Aurora-Network-Global/data-transform-scripts
- Output: basic information (type, source, title, url) of news and patent mention publications.
Scite
- Service: www.scite.ai
- Input: DOI’s of the publications.
- Code: https://github.com/Aurora-Network-Global/data-transform-scripts
- Output: citation sentiment : metioned, contradicted, supported.

Some columns contain strings with multiple values, like the author column, this we need to split is separate files, while keeping the relation with the publications using the row-index-id.

The main file contains 800 rows and over 40 columns, and is 1.1 GB uncompressed data. To increase performance on transferring and loading the data in the dashboard we used the Open standards Parquet format, with is able to compress the data to 300MB. The other split tables are in XSLX format, which is in fact a ZIP compressed XML file.

This whole process has been described in the appendix. The Knime file containing the data processing components is:

Aurora Data Processing – SDGResearch Dashboard v8.knwf

/03 output/

These are the files you need to replace. Below you’ll find the description of the file and the columns with the keys that are used to connect to the other tables.

aurora_all.parquet

Description: This is the main file all others connect to. It contains the details per publication, including source university, type, year, metrics data, categories, etc.

Keys: row-index-id, doi, university

aurora-parts-author-names.xlsx

Description: This is a split list of all author names, containing a single author name per row, and the row-index-id connecting it back to the publication in the main table.

Keys: row-index-id

aurora-parts-author-ids.xlsx

Description: This is a split list of all local author ids, containing a single author id per row, salted with the university name to make it unique, and the row-index-id connecting it back to the publication in the main table.

Keys: row-index-id, author-id

aurora-author-org-structure.xlsx

Description: This is a list of all authors, containing a single author per row, including the three organisational sublevels this author is affiliated with in the internal organisation. The local author-id is salted with the university name to make it unique, to connect it back to the author list.

Keys: author-id

aurora-parts-scival-labels-researchareas.xlsx

Description: This is a split list of the multiple value field containing the All Science Journal Classification names (ASJC) as we have reseived from Scival per publication. It is containing a single ASJC name per row and the row-index-id connecting it back to the publication in the main table.

Keys: row-index-id

aurora-publications-scival-institutions.xlsx

Description: This is a split list of the multiple value field containing the affiliated research institutes as we have reseived from Scival per publication. It is containing a single Institution name per row and the row-index-id connecting it back to the publication in the main table.

Keys: row-index-id

aurora-publications-scival-countries.xlsx

Description: This is a split list of the multiple value field containing the affiliated countries of the research institutes as we have reseived from Scival per publication. It is containing a single Country name per row and the row-index-id connecting it back to the publication in the main table.

Keys: row-index-id

aurora-publications-sdg-labels.xlsx

Description: This is a split list of the multiple value field containing SDG Goal labels (SDGshort) based on the SDG probabilities. It is containing two sheets, SDG95 and SDG98. SDG95 contains only an SDG label of a publication where the related to SDG probability is between 95% and 100%. SDG98 contains only an SDG label of a publication where the related to SDG probability is between 98% and 100%. In the dashboard we use only SDG98 publications. Each sheet is containing a single SDG Goal label (SDGshort) per row, and the row-index-id connecting it back to the publication in the main table.

Keys: row-index-id, SDGshort

aurora-altmetric-mentions.xlsx

Description: This is a table containing multiple mentions (news and patents) related to a publication. Each row is containing mention details (eg. Title, type, url, source, etc.), and the DOI connecting it back to the publication in the main table.

Keys: DOI

aurora_dois_cited_in_policy_aug_2022.xlsx

Description: This is a table containing multiple policy documents related to a publication. Each row is containing policy document (eg. title, type, url, source, etc.), and the DOI connecting it back to the publication in the main table.

Keys: DOI

/04 report/

The files in here are the report it self, and supporting data files.

Aurora-SDG-Research-Dashboard-v8.pbix

Description: This is the PowerBI report containing the SDG dashboard.

Keys: not relevant

20200820 SDGs - JSON Color Codes.json

Description: This is a PowerBI color template, containing the SDG color codes (HEX) in sequential order, from SDG01 to SDG17.

Keys: not relevant

list-countries-geographical-regions-UNSD-M49.xlsx

Description: This is the official UN country table containing, counties, categorised by region and continent, and developing state. The Country name is used to connecting it back to Scival table containing county affiliated research institute of the publications.

Keys: Country

list-sdgs.xlsx

Description: This is a table containing details about the SDG’s, like the image URL, name variants, full descriptions, etc. The SDGshort is connecting it back to the SDG labels table, that is connected to the publications in the main table.

Keys: SDGshort

list-universities.xlsx

(!) This also is the file you need to replace with details of your own universities, like name variations used in the main file, or in scival institution.

Description: This is a table containing details about the universities, like the image URLs, name variants used in different tables (aurora and Scival). University column is connecting it back to the publications in the main table.

Keys: University

Data file Relationships

The illustration shows how the different tables are connected in a relational database diagram. All connections are bi-rirectional in order to filter publications based on the information in the other tables and vise versa.

Notes

Acknowledgements

Many thanks to all project members for delivering the publications from the Aurora universities

Funded by

Funded by European Commission, Project ID: 101004013, Call: EAC-A02-2019-1, Programme: EPLUS2020, DG/Agency: EACEA

Read more

[ Project website | Zenodo Community ]

Change log

2022-11-02 | v8.0 | 8th iteration of the dashboard

Files

Aurora SDG Research Dashboard - public-uncompressed.zip

Files (10.0 GB)

Name	Size	Download all
Aurora SDG Research Dashboard - public-uncompressed.zip md5:50a92e5903f2cd0810d9ed072c7ab8a8	10.0 GB	Preview Download
Build you own SDG dashboard.pdf md5:442aa8f5c0352dd583e684884eec0d9e	1.2 MB	Preview Download

Additional details

Requires: Dataset: 10.5281/zenodo.6644946 (DOI)

Citations

Oops! Something went wrong while fetching results.

	All versions	This version
Views	294	21
Downloads	257	22
Data volume	432.1 GB	79.7 GB

Aurora SDG Research Dashboard - including Build-Your-Own guide

Creators

Description

Notes

Files

Aurora SDG Research Dashboard - public-uncompressed.zip

Files (10.0 GB)

Additional details

Related works