There is a newer version of the record available.

Published November 2, 2022 | Version 8.0
Software Open

Aurora SDG Research Dashboard - including Build-Your-Own guide

  • 1. Vrije Universiteit Amsterdam

Description

[go to latest version]

This data set contains a guide how to build your own SDG Research Dashboard. It contains all the files required in a zip file.

Software

PowerBi (dashboarding)

We use Microsoft PowerBi Desktop to create the dashboard. This can be downloaded, installed and used for free. https://powerbi.microsoft.com/en-us/downloads/

First thing you need to do is to download and install PowerBi.

Publishing the PowerBi Dashboard for public use online, we use a paid license that is part of the the campus Office365 agreement. https://learn.microsoft.com/en-us/power-bi/collaborate-share/service-publish-to-web

Knime (data processing)

We use Knime Data Analytics platform for wrangling the data without coding, to get it from the input to the output data formats that can be used in the dashboard. More about that data processing in section “02 temp”. Knime is open source and can be downloaded and installed for free. https://www.knime.com/downloads/download-knime

 

Dashboard File

Aurora-SDG-Research-Dashboard-v8.pbix

 

Data files

/01 input/

All universities have been asked to deliver two files. 1. A file with all their publications, and 2. Optionally a file with their authors and internal organisational structure.

publication table 

Format: CSV (comma separated!), File name convention: [University Name]-publications.csv

requires the following columns:

  • Title     
  • Abstract (required for text classifier. If not available, title will be used, but has less good results.)
  • Publication type (book / article / conference proceeding / etc)
  • DOI (required to add societal impact metrics data; open access status, policy and patent citations, etc.)
  • ISBN (required to add societal impact metrics data; open access status, policy and patent citations, etc.)
  • other identifier (local l/ internal / system identifier. fall back to uniquely identify the record)     
  • Publication year (YYYY)
  • author names (semi-colon separated ; ) 
  • ORCID authors identifiers (semi-colon separated ; )
  • Local Author identifiers (local/ internal / system identifier) (semi-colon separated ; )

 

authors table 

optional: If you want the SDG dashboard to be able to "zoom in" to the level of the faculties and departments of your university, we require an additional table.

Format: CSV (comma separated!), File name convention: [University Name]-authors.csv

requires the following columns:

  • Author full name
  • Author first name
  • Author last name
  • Local author identifier (use same as in publications table; for linking)
  • ORCiD author identifier (if available)
  • ISNI author identifier (if available)
  • University name (org level 0)
  • Faculty name (org level 1)
  • Department name (org level 2)
  • Other Organisational Unit name (org level 3)

 

/02 temp/

From input to enrichment to  output.

When we have all the separate files, now we need to put them all together.

Then we need to enrich the publications with external data. The following external sources have been used.

 

Some columns contain strings with multiple values, like the author column, this we need to split is separate files, while keeping the relation with the publications using the row-index-id.

The main file contains 800  rows and over 40 columns, and is 1.1 GB uncompressed data. To increase performance on transferring and loading the data in the dashboard we used the Open standards Parquet format, with is able to compress the data to 300MB. The other split tables are in XSLX format, which is in fact a ZIP compressed XML file.

This whole process has been described in the appendix. The Knime file containing the data processing components is:

Aurora Data Processing – SDGResearch Dashboard v8.knwf

 

/03 output/

These are the files you need to replace. Below you’ll find the description of the file and the columns with the keys that are used to connect to the other tables.

 

aurora_all.parquet

Description: This is the main file all others connect to. It contains the details per publication, including source university, type, year, metrics data, categories, etc.

Keys: row-index-id, doi, university

 

aurora-parts-author-names.xlsx

Description: This is a split list of all author names, containing a single author name per row, and the row-index-id connecting it back to the publication in the main table.

Keys: row-index-id

 

aurora-parts-author-ids.xlsx

Description: This is a split list of all local author ids, containing a single author id per row, salted with the university name to make it unique, and the row-index-id connecting it back to the publication in the main table.

Keys: row-index-id, author-id

 

aurora-author-org-structure.xlsx

Description: This is a list of all authors, containing a single author per row, including the three organisational sublevels this author is affiliated with in the internal organisation. The local author-id is salted with the university name to make it unique, to connect it back to the author list.

Keys: author-id

 

aurora-parts-scival-labels-researchareas.xlsx

Description: This is a split list of the multiple value field containing the All Science Journal Classification names (ASJC) as we have reseived from Scival per publication. It is containing a single ASJC name per row and the row-index-id connecting it back to the publication in the main table.

Keys: row-index-id

 

aurora-publications-scival-institutions.xlsx

Description: This is a split list of the multiple value field containing the affiliated research institutes as we have reseived from Scival per publication. It is containing a single Institution name per row and the row-index-id connecting it back to the publication in the main table.

Keys: row-index-id

 

aurora-publications-scival-countries.xlsx

Description: This is a split list of the multiple value field containing the affiliated countries of the research institutes as we have reseived from Scival per publication. It is containing a single Country name per row and the row-index-id connecting it back to the publication in the main table.

Keys: row-index-id

 

aurora-publications-sdg-labels.xlsx

Description: This is a split list of the multiple value field containing SDG Goal labels (SDGshort) based on the SDG probabilities. It is containing two sheets, SDG95 and SDG98. SDG95 contains only an SDG label of a publication where the related to SDG probability is between 95%  and 100%. SDG98 contains only an SDG label of a publication where the related to SDG probability is between 98%  and 100%. In the dashboard we use only SDG98 publications. Each sheet is containing a single SDG Goal label (SDGshort) per row, and the row-index-id connecting it back to the publication in the main table.

Keys: row-index-id, SDGshort

 

aurora-altmetric-mentions.xlsx

Description: This is a table containing multiple mentions (news and patents) related to a publication. Each row is containing mention details (eg. Title, type, url, source, etc.), and the DOI connecting it back to the publication in the main table.

Keys: DOI

 

aurora_dois_cited_in_policy_aug_2022.xlsx

Description: This is a table containing multiple policy documents related to a publication. Each row is containing policy document (eg. title, type, url, source, etc.), and the DOI connecting it back to the publication in the main table.

Keys: DOI

 

/04 report/

The files in here are the report it self, and supporting data files.

 

Aurora-SDG-Research-Dashboard-v8.pbix

Description: This is the PowerBI report containing the SDG dashboard.

Keys: not relevant

 

20200820 SDGs - JSON Color Codes.json

Description: This is a PowerBI color template, containing the SDG color codes (HEX) in sequential order, from SDG01 to SDG17.

Keys: not relevant

 

list-countries-geographical-regions-UNSD-M49.xlsx

Description: This is the official UN country table containing, counties, categorised by region and continent, and developing state. The Country name is used to connecting it back to Scival table containing county affiliated research institute of the publications.

Keys: Country

 

list-sdgs.xlsx

Description: This is a table containing details about the SDG’s, like the image URL, name variants, full descriptions, etc. The SDGshort is connecting it back to the SDG labels table, that is connected to the publications in the main table.

Keys: SDGshort

 

list-universities.xlsx

(!) This also is the file you need to replace with details of your own universities, like name variations used in the main file, or in scival institution.

Description: This is a table containing details about the universities, like the image URLs, name variants used in different tables (aurora and Scival). University column is  connecting it back to the publications in the main table.

Keys: University

 

Data file Relationships

The illustration shows how the different tables are connected in a relational database diagram. All connections are bi-rirectional in order to filter publications based on the information in the other tables and vise versa.

 

Notes

Acknowledgements

Many thanks to all project members for delivering the publications from the Aurora universities


Funded by

Funded by European Commission, Project ID: 101004013, Call: EAC-A02-2019-1, Programme: EPLUS2020, DG/Agency: EACEA


Read more

[ Project website | Zenodo Community ]


Change log

2022-11-02 | v8.0 | 8th iteration of the dashboard

Files

Aurora SDG Research Dashboard - public-uncompressed.zip

Files (10.0 GB)

Name Size Download all
md5:50a92e5903f2cd0810d9ed072c7ab8a8
10.0 GB Preview Download
md5:442aa8f5c0352dd583e684884eec0d9e
1.2 MB Preview Download

Additional details

Related works

Requires
Dataset: 10.5281/zenodo.6644946 (DOI)