Aurora SDG Research Dashboard - including Build-Your-Own guide
Description
This data set contains a guide how to build your own SDG Research Dashboard. It contains all the files required in a zip file.
Software
PowerBi (dashboarding)
We use Microsoft PowerBi Desktop to create the dashboard. This can be downloaded, installed and used for free. https://powerbi.microsoft.com/en-us/downloads/
First thing you need to do is to download and install PowerBi.
Publishing the PowerBi Dashboard for public use online, we use a paid license that is part of the the campus Office365 agreement. https://learn.microsoft.com/en-us/power-bi/collaborate-share/service-publish-to-web
Knime (data processing)
We use Knime Data Analytics platform for wrangling the data without coding, to get it from the input to the output data formats that can be used in the dashboard. More about that data processing in section “02 temp”. Knime is open source and can be downloaded and installed for free. https://www.knime.com/downloads/download-knime
Dashboard File
Aurora-SDG-Research-Dashboard-v8.pbix
Data files
/01 input/
All universities have been asked to deliver two files. 1. A file with all their publications, and 2. Optionally a file with their authors and internal organisational structure.
publication table
Format: CSV (comma separated!), File name convention: [University Name]-publications.csv
requires the following columns:
- Title
- Abstract (required for text classifier. If not available, title will be used, but has less good results.)
- Publication type (book / article / conference proceeding / etc)
- DOI (required to add societal impact metrics data; open access status, policy and patent citations, etc.)
- ISBN (required to add societal impact metrics data; open access status, policy and patent citations, etc.)
- other identifier (local l/ internal / system identifier. fall back to uniquely identify the record)
- Publication year (YYYY)
- author names (semi-colon separated ; )
- ORCID authors identifiers (semi-colon separated ; )
- Local Author identifiers (local/ internal / system identifier) (semi-colon separated ; )
authors table
optional: If you want the SDG dashboard to be able to "zoom in" to the level of the faculties and departments of your university, we require an additional table.
Format: CSV (comma separated!), File name convention: [University Name]-authors.csv
requires the following columns:
- Author full name
- Author first name
- Author last name
- Local author identifier (use same as in publications table; for linking)
- ORCiD author identifier (if available)
- ISNI author identifier (if available)
- University name (org level 0)
- Faculty name (org level 1)
- Department name (org level 2)
- Other Organisational Unit name (org level 3)
/02 temp/
From input to enrichment to output.
When we have all the separate files, now we need to put them all together.
Then we need to enrich the publications with external data. The following external sources have been used.
- SDG classfication probabilities.
- Service: Aurora SDG multi-lingual classification AI
- Input: abstract of the publications (title is used if abstract was not available)
- Output: probabilities of classification per SDG
- Citation metrics
- Service: www.Scival.com (based on Scopus data from Elsevier)
- Input: DOI’s of the publications. (split and imported in batches)
- Output: Field Weighted Citation Impact scoresand Top percentiles, Topics, Topic Clusters, All Science Journal Classification (ASJC), Affiliated Institutions and Countries. (exported and appended from batches )
- Open Access metrics
- Service: www.unpaywall.org (from OurResearch)
- Input: DOI’s of the publications
- Code: https://github.com/Aurora-Network-Global/data-transform-scripts
- Output: Open Access colors, status, url
- Policy mentions (separate table; due to one-to-many relations)
- Service: www.overton.io (from OpenPolicy Ltd.)
- Input: DOI’s of the publications.
- Output: detailed information of policy documents mentioning publications.
- Remark: This file we got in bulk from Overton, and did not process this way, but inserted it in the dashboard directly.
- News and Patents (separate table; due to one-to-many relations)
- Service. www.altmetric.com (from DigitalScience)
- Input: DOI’s of the publications.
Code: https://github.com/Aurora-Network-Global/data-transform-scripts - Output: basic information (type, source, title, url) of news and patent mention publications.
- Scite
- Service: www.scite.ai
- Input: DOI’s of the publications.
- Code: https://github.com/Aurora-Network-Global/data-transform-scripts
- Output: citation sentiment : metioned, contradicted, supported.
Some columns contain strings with multiple values, like the author column, this we need to split is separate files, while keeping the relation with the publications using the row-index-id.
The main file contains 800 rows and over 40 columns, and is 1.1 GB uncompressed data. To increase performance on transferring and loading the data in the dashboard we used the Open standards Parquet format, with is able to compress the data to 300MB. The other split tables are in XSLX format, which is in fact a ZIP compressed XML file.
This whole process has been described in the appendix. The Knime file containing the data processing components is:
Aurora Data Processing – SDGResearch Dashboard v8.knwf
/03 output/
These are the files you need to replace. Below you’ll find the description of the file and the columns with the keys that are used to connect to the other tables.
aurora_all.parquet
Description: This is the main file all others connect to. It contains the details per publication, including source university, type, year, metrics data, categories, etc.
Keys: row-index-id, doi, university
aurora-parts-author-names.xlsx
Description: This is a split list of all author names, containing a single author name per row, and the row-index-id connecting it back to the publication in the main table.
Keys: row-index-id
aurora-parts-author-ids.xlsx
Description: This is a split list of all local author ids, containing a single author id per row, salted with the university name to make it unique, and the row-index-id connecting it back to the publication in the main table.
Keys: row-index-id, author-id
aurora-author-org-structure.xlsx
Description: This is a list of all authors, containing a single author per row, including the three organisational sublevels this author is affiliated with in the internal organisation. The local author-id is salted with the university name to make it unique, to connect it back to the author list.
Keys: author-id
aurora-parts-scival-labels-researchareas.xlsx
Description: This is a split list of the multiple value field containing the All Science Journal Classification names (ASJC) as we have reseived from Scival per publication. It is containing a single ASJC name per row and the row-index-id connecting it back to the publication in the main table.
Keys: row-index-id
aurora-publications-scival-institutions.xlsx
Description: This is a split list of the multiple value field containing the affiliated research institutes as we have reseived from Scival per publication. It is containing a single Institution name per row and the row-index-id connecting it back to the publication in the main table.
Keys: row-index-id
aurora-publications-scival-countries.xlsx
Description: This is a split list of the multiple value field containing the affiliated countries of the research institutes as we have reseived from Scival per publication. It is containing a single Country name per row and the row-index-id connecting it back to the publication in the main table.
Keys: row-index-id
aurora-publications-sdg-labels.xlsx
Description: This is a split list of the multiple value field containing SDG Goal labels (SDGshort) based on the SDG probabilities. It is containing two sheets, SDG95 and SDG98. SDG95 contains only an SDG label of a publication where the related to SDG probability is between 95% and 100%. SDG98 contains only an SDG label of a publication where the related to SDG probability is between 98% and 100%. In the dashboard we use only SDG98 publications. Each sheet is containing a single SDG Goal label (SDGshort) per row, and the row-index-id connecting it back to the publication in the main table.
Keys: row-index-id, SDGshort
aurora-altmetric-mentions.xlsx
Description: This is a table containing multiple mentions (news and patents) related to a publication. Each row is containing mention details (eg. Title, type, url, source, etc.), and the DOI connecting it back to the publication in the main table.
Keys: DOI
aurora_dois_cited_in_policy_aug_2022.xlsx
Description: This is a table containing multiple policy documents related to a publication. Each row is containing policy document (eg. title, type, url, source, etc.), and the DOI connecting it back to the publication in the main table.
Keys: DOI
/04 report/
The files in here are the report it self, and supporting data files.
Aurora-SDG-Research-Dashboard-v8.pbix
Description: This is the PowerBI report containing the SDG dashboard.
Keys: not relevant
20200820 SDGs - JSON Color Codes.json
Description: This is a PowerBI color template, containing the SDG color codes (HEX) in sequential order, from SDG01 to SDG17.
Keys: not relevant
list-countries-geographical-regions-UNSD-M49.xlsx
Description: This is the official UN country table containing, counties, categorised by region and continent, and developing state. The Country name is used to connecting it back to Scival table containing county affiliated research institute of the publications.
Keys: Country
list-sdgs.xlsx
Description: This is a table containing details about the SDG’s, like the image URL, name variants, full descriptions, etc. The SDGshort is connecting it back to the SDG labels table, that is connected to the publications in the main table.
Keys: SDGshort
list-universities.xlsx
(!) This also is the file you need to replace with details of your own universities, like name variations used in the main file, or in scival institution.
Description: This is a table containing details about the universities, like the image URLs, name variants used in different tables (aurora and Scival). University column is connecting it back to the publications in the main table.
Keys: University
Data file Relationships
The illustration shows how the different tables are connected in a relational database diagram. All connections are bi-rirectional in order to filter publications based on the information in the other tables and vise versa.
Notes
Files
Aurora SDG Research Dashboard - public-uncompressed.zip
Files
(10.0 GB)
Name | Size | Download all |
---|---|---|
md5:50a92e5903f2cd0810d9ed072c7ab8a8
|
10.0 GB | Preview Download |
md5:442aa8f5c0352dd583e684884eec0d9e
|
1.2 MB | Preview Download |
Additional details
Related works
- Requires
- Dataset: 10.5281/zenodo.6644946 (DOI)