Published February 26, 2021 | Version 1.0
Dataset Open

PUMA pipeline output

  • 1. University of Liverpool
  • 2. Newcastle University
  • 3. University of Oxford

Description

Output of the PUMA (PUblications Metadata Augmentation) software pipeline, which takes a list of journal articles and augments it with metadata from external sources. The augmented metadata is then processed to generate data files and an explorable, searchable set of HTML pages.

The PUMA pipeline is available at: https://github.com/OllyButters/puma and is described at: https://doi.org/10.12688/f1000research.25484.1

The attached files are the result of running the pipeline on 2021-01-15 on the list of publications described at: https://doi.org/10.12688/wellcomeopenres.14986.1. Rerunning the pipeline on this list may give slightly different outputs, because the content of the external metadata sources changes over time.

Screenshots of the output HTML pages:

  • PUMA_home_2021-01-15.png   - Summary of all publications.
  • PUMA_2011_2021-01-15.png   - All publications from 2011.
  • PUMA_map_2021-01-15.png   - Choropleth map of first authors' countries.
  • PUMA_asthma_2021-01-15.png   - All publications with an asthma MeSH term.
  • PUMA_metrics_2021-01-15.png   - Simple metrics.
  • PUMA_word_cloud_2021-01-15.png   - Word cloud of abstract text.
  • PUMA_coverage_2021-01-15.png   - Table showing completeness of metadata.

Generated data files

  • authors.csv   - Frequency of authors.
  • first_authors.csv   - Frequency of first authors.
  • first_authors_inst.csv   - Frequency of first authors' institutes.
  • journals.csv   - Frequency of journals published in.
  • abstract_lemmatized.csv   - Frequency of lemmatized abstract words.
  • abstract_lemmatized_by_year.csv   - Frequency of lemmatized abstract words broken down by year.
  • title_lemmatized.csv   - Frequency of lemmatized title words.
  • title_lemmatized_by_year.csv   - Frequency of lemmatized title words broken down by year.
  • keywords_lemmatized.csv   - Frequency of lemmatized keywords.
  • keywords_lemmatized_by_year.csv   - Frequency of lemmatized keywords broken down by year.
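
The frequency files listed above can be inspected with a few lines of Python. The following is a minimal sketch, not part of the PUMA pipeline: it assumes that each row of a non-"by_year" file holds a label in its first column and an integer count in its last column, which this record does not document, so check the first few rows of a file before relying on it. The function names `read_frequencies` and `top_n` are hypothetical.

```python
import csv


def read_frequencies(path):
    """Return a list of (label, count) pairs read from a frequency CSV.

    ASSUMPTION: the label is in the first column and an integer count is
    in the last column. Rows that do not fit (for example a header row)
    are skipped rather than guessed at.
    """
    pairs = []
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # blank or single-cell row
            try:
                count = int(row[-1])
            except ValueError:
                continue  # header or non-numeric row
            pairs.append((row[0], count))
    return pairs


def top_n(pairs, n=10):
    """Return the n pairs with the highest counts, largest first."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)[:n]
```

For example, `top_n(read_frequencies("journals.csv"), 5)` would list the five most frequent journals if the file follows the assumed layout. The "by_year" files probably have a different shape and are not handled here.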

Files

Files (9.0 MB)

MD5 checksum                           Size
md5:8118a40ee843e166e2b49d1364837301   117.0 kB
md5:e303a848305e6c9e3e3258fcfc92b4fe   593.2 kB
md5:7ae90b2a77c46a8c86235a78ddb1ff7f   56.8 kB
md5:43840efea928e1f8135f6cd01fe9e261   8.2 kB
md5:d925a73c167d38c37a4c570590e4be92   5.7 kB
md5:d1fdf71d8f966e5875826a0764e883d4   7.5 kB
md5:6a3544e6abb69b172559dc2c186f9a47   23.1 kB
md5:c2a280fe52cbda331dffe352641d9ff9   118.5 kB
md5:062603da983c655f5682479fd41692c0   4.4 MB
md5:089e5c73efcd6b1b26d60c6f1e054448   1.9 MB
md5:96e04cf25457d80af9a078c94119f604   781.1 kB
md5:e1be84d09dd8beea9b7d592d8b7e08be   172.0 kB
md5:8e21e37dea315d2bb3eb40d839d72ec3   169.5 kB
md5:6f3b149758494486309bd475bfbccfc8   210.0 kB
md5:6cbf6c73d2474f08c0783eb2e7db7ba7   264.8 kB
md5:04e5c6daab6833cc86dd1a2e33ae105e   28.6 kB
md5:cd899e69a61ba833df56909835ae7423   147.8 kB

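
The MD5 checksums in the file listing can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `matches_listed_checksum` are hypothetical, and because the listing does not pair checksums with file names, the caller has to supply the checksum shown next to the file on the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`.

    The file is read in chunks so that large files (the biggest file
    in this record is 4.4 MB) do not have to fit in memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'.

    The 'md5:' prefix is optional and the comparison ignores case.
    """
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_listed_checksum("authors.csv", "md5:<digest from the record page>")` returns `True` only when the local copy is byte-for-byte identical to the published file.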

Additional details

Related works

Is derived from
Journal article: 10.12688/f1000research.25484.1 (DOI)
Journal article: 10.12688/wellcomeopenres.14986.1 (DOI)