Published March 6, 2017 | Version v1
Dataset Open

Pre-processed PubMed data for a study of coauthorship

  • 1. UConn Health

Description

This dataset was collected from the PubMed portal to MEDLINE and other repositories of biomedical research (https://www.ncbi.nlm.nih.gov/pubmed/). Analysis of the dataset led to the paper "Effects of research complexity and competition on the incidence and growth of coauthorship in biomedicine", published in PLOS ONE (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0173444). The raw data were pre-processed with the script "clean.r" in the project repository on GitHub (https://github.com/corybrunson/coauthor) to obtain the file presented here.

The dataset is formatted as a data table (https://cran.r-project.org/web/packages/data.table/index.html), a class of data frame in R, and saved as a .RData file, which can be loaded into an R session via `load("path/to/dataset/pmDat.RData")`. The fields are as follows:

  • `pmid` - the unique publication identifier (PMID) used by PubMed
  • `jid` - the unique journal identifier used by PubMed
  • `issn` - the (print) ISSN of the journal
  • `ym` - the month and year of publication
  • `nau` - the number of authors credited by the publication (up to any limits imposed by PubMed, and counting each author collective as a single author)
  • `cau` - whether any corporate author was credited
  • `rev` - whether the publication was tagged as a review
  • `trial` - whether the publication was tagged as a clinical trial
  • `npmt` - the number of MeSH terms assigned to the publication that were flagged as "major" topics
  • `nmh` - the number of top-level MeSH headings assigned to the publication
  • `supp` - whether the publication was tagged as having received financial support
  • `ng` - the number of grants acknowledged by the publication
  • `co` - the country in which the journal was published
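
As a minimal sketch of loading the file and checking it against the field list above, the following base R code may help. The names `load_pm_data`, `expected_fields`, and `missing_fields` are hypothetical and not part of the project. The description does not state the name of the object stored in the .RData file, so the helper returns whatever object `load()` restores first.

```r
# Hypothetical helper: load an .RData file into a private environment and
# return the first object it contains, without assuming the object's name.
load_pm_data <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the restored names
  get(obj_names[1], envir = env)
}

# Field names as documented in the list above.
expected_fields <- c("pmid", "jid", "issn", "ym", "nau", "cau", "rev",
                     "trial", "npmt", "nmh", "supp", "ng", "co")

# Hypothetical check: which documented fields are absent from a table?
missing_fields <- function(dat) {
  setdiff(expected_fields, names(dat))
}
```

For example, `dat <- load_pm_data("path/to/dataset/pmDat.RData")` followed by `missing_fields(dat)` should return an empty character vector if the file matches the description. Attaching the data.table package beforehand (`library(data.table)`) makes its methods available on the loaded object.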

Note that the field values for any publication can be validated by searching for the PMID in PubMed.
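
One way to look up a record for such a check is to build a PubMed address from its PMID. The sketch below assumes that appending the PMID to the portal address given in the description resolves to the record; `pubmed_url` is a hypothetical helper name.

```r
# Hypothetical helper: build a PubMed lookup URL from one or more PMIDs.
# Assumes the pattern <portal address>/<PMID> resolves to the record.
pubmed_url <- function(pmid) {
  # Convert via integer so large numeric PMIDs are not printed in
  # scientific notation (e.g. 2.8e+07).
  paste0("https://www.ncbi.nlm.nih.gov/pubmed/",
         sprintf("%d", as.integer(pmid)))
}
```

Passing the resulting string to `browseURL()` in an interactive session opens the record in a browser, where the fields can be compared by eye.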

Files

  • Size: 47.7 MB
  • Checksum: md5:9876d667809bb71363609be7a977510e