Pre-processed PubMed data for a study of coauthorship
Description
This dataset was collected from the PubMed portal to MEDLINE and other repositories of biomedical research (https://www.ncbi.nlm.nih.gov/pubmed/). Analysis of the dataset led to the paper "Effects of research complexity and competition on the incidence and growth of coauthorship in biomedicine", published in PLOS ONE (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0173444). The raw data were pre-processed using the script "clean.r" in the project directory on GitHub (https://github.com/corybrunson/coauthor) to obtain the file presented here.
The dataset is formatted as a data table (https://cran.r-project.org/web/packages/data.table/index.html), a class of data frame in R, and saved as a .RData file, which can be loaded into an R session via `load("path/to/dataset/pmDat.RData")`. The fields are as follows:
- `pmid` - the unique publication identifier (PMID) used by PubMed
- `jid` - the unique journal identifier used by PubMed
- `issn` - the (print) ISSN of the journal
- `ym` - the month and year of publication
- `nau` - the number of authors credited by the publication (up to any limits imposed by PubMed, and counting each author collective as a single author)
- `cau` - whether any corporate author was credited
- `rev` - whether the publication was tagged as a review
- `trial` - whether the publication was tagged as a clinical trial
- `npmt` - the number of MeSH terms assigned to the publication that were flagged as "major" topics
- `nmh` - the number of top-level MeSH headings assigned to the publication
- `supp` - whether the publication was tagged as having received financial support
- `ng` - the number of grants acknowledged by the publication
- `co` - the country in which the journal was published
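As a quick sanity check after loading, the field list above can be compared against the columns actually present. The following is a minimal sketch in base R, not part of the original project: the helper names `load_first_object` and `missing_fields` are hypothetical, and the path in the usage comment is the same placeholder used in the text. Because the name of the object stored inside the .RData file is not stated here, the sketch relies on the fact that `load()` returns the names of the objects it restores.

```r
# Expected field names, copied from the list above
expected_fields <- c("pmid", "jid", "issn", "ym", "nau", "cau", "rev",
                     "trial", "npmt", "nmh", "supp", "ng", "co")

# Load an .RData file into a fresh environment and return the first object
# it contains, so the object's name need not be known in advance.
# (Hypothetical helper, not from the project repository.)
load_first_object <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the restored names
  get(obj_names[1], envir = env)
}

# Return the expected fields that are absent from a table
# (an empty character vector means every field is present).
missing_fields <- function(dat, expected = expected_fields) {
  setdiff(expected, names(dat))
}

# Usage (placeholder path, as in the description):
# pm <- load_first_object("path/to/dataset/pmDat.RData")
# missing_fields(pm)
```

Loading into a separate environment avoids overwriting objects in the current session; attaching the data.table package beforehand is advisable if data.table methods are to be used on the result.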
Note that the field values for any publication can be validated by searching for the PMID in PubMed.
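For spot checks of this kind, a lookup address can be built by appending the PMID to the PubMed URL given in the description. This is a small sketch under that assumption; the function name `pubmed_url` is hypothetical, and the storage type of `pmid` in the dataset (integer, numeric, or character) is not stated here, so the code formats defensively.

```r
# Build a PubMed lookup URL for one or more PMIDs.
# Assumes the PMID can simply be appended to the portal address.
pubmed_url <- function(pmid) {
  # format() without scientific notation guards against numeric PMIDs
  # being printed as e.g. "1e+07"; trimws() removes padding added by format()
  id <- trimws(format(pmid, scientific = FALSE))
  paste0("https://www.ncbi.nlm.nih.gov/pubmed/", id)
}

# Usage: open the record for the first publication in a loaded table `pm`
# browseURL(pubmed_url(pm$pmid[1]))
```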
Files
| Checksum | Size |
|---|---|
| md5:9876d667809bb71363609be7a977510e | 47.7 MB |