Pre-processed PubMed data for a study of coauthorship
Description
This dataset was collected from the PubMed portal to MEDLINE and other repositories of biomedical research (https://www.ncbi.nlm.nih.gov/pubmed/). Analysis of the dataset led to the paper "Effects of research complexity and competition on the incidence and growth of coauthorship in biomedicine", published in PLOS One (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0173444). The raw data were pre-processed using the script "clean.r" in the project directory on GitHub (https://github.com/corybrunson/coauthor) to obtain the file presented here.
The dataset is formatted as a data table (https://cran.r-project.org/web/packages/data.table/index.html), a class of data frame in R, and saved as a .RData file, which can be loaded into an R session via `load("path/to/dataset/pmDat.RData")`. The fields are as follows:
- `pmid` - the unique publication identifier (PMID) used by PubMed
- `jid` - the unique journal identifier used by PubMed
- `issn` - the (print) ISSN of the journal
- `ym` - the month and year of publication
- `nau` - the number of authors credited by the publication (up to any limits imposed by PubMed, and counting each author collective as a single author)
- `cau` - whether any corporate author was credited
- `rev` - whether the publication was tagged as a review
- `trial` - whether the publication was tagged as a clinical trial
- `npmt` - the number of MeSH terms assigned to the publication that were flagged as "major" topics
- `nmh` - the number of top-level MeSH headings assigned to the publication
- `supp` - whether the publication was tagged as having received financial support
- `ng` - the number of grants acknowledged by the publication
- `co` - the country in which the journal was published
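As a minimal sketch of how the table might be loaded and inspected, assuming the `data.table` package is installed and the file has been saved to a hypothetical local path: the helper `load_first()` is illustrative and not part of the project, and it avoids guessing the name of the stored object by using the names that `load()` returns.

```r
library(data.table)

# Hypothetical local path; replace with wherever the file was saved.
path <- "path/to/dataset/pmDat.RData"

# Illustrative helper (not part of the project): load an .RData file into
# a private environment and return the first object it contains.
load_first <- function(path) {
  env <- new.env()
  obj_names <- load(path, envir = env)  # load() returns the restored names
  get(obj_names[1], envir = env)
}

if (file.exists(path)) {
  dat <- load_first(path)

  str(dat)                           # field names and storage types
  print(dat[, .N, by = "rev"])       # publication counts by review flag
}
```

Loading into a separate environment keeps the dataset from overwriting any object of the same name already in the session.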
Note that the field values for any publication can be validated by searching for the PMID in PubMed.
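For spot checks, a record URL can be built from a PMID. This is only a sketch: the base URL is the one cited in the description, appending the PMID to it is an assumption about PubMed's URL scheme, and both the function `pubmed_url()` and the example PMID are hypothetical.

```r
# Build a PubMed record URL for one or more PMIDs.
# Assumption: a record can be reached by appending its PMID to the
# PubMed base URL cited in the description.
pubmed_url <- function(pmid) {
  # Avoid scientific notation (e.g. "1e+08") when PMIDs are stored as numbers.
  if (is.numeric(pmid)) {
    pmid <- format(pmid, scientific = FALSE, trim = TRUE)
  }
  paste0("https://www.ncbi.nlm.nih.gov/pubmed/", pmid)
}

# Hypothetical PMID, used purely for illustration.
url <- pubmed_url(12345678)
print(url)
```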
Files
Size | Checksum |
---|---|
47.7 MB | md5:9876d667809bb71363609be7a977510e |
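A downloaded copy can be checked against the MD5 checksum listed above. The following is a minimal sketch using `tools::md5sum()` from base R; the local path and the helper `md5_matches()` are hypothetical.

```r
# Checksum from the file listing, without the "md5:" prefix.
expected_md5 <- "9876d667809bb71363609be7a977510e"

# Hypothetical local path; replace with wherever the file was saved.
path <- "path/to/dataset/pmDat.RData"

# Illustrative helper: TRUE if the file's MD5 digest equals `expected`.
md5_matches <- function(path, expected) {
  actual <- unname(tools::md5sum(path))  # NA if the file cannot be read
  !is.na(actual) && identical(tolower(actual), tolower(expected))
}

if (file.exists(path)) {
  if (md5_matches(path, expected_md5)) {
    message("Checksum OK")
  } else {
    warning("Checksum mismatch: the download may be incomplete or altered")
  }
}
```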