Preprint Citations in PLOS Dataset
Authors/Creators
- 1. CRIT, Université de Franche-Comté, Institut Universitaire de France (IUF)
- 2. ELICO, Université Claude Bernard Lyon 1
Description
Preprints are research articles that are published online before undergoing peer review. Their role in scientific production has been growing in recent years. Our objective is to study these practices and to evaluate the differences between citations to preprints and citations to peer-reviewed articles.
This dataset contains the contexts of citations to preprints, extracted from the PLOS dataset. We processed all PLOS articles published up to January 2021. Preprint citations were identified by matching the metadata of each cited source against a list of existing preprint databases. For each citation, we extracted the sentence containing it and the position of that sentence in the IMRaD structure of the article.
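The identification step can be illustrated with a minimal sketch. It is not the authors' pipeline: the list of preprint servers below is a hypothetical stand-in for the actual list, the function name `classify_source` is invented for illustration, and simple case-insensitive substring matching is an assumption. The keyword fallback mirrors the "preprint kw" label used in the source_name column.

```python
import re

# Hypothetical stand-in for the list of known preprint databases;
# the actual list used to build the dataset is not reproduced here.
KNOWN_PREPRINT_SOURCES = ("arXiv", "bioRxiv", "medRxiv")


def classify_source(source_metadata):
    """Return a preprint source label for a cited reference, or None.

    Assumption: a reference counts as a preprint if its source metadata
    mentions a known preprint database (case-insensitive substring match).
    If no database matches but the word "preprint" appears, the label
    "preprint kw" is returned.
    """
    text = source_metadata.lower()
    for name in KNOWN_PREPRINT_SOURCES:
        if name.lower() in text:
            return name
    if re.search(r"\bpreprint\b", text):
        return "preprint kw"
    return None
```

Under these assumptions, a reference whose source field mentions one of the listed servers is labelled with that server's name, a reference that only contains the word "preprint" falls back to "preprint kw", and anything else is treated as a non-preprint citation.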
The data is provided in a TSV file with the following columns:
- id: identifier.
- source_name: name of the preprint database where the preprint is published. In some cases source_name is "preprint kw", meaning that the citation was identified by the presence of the "preprint" keyword in the source metadata but could not be linked to a known preprint database.
- jtitle: title of the PLOS journal from which the citation context is extracted.
- imrad_code: one of "I", "M", "R", "D", indicating the section of the IMRaD (Introduction, Methods, Results and Discussion) structure in which the citation context appears.
- perc: a number between 0 and 100 giving the relative position of the citation context within the section in which it appears. It is computed by dividing the sentence number of the citation context by the total number of sentences in the section, expressed as a percentage.
- pub_year: publication year of the article.
- sentence_text: sentence containing the citation to the preprint.
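The column layout and the perc calculation can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: sentence numbers are assumed to be 1-based, no rounding is applied, the file is assumed to start with a header row naming the columns above, and the helper names `position_percentage` and `read_contexts` are hypothetical. The sample row is invented for illustration and does not come from the dataset.

```python
import csv
import io


def position_percentage(sentence_number, total_sentences):
    """Position of a sentence within its section, as a percentage.

    Assumption: sentence_number is 1-based, so the last sentence of a
    section maps to 100.
    """
    if total_sentences <= 0:
        raise ValueError("total_sentences must be positive")
    if not 1 <= sentence_number <= total_sentences:
        raise ValueError("sentence_number out of range")
    return 100.0 * sentence_number / total_sentences


def read_contexts(handle):
    """Read tab-separated citation contexts into a list of dicts.

    Assumption: the first line is a header row with the column names.
    QUOTE_NONE keeps quotation marks inside sentence_text untouched.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Invented sample row, for illustration only (not taken from the dataset).
SAMPLE = (
    "id\tsource_name\tjtitle\timrad_code\tperc\tpub_year\tsentence_text\n"
    "1\tbioRxiv\tPLOS ONE\tI\t25.0\t2019\tAn invented sentence citing a preprint [12].\n"
)

rows = read_contexts(io.StringIO(SAMPLE))
```

To read the actual file, the same `read_contexts` function can be given a file object opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the downloaded TSV file is stored.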
The dataset and the processing steps used to obtain it are fully described in:
Bertin, Marc and Atanassova, Iana (2022). "Preprint Citation Praxis in PLOS". Scientometrics.
Files
- md5:3a34c65096f46ecec4ed256a3dd1e7d9 (1.8 MB)
Additional details
Funding
- Agence Nationale de la Recherche
- TheoScit - Study of citation contexts for a construction of semantic relational indicators ANR-20-CE38-0003