Dataset Open Access

InTeReC: In-text Reference Corpus - Single References Dataset

Bertin, Marc; Atanassova, Iana

This dataset contains sentences extracted from articles published by the Public Library of Science (PLOS) up to September 2013. For each sentence, it gives the position of the sentence within the article and within the section in which it appears, the section type with respect to the four main types of the IMRaD structure, and the verb phrases that occur in the sentence. Each sentence contains exactly one in-text reference.

The dataset is in CSV format and contains 314,023 sentences.

Column list:

  • journal: journal title
  • doi: DOI of the article from which the sentence was extracted
  • article-length: size of the article, as number of sentences
  • article-pos: position of the sentence in the article, as number of sentences from the beginning of the article
  • section-length: size of the section, as number of sentences
  • section-pos: position of the sentence in the section, as number of sentences from the beginning of the section
  • section-type: section type (see below)
  • sentence-text: full text of the sentence
  • verb-phrases: a list of verb phrases that occur in the sentence, comma separated
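
As an illustration of how the columns above might be read, the following is a minimal sketch using Python's standard csv module. It assumes that the file has a header row using the column names listed above, that fields are comma-delimited with standard CSV quoting, and that the lengths are positive integers; none of this is confirmed by the description. The sample row is invented for the example and is not taken from the dataset, and the names parse_row and read_sentences are hypothetical helpers.

```python
import csv
import io


def parse_row(row):
    """Convert one raw CSV row (a dict of strings) into typed values."""
    article_len = int(row["article-length"])
    article_pos = int(row["article-pos"])
    section_len = int(row["section-length"])
    section_pos = int(row["section-pos"])
    return {
        "journal": row["journal"],
        "doi": row["doi"],
        "section_type": row["section-type"],
        "sentence": row["sentence-text"],
        # verb-phrases is described as a comma-separated list
        "verb_phrases": [
            vp.strip() for vp in row["verb-phrases"].split(",") if vp.strip()
        ],
        # Relative positions; None if a length is zero (defensive guard)
        "article_rel_pos": article_pos / article_len if article_len else None,
        "section_rel_pos": section_pos / section_len if section_len else None,
    }


def read_sentences(handle):
    """Yield parsed rows from an open text handle (header row assumed)."""
    for row in csv.DictReader(handle):
        yield parse_row(row)


# Invented sample with the assumed header; not real data from the corpus.
SAMPLE = (
    "journal,doi,article-length,article-pos,section-length,section-pos,"
    "section-type,sentence-text,verb-phrases\n"
    'Example Journal,10.0000/example,200,50,40,10,I,'
    '"This effect was shown previously [1].","was shown, suggest"\n'
)

rows = list(read_sentences(io.StringIO(SAMPLE)))
print(rows[0]["verb_phrases"], rows[0]["article_rel_pos"])
```

For the real file, the same read_sentences function could be given a handle opened with open("interec-singleref-v1.csv", newline="", encoding="utf-8"), where the encoding is also an assumption. Whether positions are counted from 0 or from 1 is not stated, so the relative positions should be treated as approximate.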

Possible section types are:

  • I: Introduction
  • M: Methods
  • R: Results
  • D: Discussion
  • MR: Methods and Results
  • RD: Results and Discussion
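
The combined codes MR and RD can be read as sequences of the four single-letter codes. A small sketch of how a section-type value might be expanded into its component IMRaD labels is given below; the helper name expand_section_type is hypothetical, and the sketch assumes the column contains only the six codes listed above.

```python
# Single-letter codes as listed in the section types above.
SECTION_LABELS = {
    "I": "Introduction",
    "M": "Methods",
    "R": "Results",
    "D": "Discussion",
}


def expand_section_type(code):
    """Return the IMRaD labels covered by a section-type code.

    Raises ValueError for any character outside I, M, R, D.
    """
    labels = []
    for letter in code:
        if letter not in SECTION_LABELS:
            raise ValueError(f"unknown section type code: {code!r}")
        labels.append(SECTION_LABELS[letter])
    return labels


print(expand_section_type("MR"))
```

Expanding the codes this way makes it possible to count a sentence from an RD section under both Results and Discussion, if that is the desired treatment for a given analysis.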

 

A full description of the construction of the dataset is published in:

Marc Bertin and Iana Atanassova (2018). InTeReC: an In-text Reference corpus for applying Natural Language Processing to Bibliometrics. Bibliometric-enhanced Information Retrieval: 7th International BIR workshop (7th BIR workshop) at the 40th European Conference on Information Retrieval (ECIR).

Files (84.2 MB)

  • interec-singleref-v1.csv (84.2 MB), md5:a2fbfc1f042e346dc7b52d094c02d278
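
The MD5 checksum listed for interec-singleref-v1.csv can be used to check that a download is complete. The following is a minimal sketch using Python's hashlib; the helper name md5_of_file and the local file path are assumptions, and the file is read in chunks so that it does not need to fit in memory at once.

```python
import hashlib

# Checksum as listed for interec-singleref-v1.csv
EXPECTED_MD5 = "a2fbfc1f042e346dc7b52d094c02d278"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example usage, assuming the file was saved in the current directory:
#   ok = md5_of_file("interec-singleref-v1.csv") == EXPECTED_MD5
```

A mismatch would suggest a truncated or corrupted download rather than a different version of the data, since only one file is listed here.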
