AURORA-MESS: Annotated Uncertainty and Reference in Open Research Articles for Multidisciplinary and Empirical Social Science
Description
The AURORA-MESS dataset documents how uncertainty is expressed in scientific literature across domains, together with the authorial reference of each expression, that is, whether the uncertainty comes from the authors themselves, from prior research, or from both. Researchers and practitioners can use it to study and compare uncertainty expressions in scholarly discourse.
This dataset contains sentences extracted from articles in a wide range of fields, covering both Science, Technology, and Medicine (STM) and Social Sciences and Humanities (SSH), and annotated with respect to scientific uncertainty. The dataset is derived from PubMed, Scopus, Web of Science (WoS), and the Social Science Open Access Repository (SSOAR). It was produced as part of the ANR InSciM (Modelling Uncertainty in Science) project. For details on the construction of the dataset, including the selection of journals, the sampling procedure, and the annotation methodology, see Ningrum and Atanassova (2024) and Ningrum, Mayr, and Atanassova (2023).
The dataset is provided in CSV format. The columns in the table are as follows:
- sentence_id: A unique internal identifier for each sentence.
- journal_name: The name of the journal in which the article was published.
- article_title: The title of the article from which the sentence was extracted.
- publication_year: The year the article was published.
- document_id: The URL where the article is published.
- sentence: The text of the sentence.
- uncertainty: 'Y' if the sentence expresses uncertainty, and 'N' otherwise.
- reference: This column contains three classes:
  - '1' indicates uncertainty arising directly from the authors’ statements.
  - '2' represents uncertainty attributed to prior research, where the authors cite past studies to introduce uncertainty.
  - '3' includes instances where the author(s) express shared uncertainty, drawing on both their findings or arguments and previous research.
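The column layout above can be explored with a few lines of Python. The following is a minimal sketch using only the standard library, not an official loader: it assumes the CSV is comma-delimited, that the header uses exactly the column names listed above, and that `reference` is stored as the strings '1', '2', and '3' (possibly empty for sentences marked 'N'). The sample rows and the `summarize` helper are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Human-readable names for the three reference classes described above.
REFERENCE_LABELS = {
    "1": "authors' own statement",
    "2": "attributed to prior research",
    "3": "shared (authors and prior research)",
}


def summarize(rows):
    """Return the number of uncertain sentences and a count per reference class."""
    uncertain = [r for r in rows if r["uncertainty"].strip().upper() == "Y"]
    by_reference = Counter(
        REFERENCE_LABELS.get(r["reference"].strip(), "unlabelled")
        for r in uncertain
    )
    return len(uncertain), by_reference


# Invented sample rows that mimic the documented columns (NOT real data).
SAMPLE = """sentence_id,journal_name,article_title,publication_year,document_id,sentence,uncertainty,reference
s1,Journal A,Title A,2020,https://example.org/a,"These results may indicate an effect.",Y,1
s2,Journal A,Title A,2020,https://example.org/a,"We collected 200 responses.",N,
s3,Journal B,Title B,2019,https://example.org/b,"Prior work suggests the link is unclear.",Y,2
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
total, by_reference = summarize(rows)
print(total, dict(by_reference))

# To run on the downloaded file instead (UTF-8 encoding is an assumption):
# with open("aorora-mess.csv", newline="", encoding="utf-8") as f:
#     rows = list(csv.DictReader(f))
```

Reading with `csv.DictReader` keeps every value as a string, so the class codes can be compared directly against '1', '2', and '3' without worrying about numeric type inference.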
References
Ningrum, P. K., & Atanassova, I. (2024). Annotation of scientific uncertainty using linguistic patterns. Scientometrics, 1-25. https://doi.org/10.1007/s11192-024-05009-z
Ningrum, P. K., Mayr, P., & Atanassova, I. (2023). UnScientify: Detecting Scientific Uncertainty in Scholarly Full Text. arXiv [cs.CL]. http://arxiv.org/abs/2307.14236
Acknowledgment
We would like to acknowledge GESIS - Leibniz Institute for the Social Sciences for providing access to SSOAR, which has been instrumental in supporting the creation of the dataset.
Files
aorora-mess.csv
- Size: 372.6 kB
- md5:477a6b6b050aa011ec88831ac4cbede4
Additional details
Funding
- Agence Nationale de la Recherche
- InSciM - Modelling Uncertainty in Science ANR-21-CE38-0003
Dates
- Submitted: 2025-03-10