Dataset for Multidisciplinary Uncertainty Mining - ver1

Ningrum, Panggih Kusuma; Atanassova, Iana

doi:10.5281/zenodo.8024787

Published June 11, 2023 | Version 1

Dataset Open

1. Université de Franche-Comté, CRIT, France

This dataset contains sentences extracted from articles in various disciplines and annotated with respect to uncertainty in science. It has been produced as part of the ANR InSciM (Modelling Uncertainty in Science) project.

The sentences are drawn from scientific articles in a variety of disciplines. The dataset consists of two distinct samples of sentences, each obtained by a different method: the first through uncertainty cue mapping, and the second through manual annotation of randomly selected articles. Both samples were then manually annotated using our multidimensional annotation framework.

For details on the construction of the dataset, including the selection of journals, the sampling procedure, and the annotation methodology, see Ningrum and Atanassova (2023).

The dataset can be used to study how uncertainty is represented in scientific literature across domains and to analyze the different dimensions of uncertainty in scientific discourse.

The dataset is presented as a CSV table in which colons are used as delimiters. The columns of the table are as follows:

source: 'db' or 'manual', referring to the method used to identify and extract the sentence;
article_id: internal id of the article from which the sentence was extracted;
sen_id: internal unique id of the sentence;
cue: uncertainty cue present in the sentence;
text: sentence text;
journal_id: short name of the journal;
check: 'Y' if the sentence expresses uncertainty and 'N' otherwise;
ref, nature, context, timeline, expression: annotations of the type of uncertainty according to the annotation framework proposed by Ningrum and Atanassova (2023).
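
The column layout above can be read with Python's standard csv module. The following is a minimal sketch, not part of the dataset: it assumes the header row uses exactly the column names listed above, and it takes the delimiter as a parameter (defaulting to a colon, as stated in the description) so that a different separator can be passed if the file turns out to use one. The function names read_rows and uncertain_only and the two SAMPLE rows are hypothetical placeholders, not actual records.

```python
import csv
import io

# Column names as listed in the dataset description (assumed to match the CSV header).
EXPECTED_COLUMNS = [
    "source", "article_id", "sen_id", "cue", "text", "journal_id",
    "check", "ref", "nature", "context", "timeline", "expression",
]

# Invented placeholder rows for illustration only; not taken from the dataset.
SAMPLE = (
    "source:article_id:sen_id:cue:text:journal_id:check:"
    "ref:nature:context:timeline:expression\n"
    "db:a1:s1:may:This may hold.:J1:Y:r:n:c:t:e\n"
    "manual:a2:s2::This holds.:J2:N:::::\n"
)


def read_rows(stream, delimiter=":"):
    """Read the table from a text stream into a list of dicts, one per row."""
    reader = csv.DictReader(stream, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def uncertain_only(rows):
    """Keep only the rows whose 'check' column marks the sentence as uncertain."""
    return [row for row in rows if row["check"] == "Y"]


rows = read_rows(io.StringIO(SAMPLE))
print(len(rows), len(uncertain_only(rows)))
```

For the real file, the same function could be called on an opened file object, for example open(path, newline="", encoding="utf-8"); the UTF-8 encoding is an assumption and is not stated in the description.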

Note that the dataset contains duplicate sentences. These duplicates arise when the cue mapping procedure detects multiple cues in the same sentence. Rather than omitting them, we deliberately chose to retain them, because they show the various ways in which the cues are expressed within the sentences.
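
Depending on the analysis, the duplicates can either be grouped or collapsed. The sketch below illustrates both options under one assumption that should be checked against the file: that duplicated sentences share the same sen_id and differ in the cue column. The helper names cues_per_sentence and one_row_per_sentence are hypothetical, and rows are plain dicts keyed by column name.

```python
from collections import defaultdict


def cues_per_sentence(rows, key="sen_id"):
    """Map each sentence id to the list of distinct cues found for it."""
    grouped = defaultdict(list)
    for row in rows:
        cues = grouped[row[key]]  # creates an empty list for unseen sentences
        cue = row["cue"]
        if cue and cue not in cues:
            cues.append(cue)
    return dict(grouped)


def one_row_per_sentence(rows, key="sen_id"):
    """Keep only the first row seen for each sentence id."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique
```

Grouping keeps the information about co-occurring cues, while collapsing is more appropriate when counting sentences, for example when computing the share of sentences marked 'Y' in the check column.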

Bibliography

Ningrum, P. K., Atanassova, I. (2023) "Scientific Uncertainty: an Annotation Framework and Corpus Study in Different Disciplines" In 19th International Conference of the International Society for Scientometrics and Informetrics (ISSI 2023), Bloomington, Indiana, US.

Files

Files (119.0 kB)

Name	Size
dataset_for_multidisciplinary_uncertainty_mining_v1.csv md5:e8b0ba3b34ca7f8b7bda8fb9cdbc9080	119.0 kB
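
The published md5 value can be used to check that a downloaded copy is intact. The following is a small sketch using Python's hashlib; the function names and the chunk size are illustrative choices, and the local path passed to the functions is whatever location the file was saved to.

```python
import hashlib

# Checksum as published in the file listing of this record.
EXPECTED_MD5 = "e8b0ba3b34ca7f8b7bda8fb9cdbc9080"


def md5_of_file(path, chunk_size=65536):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected md5 digest."""
    return md5_of_file(path) == expected
```

A mismatch usually indicates an incomplete download or a different version of the file.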

Additional details

Agence Nationale de la Recherche
InSciM - Modelling Uncertainty in Science ANR-21-CE38-0003

