orkg-R0: A Dataset of Structured Summaries for the R0 estimate of Infectious Diseases from Complex Scientific Abstracts

Shamsabadi, Mahsa; D'Souza, Jennifer

doi:10.5281/zenodo.10003640

Published 2023 | Version v2

Dataset Open

1. Technische Informationsbibliothek (TIB)

This curated dataset was obtained by filtering, cleaning, and manually annotating the metadata file of the CORD-19 dataset (https://allenai.org/data/cord-19). It contains structured summaries of the R0 estimates of infectious diseases, extracted from scientific abstracts.

The main data directory contains two subdirectories, "raw" and "processed". The "raw" subdirectory holds the train, test, and dev splits of the annotated data.

The "processed" subdirectory contains train, test, and dev JSON files in which the raw data has been filled into a sub-selection of the "Templates for FLAN" prompts.

The sub-selection is:

Only the Drop and Squad_v2 templates are used. Template number 8 from Drop and template number 3 from Squad_v2 are excluded from all splits.
Templates 9 and 10 from Drop are used only in the training sets.
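
The selection rules above can be written down as a small predicate. This is a minimal sketch, not the authors' code: the function name `template_allowed`, the lowercase task keys, and the split names are assumptions made for illustration, and the template numbers are taken exactly as written in the description (whether they count from 0 or 1 is not stated).

```python
# Rules as stated in the description. The dictionary keys are hypothetical
# lowercase spellings of the FLAN task names.
EXCLUDED_EVERYWHERE = {"drop": {8}, "squad_v2": {3}}
TRAIN_ONLY = {"drop": {9, 10}}


def template_allowed(task: str, template_no: int, split: str) -> bool:
    """Return True if the numbered template may be used for the given split."""
    task = task.lower()
    if task not in ("drop", "squad_v2"):
        return False  # only Drop and Squad_v2 templates are sub-selected
    if template_no in EXCLUDED_EVERYWHERE.get(task, set()):
        return False  # excluded from train, dev, and test alike
    if template_no in TRAIN_ONLY.get(task, set()) and split != "train":
        return False  # Drop templates 9 and 10 appear only in training sets
    return True
```

For example, `template_allowed("drop", 9, "dev")` is false while `template_allowed("drop", 9, "train")` is true.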

Two main dataset types are included in this repository: Text_based and Json_based.

The "dev_templated_files" subdirectory contains two subdirectories, "text" and "json".

The "text" sub-folder contains the raw "dev" split filled into all templates suitable for dev, with responses in the defined structured text_based format.
The "json" sub-folder contains the raw "dev" split filled into all templates suitable for dev, with responses in the defined structured json_based format.

The "test_templated_files" subdirectory contains two subdirectories, "text" and "json".

The "text" sub-folder contains the raw "test" split filled into all templates suitable for dev, with responses in the defined structured text_based format.
The "json" sub-folder contains the raw "test" split filled into all templates suitable for dev, with responses in the defined structured json_based format.

The "train_templated_files" subdirectory contains one subdirectory per train dataset, each obtained using a specific set of templates.

It contains 20 different train sets, each available in a json_based and a text_based version, resulting in 40 different training sets.
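
After unpacking the archive, one way to check the layout described above is to count the files under each top-level folder of a directory. The sketch below assumes nothing about file names inside the archive; the helper name `count_files_by_folder` and the example path in the comment are hypothetical.

```python
from collections import Counter
from pathlib import Path


def count_files_by_folder(root: str) -> dict:
    """Map each top-level folder under root to a Counter of file suffixes."""
    summary = {}
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # rglob("*") descends into nested folders such as "text" and "json".
        summary[folder.name] = Counter(
            f.suffix for f in folder.rglob("*") if f.is_file()
        )
    return summary


# Hypothetical usage; adjust the path to wherever data.zip was extracted:
# for name, counts in count_files_by_folder("train_templated_files").items():
#     print(name, dict(counts))
```

If the description holds, running this on the "train_templated_files" directory should account for 20 train sets, each with a text_based and a json_based version.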

Files (60.3 MB)

Name	Size
data.zip md5:d92ac38000869b09f976484015d81517	60.3 MB
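
The md5 value listed for data.zip can be used to verify a download. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `matches_expected` are illustrative, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# Checksum as published in the file listing for data.zip.
EXPECTED_MD5 = "d92ac38000869b09f976484015d81517"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_expected("data.zip")` on an intact download should return true; a false result suggests a truncated or corrupted file.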
