DAVI: A Dataset for Automatic Variant Interpretation

Longhin, Francesca; Guazzo, Alessandro; Longato, Enrico; Ferro, Nicola; Di Camillo, Barbara

doi:10.5281/zenodo.12697421

Published July 9, 2024 | Version v1

Dataset | Open access


1. University of Padova, Department of Information Engineering, Padova, 35131, Italy
2. University of Padova, Department of Comparative Biomedicine and Food Science, Legnaro (PD), 35020, Italy

The analysis of an individual’s genetic material may uncover genetic variants, which can be classified as disease-causing (pathogenic) or benign. Identifying the pathogenic variants among millions of candidates relies on searching for evidence for or against pathogenicity, a process governed by the American College of Medical Genetics and Genomics (ACMG) guidelines and informed by data from the scientific literature. Despite recent progress towards automation, finding such evidence in the literature still requires manual curation, which is time-consuming given the ever-growing number of published papers.

In this work, we built DAVI (Dataset for Automatic Variant Interpretation), a reliable, manually curated dataset of 1239 sentences extracted from 311 (variant, article) associations covering 41 variants. Of these sentences, 597 are positive: they contain evidence activating one of two opposing ACMG criteria, PS3 or BS3 (well-established functional studies supporting a damaging effect, or showing no damaging effect, respectively). The remaining 642 are negative: they contain no evidence activating either criterion. A (variant, article) association is labeled positive if it contains at least one positive sentence and negative if it contains none. DAVI therefore also contains 154 positive and 157 negative (variant, article) associations.
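The association-level labeling rule above amounts to a logical OR over sentence labels. The following Python sketch illustrates it under stated assumptions: the `(variant, article, is_positive)` record layout, the function name `label_associations`, and the toy identifiers are hypothetical, since the column names of curated_sentences.csv are not documented on this record.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (variant_id, article_id, sentence_is_positive).
# The real column names in curated_sentences.csv are an assumption here.
Record = Tuple[str, str, bool]


def label_associations(records: Iterable[Record]) -> Dict[Tuple[str, str], bool]:
    """Label each (variant, article) pair positive if any of its sentences is positive."""
    labels: Dict[Tuple[str, str], bool] = defaultdict(bool)  # missing pairs start as False
    for variant, article, is_positive in records:
        key = (variant, article)
        # OR-aggregation: one positive sentence makes the pair positive;
        # a pair whose sentences are all negative stays False (negative).
        labels[key] = labels[key] or is_positive
    return dict(labels)


if __name__ == "__main__":
    # Toy, made-up records for illustration only (not taken from DAVI).
    toy = [
        ("var1", "pmid1", False),
        ("var1", "pmid1", True),
        ("var1", "pmid2", False),
        ("var2", "pmid1", False),
    ]
    print(label_associations(toy))
```

In practice the records would be read from curated_sentences.csv once its actual columns are known; the aggregation itself does not depend on the file layout.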

Files

Total size: 576.5 kB (1 file)

Name	Size	Checksum
curated_sentences.csv	576.5 kB	md5:369e19f5b9c721a844bc3aacb81edf7b
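Because the record lists an MD5 checksum for curated_sentences.csv, a downloaded copy can be checked locally. This is a minimal sketch using Python's standard `hashlib` module; the local file path and the helper names `md5_of` and `verify` are illustrative assumptions, not part of the dataset.

```python
import hashlib
from pathlib import Path

# MD5 checksum listed on this record for curated_sentences.csv.
EXPECTED_MD5 = "369e19f5b9c721a844bc3aacb81edf7b"


def md5_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the local file matches the expected checksum (case-insensitive)."""
    return md5_of(path) == expected.lower()


if __name__ == "__main__":
    target = Path("curated_sentences.csv")  # assumed download location
    if target.exists():
        print("checksum OK" if verify(target) else "checksum MISMATCH")
    else:
        print(f"{target} not found; download it from the record first")
```

Reading in chunks keeps memory use constant regardless of file size, although at 576.5 kB the whole file would also fit comfortably in memory.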

Additional details

Is published in: 10.1007/978-3-031-42448-9_8 (DOI)
