ClinvArbitration data release - August 2025
Description
This file is a tarball containing the ClinvArbitration re-summary of ClinVar's raw submissions. The ClinvArbitration project aggregates the individual submissions differently from ClinVar: when submissions for a variant don't all agree, it prefers to break the tie instead of defaulting to a rating of "conflicting interpretations of pathogenicity". This leads to more variants being presented as either Benign/Likely Benign (B/LB) or Pathogenic/Likely Pathogenic (P/LP), and a smaller grey area in between.
This data release contains two items:
- The results of the re-interpretation of ClinVar, presented as a Hail Table.
- The results of a second project that builds on top of this and delivers annotations approximating the PM5 consequence category according to the ACMG criteria, also as a Hail Table.
This second part is not easily applied by existing tools, and represents the following:
For each Pathogenic SNV in ClinVar, we annotate the variant using BCFtools csq. For each Pathogenic SNV which is also a Missense variant, we reorganise the data to be indexed on Transcript and Codon number. This index can then be inverted to annotate genetic variation: if a variant is a Missense, and a ClinVar Pathogenic Missense variant exists affecting the same Codon, we annotate the Missense with the co-located known Pathogenic ClinVar entries, in case this contributes to the interpretation of the variant under investigation.
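The indexing and lookup described above can be sketched in plain Python. This is a minimal illustration rather than the ClinvArbitration implementation: the `MissenseRecord` type, the function names, and the identifiers in the example data are all hypothetical, and the real pipeline works on Hail Tables built from BCFtools csq output.

```python
from collections import defaultdict
from typing import NamedTuple


class MissenseRecord(NamedTuple):
    """One missense consequence (hypothetical shape, for illustration only)."""
    variant_id: str  # made-up identifier standing in for a ClinVar entry
    transcript: str  # transcript the consequence was called on
    codon: int       # codon number within that transcript


def build_codon_index(pathogenic_missense):
    """Index pathogenic missense records on (transcript, codon)."""
    index = defaultdict(set)
    for record in pathogenic_missense:
        index[(record.transcript, record.codon)].add(record.variant_id)
    return index


def colocated_pathogenic(query, index):
    """Return IDs of known pathogenic missense entries at the query's codon.

    The query's own ID is excluded, so a variant never annotates itself.
    """
    hits = index.get((query.transcript, query.codon), set())
    return sorted(hits - {query.variant_id})


# Made-up example data: two pathogenic entries share codon 100 of TX1.
clinvar_pathogenic = [
    MissenseRecord("CV1", "TX1", 100),
    MissenseRecord("CV2", "TX1", 100),
    MissenseRecord("CV3", "TX2", 42),
]
codon_index = build_codon_index(clinvar_pathogenic)

same_codon = colocated_pathogenic(MissenseRecord("query", "TX1", 100), codon_index)
other_codon = colocated_pathogenic(MissenseRecord("query", "TX1", 101), codon_index)
```

Keying on the (transcript, codon) pair keeps each lookup to a single dictionary access per consequence, which is the inversion the paragraph describes.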
Two scripts in the ClinvArbitration repository have been created, one for each of these Hail Tables, to convert their contents back to TSV format for general use. The TSV representations of this data are 34x larger than the Hail Tables, so I've chosen to distribute the Hail Tables and provide adapter scripts, instead of distributing all possible data formats. See the "convert_X" scripts in the ClinvArbitration scripts folder.
I would like to acknowledge that since this side project started, a substantial curation effort has been made in ClinVar, so the gap between the standard and re-interpreted ClinVar results has closed substantially. The exact data format presented here is required by Talos, a whole-Exome/Genome variant prioritisation tool, so despite the increasing consistency between the two result sets, this exact data format should continue to be distributed.
These monthly summaries were previously released on the GitHub Release Page for the ClinvArbitration tool, but the max data size for attachments has been reached. Going forward, Zenodo makes sense as a home for release of these files.
Files
| Checksum | Size |
|---|---|
| md5:e396ef923fd973b2be62e8ea6c6e9e29 | 52.1 MB |
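After downloading, the tarball can be checked against the md5 listed above before it is unpacked. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the archive file name in the usage comment is a placeholder, since the actual name is not shown in this listing.

```python
import hashlib
import tarfile

# Checksum as listed for this release.
EXPECTED_MD5 = "e396ef923fd973b2be62e8ea6c6e9e29"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, dest, expected_md5=EXPECTED_MD5):
    """Extract the tarball into dest only if its md5 matches the expected value."""
    actual = md5_of_file(archive)
    if actual != expected_md5:
        raise ValueError(f"md5 mismatch: expected {expected_md5}, got {actual}")
    with tarfile.open(archive) as tar:
        tar.extractall(dest)


# Usage (placeholder file name, substitute the file you downloaded):
# verify_and_extract("clinvarbitration_release.tar.gz", "clinvarbitration_data")
```

Only extract archives whose checksum matches, since `extractall` writes whatever paths the archive contains.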
Additional details
Dates
- Updated
- 2025-06-16
Software
- Repository URL
- https://github.com/populationgenomics/ClinvArbitration
- Programming language
- Python
- Development Status
- Active