Published May 9, 2022 | Version v2
Journal article Open

UVC-delins: calling deletion-insertion variants using greedy search with haplotype inference

Authors/Creators

Contributors

Contact person:

  • 1. Genetron Health

Description

-> Short description
This dataset contains BAM files selected from the SEQC2 somatic reference sets. The files were used to assess how well different software packages call multiple-nucleotide variants (MNVs). 
This dataset supports the conclusion that the software package UVC-delins (https://github.com/genetronhealth/uvc-delins) performed best on this dataset. 
This dataset is in the public domain with no restrictions on its use, as stated in the "NCBI Website and Data Usage Policies and Disclaimers" at https://www.ncbi.nlm.nih.gov/home/about/policies/

-> Methods
Download the FASTQ files with accessions SRR7890887 and SRR7890889.
Align the reads to GRCh38 with BWA-MEM.
Pre-process the alignments with GATK4 (MarkDuplicates, BaseRecalibrator and ApplyBQSR).
Select only the records mapped to chr22.
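
The steps above could be scripted roughly as follows. This is a minimal sketch, not the authors' actual commands: the file names, the read-group string, the thread count, the known-sites VCF, the use of fasterq-dump for the download, and the assumption of paired-end reads are all placeholders introduced for illustration. The function only builds the command lines and does not run them.

```python
from typing import List


def build_pipeline(
    run: str,
    ref: str = "GRCh38.fa",                   # hypothetical reference FASTA path
    known_sites: str = "known_sites.vcf.gz",  # hypothetical known-sites VCF for BQSR
    threads: int = 4,                         # arbitrary thread count
) -> List[List[str]]:
    """Return the command lines (not executed) for one sequencing run."""
    # Assumption: paired-end reads, so fasterq-dump writes <run>_1.fastq and <run>_2.fastq.
    fq1, fq2 = f"{run}_1.fastq", f"{run}_2.fastq"
    # GATK needs a read group; this ID/SM/PL string is a placeholder.
    read_group = f"@RG\\tID:{run}\\tSM:{run}\\tPL:ILLUMINA"
    sam = f"{run}.sam"
    sorted_bam = f"{run}.sorted.bam"
    dedup_bam = f"{run}.dedup.bam"
    recal_table = f"{run}.recal.table"
    bqsr_bam = f"{run}.bqsr.bam"
    return [
        # Step 1: download the FASTQ files.
        ["fasterq-dump", "--split-files", run],
        # Step 2: align to GRCh38 with BWA-MEM, then coordinate-sort.
        ["bwa", "mem", "-t", str(threads), "-R", read_group, "-o", sam, ref, fq1, fq2],
        ["samtools", "sort", "-o", sorted_bam, sam],
        # Step 3: GATK4 pre-processing.
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam,
         "-M", f"{run}.dup_metrics.txt"],
        ["gatk", "BaseRecalibrator", "-I", dedup_bam, "-R", ref,
         "--known-sites", known_sites, "-O", recal_table],
        ["gatk", "ApplyBQSR", "-I", dedup_bam, "-R", ref,
         "--bqsr-recal-file", recal_table, "-O", bqsr_bam],
        # Step 4: index, then keep only the records mapped to chr22.
        ["samtools", "index", bqsr_bam],
        ["samtools", "view", "-b", "-o", f"{run}.chr22.bam", bqsr_bam, "chr22"],
    ]


if __name__ == "__main__":
    for accession in ("SRR7890887", "SRR7890889"):
        for cmd in build_pipeline(accession):
            print(" ".join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) once the tools and reference files are in place.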
 

-> Usage Notes

The software packages samtools and bcftools, both widely used in the high-throughput sequencing community, are needed to open the files in this dataset. 
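
Before opening the files, it may help to confirm that both tools are installed. The sketch below checks the PATH and builds a samtools command that prints a BAM header. The helper names and the example file name are hypothetical and are not part of the dataset.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("samtools", "bcftools")) -> List[str]:
    """Return the tools from `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def header_command(bam_path: str) -> List[str]:
    """Build (but do not run) the samtools command that prints a BAM header."""
    return ["samtools", "view", "-H", bam_path]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Please install:", ", ".join(absent))
    else:
        # "example.chr22.bam" is a placeholder file name.
        print(" ".join(header_command("example.chr22.bam")))
```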

 

-> Keywords

Computer and information sciences, deletion-insertion, variant calling


-> Article abstract: 
A deletion-insertion (delins) variant is observed as two or more single-nucleotide variants (SNVs) and/or insertion-deletions (InDels) that co-occur in the supporting sequenced fragments. That is, a delins variant is either a complex InDel or a multiple-nucleotide variant (MNV). Some targeted cancer therapies require the accurate detection of complex InDels. For example, simple and complex EGFR exon 19 deletions show different sensitivities to tyrosine kinase inhibitors depending on their exact subtypes. MNVs are also clinically important. Unfortunately, detecting complex InDels and MNVs, the two subclasses of delins variants, from next-generation sequencing data is challenging because delins variants are often either missed or called in the wrong form. By using greedy search to find biologically plausible haplotypes and odds ratios to compute haplotype qualities, we developed UVC-delins, an algorithm for calling delins variants along with SNVs and InDels. We compared UVC-delins with several haplotype-based variant callers for calling SNV, InDel, and delins variants using four third-party datasets independently generated by simulation and by sequencing cancer cell lines and patient samples. The comparison shows that UVC-delins achieves almost 100% concordance with the reference truth sets and with our manual review for calling SNV, InDel, and delins variants. UVC-delins is publicly available under the Apache-2.0 license at
https://github.com/genetronhealth/uvc-delins
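
To make the definition in the abstract concrete, the sketch below merges simple variants that lie on the same haplotype into a single delins record and labels the result as an MNV or a complex InDel. It only illustrates the representation. It is not the UVC-delins algorithm, which uses greedy search with haplotype inference, and the function names and the toy reference sequence are invented for this example.

```python
from typing import List, Tuple

Variant = Tuple[int, str, str]  # (1-based position, REF allele, ALT allele)


def merge_to_delins(reference: str, ref_start: int, variants: List[Variant]) -> Variant:
    """Merge non-overlapping variants on one haplotype into a single record.

    `reference` is the reference sequence of the region and `ref_start`
    is the 1-based coordinate of its first base.
    """
    ordered = sorted(variants)
    start = ordered[0][0]
    cursor = start  # next reference position not yet consumed
    alt_parts = []
    for pos, ref, alt in ordered:
        if pos < cursor:
            raise ValueError("variants overlap")
        offset = pos - ref_start
        if reference[offset:offset + len(ref)] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        # Unchanged reference bases between the previous variant and this one.
        alt_parts.append(reference[cursor - ref_start:offset])
        alt_parts.append(alt)
        cursor = pos + len(ref)
    merged_ref = reference[start - ref_start:cursor - ref_start]
    return (start, merged_ref, "".join(alt_parts))


def delins_subclass(variant: Variant) -> str:
    """Equal REF and ALT lengths give an MNV; otherwise a complex InDel."""
    _, ref, alt = variant
    return "MNV" if len(ref) == len(alt) else "complex InDel"


if __name__ == "__main__":
    region = "ACGTACGT"  # toy reference, first base at position 100
    two_snvs = merge_to_delins(region, 100, [(101, "C", "T"), (103, "T", "G")])
    print(two_snvs, delins_subclass(two_snvs))
    snv_and_deletion = merge_to_delins(region, 100, [(101, "C", "T"), (102, "GTA", "G")])
    print(snv_and_deletion, delins_subclass(snv_and_deletion))
```

In the toy region, two SNVs two bases apart collapse into one record with REF and ALT of equal length (an MNV), while an SNV next to a deletion collapses into a record whose REF is longer than its ALT (a complex InDel).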

Notes

In this version, we removed the *uvc-truth.vcf files since they were outdated and can be generated by the evaluation pipeline anyway.

Files

Files (55.9 MB)

Checksum, Size
md5:1add50658344f1fffef8735eb5d1c658, 491 Bytes
md5:701002fbb4fef0e60e29d556e38e4382, 55.9 MB
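
After downloading, each file can be checked against the MD5 checksums listed above. The following is a minimal sketch using Python's standard hashlib module. The file names are not shown on this page, so the path passed on the command line is a placeholder that the reader must supply.

```python
import hashlib
import sys


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


if __name__ == "__main__":
    # Usage: python verify_md5.py <downloaded-file> <md5:checksum>
    file_path, checksum = sys.argv[1], sys.argv[2]
    print("OK" if matches_checksum(file_path, checksum) else "MISMATCH")
```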