UVC-delins: calling deletion-insertion variants using greedy search with haplotype inference
Description
-> Short description
This dataset contains BAM files, selected from the SEQC2 somatic reference sets, that were used to assess how well different software packages call multiple-nucleotide variants (MNVs).
This dataset supports the conclusion that the software package UVC-delins (https://github.com/genetronhealth/uvc-delins) performed best on this dataset.
This dataset is in the public domain with no restriction on its use, as stated in the "NCBI Website and Data Usage Policies and Disclaimers" at https://www.ncbi.nlm.nih.gov/home/about/policies/
-> Methods
Download FASTQ files with accessions SRR7890887 and SRR7890889
Align with BWA MEM to GRCh38
Pre-process with GATK4 (MarkDuplicates, BaseRecalibrator and ApplyBQSR)
Select only records mapped to chr22
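The four steps above can be sketched as a list of command lines. This is only an illustration under stated assumptions, not the exact invocation used to build the dataset: the reference path `GRCh38.fa`, the known-sites file `known_sites.vcf.gz`, the thread count, the read-group fields, all intermediate file names, and the assumption that the runs are paired-end are hypothetical, since the record does not state them. The function only builds the commands; it does not run them.

```python
import shlex

# Hypothetical inputs: the dataset record does not name these files.
REFERENCE = "GRCh38.fa"              # indexed GRCh38 FASTA (assumed path)
KNOWN_SITES = "known_sites.vcf.gz"   # known-variant sites for BQSR (assumed)
ACCESSIONS = ["SRR7890887", "SRR7890889"]  # accessions given in the Methods


def pipeline_commands(acc, ref=REFERENCE, known_sites=KNOWN_SITES, threads=4):
    """Return the command lines for one accession, in execution order."""
    # Assumes paired-end reads, so fasterq-dump writes <acc>_1/_2.fastq.
    fq1, fq2 = f"{acc}_1.fastq", f"{acc}_2.fastq"
    # GATK needs a read group; these field values are placeholders.
    read_group = rf"@RG\tID:{acc}\tSM:{acc}\tPL:ILLUMINA"
    sam = f"{acc}.sam"
    sorted_bam = f"{acc}.sorted.bam"
    dedup_bam = f"{acc}.dedup.bam"
    metrics = f"{acc}.dup_metrics.txt"
    recal_table = f"{acc}.recal.table"
    recal_bam = f"{acc}.recal.bam"
    chr22_bam = f"{acc}.chr22.bam"
    return [
        # 1. Download FASTQ files
        ["fasterq-dump", "--split-files", acc],
        # 2. Align with BWA MEM to GRCh38, then coordinate-sort
        ["bwa", "mem", "-t", str(threads), "-R", read_group,
         "-o", sam, ref, fq1, fq2],
        ["samtools", "sort", "-o", sorted_bam, sam],
        # 3. Pre-process with GATK4
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam,
         "-M", metrics],
        ["gatk", "BaseRecalibrator", "-I", dedup_bam, "-R", ref,
         "--known-sites", known_sites, "-O", recal_table],
        ["gatk", "ApplyBQSR", "-I", dedup_bam, "-R", ref,
         "--bqsr-recal-file", recal_table, "-O", recal_bam],
        # 4. Select only records mapped to chr22 (region query needs an index)
        ["samtools", "index", recal_bam],
        ["samtools", "view", "-b", "-o", chr22_bam, recal_bam, "chr22"],
    ]


if __name__ == "__main__":
    for accession in ACCESSIONS:
        for command in pipeline_commands(accession):
            print(shlex.join(command))
```

Printing the commands rather than executing them makes it easy to review each one, or to paste them into a shell script, before committing to a long alignment run.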
-> Usage Notes
The software packages samtools and bcftools, both widely used in the high-throughput sequencing community, are needed to open the files in this dataset.
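As a minimal sketch of how one might check for these tools and take a first look at a downloaded BAM file, the helper below reports which required programs are missing from `PATH` and builds a few standard samtools inspection commands. The function names and the example file name are hypothetical; the record does not list the file names.

```python
import shutil

REQUIRED_TOOLS = ("samtools", "bcftools")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def inspection_commands(bam_path):
    """Build (but do not run) basic samtools commands for one BAM file."""
    return [
        ["samtools", "quickcheck", bam_path],   # verify the file is intact
        ["samtools", "view", "-H", bam_path],   # print the header only
        ["samtools", "flagstat", bam_path],     # summarize alignment flags
    ]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Please install:", ", ".join(absent))
    # "example.bam" is a placeholder for a file from this dataset.
    for command in inspection_commands("example.bam"):
        print(" ".join(command))
```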
-> Keywords
Computer and information sciences, deletion-insertion, variant calling
-> Article abstract:
A deletion-insertion (delins) variant is observed as two or more single-nucleotide variants (SNVs) and/or insertion-deletions (InDels) that co-occur in the supporting sequenced fragments. Namely, a delins variant is either a complex InDel or a multiple-nucleotide variant (MNV). Some targeted cancer therapies require the accurate detection of complex InDels. For example, simple and complex EGFR exon 19 deletions show different sensitivities to tyrosine kinase inhibitors depending on their exact subtypes. MNVs are also clinically important. Unfortunately, detecting complex InDels and MNVs, the two subclasses of delins variants, from next-generation sequencing data is challenging because delins variants are often either missed or called in the wrong form. By using greedy search to find biologically plausible haplotypes and odds ratios to compute haplotype qualities, we developed UVC-delins, an algorithm for calling delins variants along with SNVs and InDels. We compared UVC-delins with several haplotype-based variant callers for calling SNV, InDel, and delins variants using four third-party datasets independently generated by simulation and by sequencing cancer cell lines and patient samples. The comparison shows that UVC-delins achieves almost 100% concordance with reference truth sets and with our manual review for calling SNV, InDel, and delins variants. UVC-delins is publicly available under the Apache-2.0 license at https://github.com/genetronhealth/uvc-delins
Files
(55.9 MB)
| MD5 checksum | Size |
|---|---|
| 1add50658344f1fffef8735eb5d1c658 | 491 Bytes |
| 701002fbb4fef0e60e29d556e38e4382 | 55.9 MB |
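After downloading, each file can be checked against the MD5 checksums listed above. The following is a minimal sketch using Python's standard `hashlib`; the function names are hypothetical, and because the record does not show the file names, the check only tests whether a file's digest matches either listed checksum.

```python
import hashlib

# MD5 checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "1add50658344f1fffef8735eb5d1c658",  # 491 Bytes
    "701002fbb4fef0e60e29d556e38e4382",  # 55.9 MB
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path):
    """Return True if the file's MD5 equals one of the listed checksums."""
    return md5_of_file(path) in EXPECTED_MD5
```

Reading in chunks keeps memory use small even for the 55.9 MB file. A call such as `matches_listed_checksum("downloaded_file")` returns `False` if the download was truncated or corrupted.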