Published March 25, 2024 | Version v1
Dataset
Open
NanoVarBench variant truthset files
Description
These tarballs contain the variant truthsets used for each sample in our paper "Benchmarking reveals superiority of deep learning variant callers on bacterial nanopore sequence data".
Each directory contains the following files:
- <sample>.bed: a BED file of all regions in the genome.
- <sample>.repetitive_regions.bed: a BED file of all repetitive regions of the genome (see the paper for details of how these were identified).
- <sample>.unique_regions.bed: non-repetitive regions of the genome. This is the result of running bedtools complement -i <repetitive BED> -g <faidx of mutref>
- ani.tsv: skani output from skani search for the sample's assembly against all of the downloaded genomes for that species. The last three columns are not from skani. They are the completeness_percentile, completeness, and contamination metrics, all obtained from NCBI for each assembly accession.
- apply.vcf.gz: the variants that were applied to the sample's reference assembly.
- apply.vcf.gz.csi: the VCF index for the above VCF.
- dnadiff.vcf.gz: variants between the sample and donor genome from mummer4.
- minimap2.vcf.gz: variants between the sample and donor genome from minimap2.
- mutdonor.fna: the FASTA file of the selected variant donor.
- mutreference.fna: the sample's reference assembly with apply.vcf.gz applied to it. This is the genome that the sample's reads are aligned to for calling variants.
- mutreference.fna.fai: the faidx of the above genome.
- reference.fna: the reference assembly of the sample. These are also available on GenBank, but are included here for interoperability.
- truth.vcf.gz: the truthset of variants. This is essentially apply.vcf.gz with the REF and ALT inverted and the POS adjusted for the difference in position between the sample and donor assemblies. (See this script)
- vcfstats.txt: VCF statistics produced by paftools.js vcfstat on the truth VCF.
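To illustrate how the unique-regions file relates to the repetitive-regions file, the following is a minimal Python sketch of an interval complement. It is not the authors' pipeline, which uses bedtools complement; the function name `complement`, the example contig names, and the coordinates are all hypothetical. Intervals are assumed to be 0-based and half-open, as in BED, and contig lengths are assumed to come from the first two columns of a faidx file.

```python
from collections import defaultdict


def complement(intervals, genome_sizes):
    """Return the regions of each contig that `intervals` does not cover.

    intervals: iterable of (contig, start, end) tuples, 0-based half-open.
    genome_sizes: dict mapping contig name to contig length.
    """
    by_contig = defaultdict(list)
    for contig, start, end in intervals:
        by_contig[contig].append((start, end))

    result = []
    for contig, length in genome_sizes.items():
        cursor = 0  # first position not yet known to be covered
        for start, end in sorted(by_contig.get(contig, [])):
            if start > cursor:
                # Gap between the previous interval and this one
                result.append((contig, cursor, start))
            cursor = max(cursor, end)
        if cursor < length:
            # Uncovered tail of the contig
            result.append((contig, cursor, length))
    return result


# Hypothetical repetitive regions and contig lengths, for illustration only
repeats = [("chr1", 10, 20), ("chr1", 15, 30), ("chr1", 50, 60)]
sizes = {"chr1": 100, "plasmid1": 40}
unique = complement(repeats, sizes)
for contig, start, end in unique:
    print(contig, start, end, sep="\t")
```

Overlapping input intervals are handled by tracking the furthest covered position, and a contig with no repetitive regions is returned whole.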
For information about each sample, refer to the samplesheet and paper.
Files
(54.1 MB)
| Checksum | Size |
|---|---|
| md5:8c91b8d327c5629736df3f63c3f21380 | 5.8 MB |
| md5:1e60d320f798f788b13b1947ee5efafb | 4.2 MB |
| md5:2e2d8463d3c4ac97c9284f72abf6b451 | 5.3 MB |
| md5:83b063edc463cfdba25a9ce30a442234 | 6.9 MB |
| md5:fd96288db15f3b1493b47fe0e6d56cc0 | 3.0 MB |
| md5:5b124aa615731f312b1939b554c5cd60 | 5.2 MB |
| md5:39599e31cb95741eb18edd47c619492e | 1.9 MB |
| md5:fa7dd267965b5e10abc2c342488b350f | 2.0 MB |
| md5:b411f7bbd997d0806af43a8eb3d44371 | 3.2 MB |
| md5:b266cdb5d7cff95b703ae991c333948d | 3.2 MB |
| md5:371775c6e9466e3beab3b531831caacd | 3.2 MB |
| md5:3d46e5e6aa0f049c19b23f71f568598c | 6.0 MB |
| md5:25b38b7bcbd4c7493b2c7a9aa61839a4 | 2.4 MB |
| md5:30d2ec8845697e66538cc2aea313e719 | 1.9 MB |
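A downloaded tarball can be checked against the md5 checksums listed above. The following is a minimal sketch using Python's standard hashlib module; the helper names `md5_of_file` and `matches_checksum` are hypothetical and are not part of the dataset or the NanoVarBench repository.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read until an empty bytes object signals end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with an expected value such as 'md5:8c91...'."""
    # Accept the value with or without the leading 'md5:' label
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum(path, "md5:8c91b8d327c5629736df3f63c3f21380")` would be expected to return True for an intact copy of the corresponding tarball, where `path` is wherever that file was saved locally.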
Additional details
Related works
- Is derived from
- Preprint: 10.1101/2024.03.15.585313 (DOI)
Software
- Repository URL
- https://github.com/mbhall88/NanoVarBench