Published May 13, 2025 | Version v1
Dataset Open

Charon dehosting long read index

Contributors

Data curator:

Description

This pre-built index can be used with Charon to dehost metagenomic long read datasets.

The file host_microbial.accessions.txt.gz contains a breakdown of the accessions included in each input file used to generate the index. The human files (one per chromosome) each include up to 100 complete references from assemblies generated by the HPRC (always including T2T). The index also includes the HLA alleles downloaded from here. The microbial files include references from FDA-ARGOS, along with a subset of complete RefSeq genomes. These represent archaea, bacteria, fungi, SAR and viruses, split over a number of files per phylum where necessary.

The index was built from these reference FASTA files with Charon release v1.0.4, using the command `charon index host_microbial.tab --log index_host_microbial.log` with default parameters (w=41, k=19, number_hashes=3).

To extract the microbial reads, first decompress the index using `gunzip host_microbial.tab.idx.gz`. Then run:

charon dehost -t 8 --db host_microbial.tab.idx <reads.fq.gz> --extract microbial --prefix <out_prefix>

More details on the `charon dehost` options are available on the GitHub page.
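The two steps above can be combined into a small shell helper. This is only a sketch: the function name `dehost_microbial` and the `CHARON` override variable are hypothetical conveniences that are not part of Charon, and the only `charon` flags used are the ones shown in the command above. It assumes the index sits in the current directory.

```shell
# Hypothetical helper (not part of Charon): decompress the index if needed,
# then run the dehost command from this record on one read file.
# Usage: dehost_microbial <reads.fq.gz> <out_prefix> [threads]
dehost_microbial() {
    reads="$1"
    prefix="$2"
    threads="${3:-8}"                 # default of 8 threads, as in the record
    index="host_microbial.tab.idx"    # assumed to be in the current directory

    if [ -z "$reads" ] || [ -z "$prefix" ]; then
        echo "usage: dehost_microbial <reads.fq.gz> <out_prefix> [threads]" >&2
        return 2
    fi

    # Decompress only when the uncompressed index is not already present.
    if [ ! -f "$index" ]; then
        gunzip "${index}.gz" || return 1
    fi

    # CHARON is a hypothetical override for the path to the charon binary.
    "${CHARON:-charon}" dehost -t "$threads" --db "$index" "$reads" \
        --extract microbial --prefix "$prefix"
}
```

For example, `dehost_microbial reads.fq.gz sample1` runs the command with 8 threads, where `reads.fq.gz` and `sample1` are placeholder names. Skipping the decompression when the index already exists avoids an error on repeated runs.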

Files

Files (34.7 GB)

Checksum Size
md5:e6bea46a5849d8ec237360f0893a6153 69.1 MB
md5:e2d768e6752077548e7e9e1d5331ca18 9.7 kB
md5:903010fe3eabb7c3be383b5faaafe087 34.6 GB
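The md5 checksums in the file listing can be used to confirm that a download completed intact, which is worth doing for a file of tens of gigabytes. The helper below is a minimal sketch: `verify_md5` is a hypothetical function name, and it assumes the GNU coreutils `md5sum` tool is installed. The file names are not shown in the listing, so match each checksum to its file by the size shown.

```shell
# Hypothetical helper: compare a file's MD5 digest with an expected value.
# Usage: verify_md5 <file> <expected>   (expected may carry an "md5:" prefix)
verify_md5() {
    file="$1"
    expected="${2#md5:}"    # strip the "md5:" prefix used in the listing

    # Read from stdin so md5sum prints only the digest and a "-" placeholder.
    actual="$(md5sum < "$file" | cut -d' ' -f1)"

    if [ "$actual" = "$expected" ]; then
        return 0
    fi
    echo "checksum mismatch for $file" >&2
    return 1
}
```

A call takes the form `verify_md5 <downloaded_file> <checksum>`, with the checksum copied from the listing, and returns a non-zero status on mismatch.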

Additional details

Software

Repository URL
https://github.com/rmcolq/charon
Programming language
C++