Charon dehosting long read index
Description
This pre-built index can be used with Charon to dehost metagenomic long read datasets.
The file host_microbial.accessions.txt.gz contains a breakdown of the accessions included in each input file used to generate the index. The human files (one per chromosome) each include up to 100 complete references from assemblies generated by the HPRC (always including T2T). In addition, the index includes the HLA alleles downloaded from here. The microbial files include references from FDA-ARGOS, along with a subset of complete RefSeq genomes. These represent archaea, bacteria, fungi, SAR and viruses, split over several files per phylum where necessary.
The index was built from these reference FASTA files with Charon release v1.0.4, using the command `charon index host_microbial.tab --log index_host_microbial.log` and the default parameters (w=41, k=19, number_hashes=3).
To extract the microbial reads, first decompress the index using `gunzip host_microbial.tab.idx.gz`. Then run:
`charon dehost -t 8 --db host_microbial.tab.idx <reads.fq.gz> --extract microbial --prefix <out_prefix>`
More details on the `charon dehost` options are available on the GitHub page.
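For scripted use, the two steps above (decompress the index, then dehost) can be wrapped in Python. This is a minimal sketch rather than part of Charon: the function names `decompress_index` and `build_dehost_command` are hypothetical, and the only Charon options used are those shown in the command above. Unlike `gunzip`, the decompression helper keeps the original `.gz` file, so both copies of the index occupy disk space.

```python
import gzip
import shutil
from pathlib import Path


def decompress_index(gz_path):
    """Decompress a .gz file next to the original and return the new path.

    The compressed file is left in place (gunzip would delete it).
    """
    src = Path(gz_path)
    dst = src.with_suffix("")  # host_microbial.tab.idx.gz -> host_microbial.tab.idx
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, so the index is never fully in memory
    return dst


def build_dehost_command(index, reads, prefix, threads=8):
    """Build the argument list for the dehost command shown in the description.

    Only the flags documented above are used: -t, --db, --extract, --prefix.
    """
    return [
        "charon", "dehost",
        "-t", str(threads),
        "--db", str(index),
        str(reads),
        "--extract", "microbial",
        "--prefix", str(prefix),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which raises an error if `charon` exits with a non-zero status.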
Files
Total size: 34.7 GB

| MD5 checksum | Size |
|---|---|
| e6bea46a5849d8ec237360f0893a6153 | 69.1 MB |
| e2d768e6752077548e7e9e1d5331ca18 | 9.7 kB |
| 903010fe3eabb7c3be383b5faaafe087 | 34.6 GB |
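Because the largest file is 34.6 GB, it may be worth checking a download against its listed MD5 checksum before decompressing it. The following is a minimal sketch using Python's standard `hashlib` module; the function names `md5_of_file` and `matches_md5` are hypothetical, and the expected checksum should be copied from the entry for the file that was actually downloaded.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes at end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_md5(path, expected):
    """Compare a file's MD5 digest with an expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()
```

A mismatch usually indicates a truncated or corrupted download, in which case the file should be fetched again before it is used with Charon.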
Additional details
Software
- Repository URL: https://github.com/rmcolq/charon
- Programming language: C++