Published July 11, 2023 | Version 1.0
Journal article | Open access

StrainIQ: A novel n-gram based method for taxonomic profiling of human microbiota at the strain-level

  • 1. University of Nebraska Medical Center, Omaha, NE 68198, USA

Description

StrainIQ (Strain Identification and Quantification) is a novel tool that implements a new n-gram-based algorithm for predicting and quantifying strain-level taxa from whole-genome metagenomic sequencing data. We tested the method on simulated datasets (generated from gastrointestinal (GI) tract reference genomes) and on mock metagenomic datasets (sequenced from ATCC microbial genomic mixes), and compared its performance with that of existing methods. The supplementary tables, supplementary figures, and datasets presented or used in the study are listed below. The StrainIQ code is freely available to the research community at https://github.com/GudaLab/StrainIQ.
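StrainIQ's own n-gram algorithm is described in the paper and the repository, not in this record. As a generic illustration only (not StrainIQ's implementation), the sketch below shows what extracting overlapping n-grams from a single sequencing read means. The function name `count_ngrams`, the example read, and the choice to skip n-grams containing ambiguous `N` bases are assumptions made for this example.

```python
from collections import Counter


def count_ngrams(read: str, n: int) -> Counter:
    """Count overlapping n-grams (substrings of length n) in one DNA read.

    Hypothetical helper for illustration; not part of StrainIQ.
    """
    read = read.upper()
    counts = Counter()
    # Slide a window of width n one base at a time.
    for i in range(len(read) - n + 1):
        gram = read[i:i + n]
        # Assumption: n-grams containing ambiguous bases (e.g. 'N') are skipped.
        if set(gram) <= set("ACGT"):
            counts[gram] += 1
    return counts


print(count_ngrams("ACGTACGTN", 4))
```

Real profilers compare such n-gram profiles against reference databases; how StrainIQ selects n and scores matches should be taken from the paper and code.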

SupplementaryTables-Figures.zip: This zip file contains supplemental material (tables and figures) presented in this study.

Figure2-datasets.zip: Contains the simulated positive (from GI tract genomes) and negative (from non-GI tract genomes) metagenomic datasets used to determine the optimal cut-off based on the GI tract DSEM.
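This record does not state how the optimal cut-off was chosen or what score the GI tract DSEM produces. Purely as an illustration of how positive and negative datasets can be used to pick a threshold, the sketch below assumes each simulated dataset yields one numeric score (higher for positives) and selects the cut-off that maximizes Youden's J (sensitivity + specificity - 1). Youden's J is a common generic criterion swapped in here, not necessarily the one used in the study; `best_cutoff` and the example scores are hypothetical.

```python
def best_cutoff(pos_scores, neg_scores):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    Hypothetical helper. A dataset is called positive when score >= cutoff.
    Both input lists must be non-empty.
    """
    candidates = sorted(set(pos_scores) | set(neg_scores))
    best = (None, -1.0)
    for c in candidates:
        # Fraction of true positives called positive at this cut-off.
        sens = sum(s >= c for s in pos_scores) / len(pos_scores)
        # Fraction of true negatives called negative at this cut-off.
        spec = sum(s < c for s in neg_scores) / len(neg_scores)
        j = sens + spec - 1
        if j > best[1]:  # ties keep the lower cut-off
            best = (c, j)
    return best


pos = [0.9, 0.8, 0.75, 0.6]   # made-up scores for positive datasets
neg = [0.5, 0.4, 0.65, 0.2]   # made-up scores for negative datasets
print(best_cutoff(pos, neg))
```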

Simulated_datasets.zip: Contains the simulated metagenomic sequencing data (from GI tract reference genomes) used to measure the sensitivity and specificity of StrainIQ in strain identification.
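For reference, sensitivity and specificity of strain calls follow the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below computes both from sets of strain identifiers; it assumes true negatives are counted against a hypothetical "universe" of all candidate strains in the reference database, which the record does not specify. The function name and strain labels are illustrative only.

```python
def sensitivity_specificity(predicted, truth, universe):
    """Sensitivity and specificity of strain calls (hypothetical helper).

    predicted: strains reported by the profiler
    truth:     strains actually present in the simulated sample
    universe:  all candidate strains in the reference database (assumption)
    """
    predicted, truth, universe = set(predicted), set(truth), set(universe)
    tp = len(predicted & truth)          # present and reported
    fn = len(truth - predicted)          # present but missed
    fp = len(predicted - truth)          # reported but absent
    tn = len(universe - predicted - truth)  # absent and not reported
    return tp / (tp + fn), tn / (tn + fp)


universe = ["S1", "S2", "S3", "S4", "S5", "S6"]
sens, spec = sensitivity_specificity({"S1", "S2", "S4"}, {"S1", "S2", "S3"}, universe)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```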

Gut_even.zip: Contains the metagenomic sequencing data (~120x coverage) of the ATCC Gut Microbiome Genomic Mix (MSA-1006). From this original data, we created additional datasets at 90x, 60x, 30x, 5x, 3x, and 1x coverage and used them to test StrainIQ performance at varying coverage.

Gut_mix.zip: Contains the metagenomic sequencing data (~120x coverage) of the ATCC 20-strain staggered genomic mix (MSA-1003). From this original data, we created additional datasets at 90x, 60x, 30x, 5x, 3x, and 1x coverage and used them to test StrainIQ performance at varying coverage.
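The record does not say which tool or procedure produced the lower-coverage datasets. Assuming uniform random read subsampling, the fraction of reads to keep is simply the target coverage divided by the original (~120x) coverage, for example 90 / 120 = 0.75. The sketch below computes these fractions and subsamples a list of reads reproducibly; the constants and function names are hypothetical, and the 120x value is the approximate figure stated above.

```python
import random

ORIGINAL_COVERAGE = 120  # approximate ("~120x") coverage of both ATCC datasets
TARGETS = [90, 60, 30, 5, 3, 1]


def subsample_fraction(target, original=ORIGINAL_COVERAGE):
    """Fraction of reads to keep so coverage drops from `original` to `target`."""
    return target / original


def subsample_reads(reads, fraction, seed=42):
    """Keep each read independently with probability `fraction`.

    A fixed seed makes the subsample reproducible (hypothetical helper).
    """
    rng = random.Random(seed)
    return [r for r in reads if rng.random() < fraction]


for t in TARGETS:
    print(f"{t}x: keep {subsample_fraction(t):.4f} of reads")
```

For paired-end data, the keep/drop decision must be applied identically to both mates of each pair so that pairing is preserved.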

Files


Total size: 38.8 GB

MD5 checksum                        Size
87e61c0a9669c2ec1525c1df86a05147    7.4 GB
9c8fc48ec385c04ef0ff86d3a66d7602    11.0 GB
c4cb497c689851fd18c585ab935eb3c0    11.4 GB
5ef8858792e18acc0f34ca7f1a8390cc    9.1 GB
8fb14abc602c980124511d5c343852be    253.2 kB
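Because the archives are large (several GB each), it is worth checking each download against the published MD5 checksums listed above. A minimal sketch using Python's standard `hashlib`, reading the file in chunks so it never has to fit in memory; the function names and the example path are hypothetical.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 digest matches the published checksum."""
    return md5sum(path) == expected_md5.strip().lower()
```

Usage: call `verify("downloaded_file.zip", "<checksum listed for that file>")` for each archive; a `False` result means the download is incomplete or corrupted and should be repeated.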