poyuliu/KTU: KTU
Description
KTU: K-mer-based taxonomic clustering algorithm improves biological relevance in microbiome associated study
16S rDNA amplicon sequencing is widely used in microbiome-associated studies. Feature-picking algorithms for taxonomic identification and quantification of microbiota have changed considerably in recent years: amplicon sequence variant (ASV) denoising, which picks sequences without bias, is replacing OTU clustering methods. ASV features can detect and distinguish biological variation below the species-level OTU threshold (≥97% similarity). However, the counts of any single ASV are sparse across sequencing samples, and individual ASVs are less prevalent within the same biological group. Here, we introduce "KTU" (K-mer Taxonomic Unit), a k-mer-based, sequence alignment-free algorithm that re-clusters ASVs into taxonomic units with more biological relevance.
The "KTU" algorithm has four parts (k-mer frequency counting, k-mer frequency similarity measurement, k-mer feature partitioning, and KTU table generation) and runs in the R environment. K-mer frequency counting uses the tetranucleotide frequencies of the amplicons; the counts of the 256 possible tetranucleotides are converted to proportions between 0 and 1. The similarity of k-mer frequencies between amplicons is measured by cosine similarity, and the similarity matrix is then converted to a distance matrix for the next step. KTUs are detected from the cosine distance matrix with the partitioning around medoids (PAM) clustering algorithm; an iterative PAM-KTU detection process finds the optimal number of KTU clusters. The final step aggregates ASVs into their KTUs and generates the KTU table.
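The four steps can be illustrated with a minimal sketch. It is written in Python with the standard library only, not in R, and it is not the package's code: the function names are hypothetical, the number of clusters `k` is fixed by the caller instead of being searched for iteratively, and a simplified alternating k-medoids update stands in for the full PAM procedure.

```python
from itertools import product
from math import sqrt

K = 4
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # the 256 tetranucleotides
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_proportions(seq):
    """Step 1: count overlapping 4-mers and scale the counts to proportions."""
    counts = [0] * len(KMERS)
    seq = seq.upper()
    for i in range(len(seq) - K + 1):
        j = INDEX.get(seq[i:i + K])  # windows with ambiguous bases (e.g. N) are skipped
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def cosine_distance(u, v):
    """Step 2: cosine similarity converted to a distance (1 - similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def distance_matrix(profiles):
    n = len(profiles)
    return [[cosine_distance(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]


def assign(dist, medoids):
    """Label each item with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=100):
    """Step 3 (simplified): alternating k-medoids on a precomputed distance matrix.

    Assumption: medoids start as the first k items; real PAM uses a build/swap search.
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            # the medoid is the member with the smallest total distance to its cluster
            new_medoids.append(min(members, key=lambda m: sum(dist[m][i] for i in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return assign(dist, medoids), medoids


def aggregate(counts, labels, k):
    """Step 4: sum ASV count rows (one row per ASV, one column per sample) per cluster."""
    table = [[0] * len(counts[0]) for _ in range(k)]
    for row, label in zip(counts, labels):
        for s, value in enumerate(row):
            table[label][s] += value
    return table


# Toy input (made up for illustration): four short sequences, two samples.
asvs = ["ACGTACGTACGTACGTACGT", "GGCCGGCCGGCCGGCCGGCC",
        "ACGTACGTACGTACGTACGA", "GGCCGGCCGGCCGGCCGGCA"]
asv_counts = [[10, 0], [3, 7], [0, 5], [2, 2]]

dist = distance_matrix([kmer_proportions(s) for s in asvs])
labels, medoids = k_medoids(dist, k=2)
ktu_table = aggregate(asv_counts, labels, k=2)
```

In this toy input the first and third sequences differ by one base, as do the second and fourth, so each pair falls into one cluster and their counts are summed per sample. Choosing `k` is the part this sketch leaves out; the description above states that the package finds it through an iterative PAM-KTU detection process.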
Files
Name | Size |
---|---|
poyuliu/KTU-v1.0.3.zip (md5:47710a989d88146a9fc654c6c00cac4d) | 14.2 kB |
Additional details
Related works
- Is supplement to
- https://github.com/poyuliu/KTU/tree/v1.0.3 (URL)