Info: Zenodo’s user support line is staffed on regular business days between Dec 23 and Jan 5. Response times may be slightly longer than normal.

Published October 20, 2021 | Version v1.0.3
Software Open

poyuliu/KTU: KTU

Creators

  • 1. National Taiwan University

Description

KTU: K-mer-based taxonomic clustering algorithm improves biological relevance in microbiome associated study

The 16S rDNA amplicon sequencing is widely implemented for microbiome associated studies. Microbiota feature picking algorithms for taxonomic identification and quantification are greatly renewing in recent years. The amplicon sequence variant (ASV) denoising algorithm of unbiased sequence picking replaces the OTU clustering methods. The ASV features can detect and distinguish the biological variations under the species OTU level (≧97% similarity). However, the quantification of single ASV among sequencing samples are sparse and less prevalent in the same biological groups. Here, we introduce a k-mer based, sequence alignment-free algorithm – "KTU" (K-mer Taxonomic Unit) for re-clustering ASVs into taxonomic units with more biological relevance.

The "KTU" algorithm was designed with four parts (k-mer frequency counting, k-mer frequency similarity measurement, k-mer feature partitioning, and generating KTU table) and conducted in the R environment. The k-mer frequency counting was conducted by tetranucleotide frequency of amplicons; 256 tetranucleotide compositions were then converted to a 0-to-1 proportion. The similarities of k-mer frequency among amplicons were measured by cosine similarity. The similarity matrix then was converted to the distance matrix for the subsequent step. The KTUs were detected from the cosine distance matrix by using partition around medoids (PAM) clustering algorithm; the iterative PAM-KTU detecting process found the optimal cluster numbers of KTUs. The final step of the KTU algorithm was aggregating ASVs into the KTUs and generating the KTU table.

Files

poyuliu/KTU-v1.0.3.zip

Files (14.2 kB)

Name Size Download all
md5:47710a989d88146a9fc654c6c00cac4d
14.2 kB Preview Download

Additional details

Related works