Clustering of Microbiome Data: Evaluation of Ensemble Design Approaches
- 1. BioSense Institute, University of Novi Sad
- 2. Faculty of Technical Sciences, University of Novi Sad
- 3. Biotechnical Faculty, University of Ljubljana
Description
Abstract: Microbiome studies are attracting increasing interest, especially in human health applications, where their use for disease prognostics, diagnostics and treatment can have a large effect on quality of life. The settings chosen in the microbiome data preprocessing stage can lead to great variability in the generated operational taxonomic unit (OTU) tables, reflected in the size and sparseness of the resulting data matrix. As there are still no solid guidelines on best practices, it is valuable to assess which machine learning algorithms give more stable results under variable preprocessing settings. In this study, we generated OTU tables from the data of the Moving Pictures of the Human Microbiome study, using two different reference databases (Greengenes and SILVA) and four levels of the similarity threshold (ranging from 90% to 99%), processed in the QIIME bioinformatics package. The results of the best-performing classification algorithm and the best-performing clustering algorithm are presented in detail: the Random Forest (RF) classifier and Spectral Clustering (SC), respectively. The Random Forest classifier outperformed Spectral Clustering in terms of accuracy. As the rate of data generation increases while the cost of labeling remains high, further improvement of clustering performance and ensemble approaches should be explored.
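The effect of the similarity threshold on the size and sparseness of an OTU table can be illustrated with a toy example. The sketch below is not the QIIME pipeline used in the study. It assumes pre-aligned reads of equal length, uses a simple greedy centroid assignment, and all names (`identity`, `greedy_otus`, `otu_table`, `sparsity`, and the `samples` data) are hypothetical and chosen only for illustration.

```python
# Toy illustration (not QIIME): how the similarity threshold changes
# the number of OTUs and the sparseness of the resulting count table.
# Assumption: reads are pre-aligned and of equal length.

def identity(a, b):
    """Fraction of positions at which two equal-length reads match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(reads, threshold):
    """Assign each read to the first centroid it matches at or above
    the threshold; otherwise the read opens a new OTU."""
    centroids, assignment = [], []
    for read in reads:
        for i, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                assignment.append(i)
                break
        else:
            centroids.append(read)
            assignment.append(len(centroids) - 1)
    return assignment, len(centroids)

def otu_table(samples, threshold):
    """Build a samples-by-OTUs count table (rows in sorted sample order)."""
    names = sorted(samples)
    tagged = [(name, read) for name in names for read in samples[name]]
    assignment, n_otus = greedy_otus([read for _, read in tagged], threshold)
    counts = {name: [0] * n_otus for name in names}
    for (name, _), otu in zip(tagged, assignment):
        counts[name][otu] += 1
    return [counts[name] for name in names]

def sparsity(table):
    """Fraction of zero cells in the table."""
    cells = [value for row in table for value in row]
    return sum(value == 0 for value in cells) / len(cells)

# Hypothetical reads for two body sites (made-up data).
samples = {
    "gut": ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT"],
    "skin": ["CCCCCCCCCC", "CCCCCCCCCG", "AAAAAAAAAA"],
}

for threshold in (0.90, 0.99):
    table = otu_table(samples, threshold)
    print(threshold, len(table[0]), round(sparsity(table), 3))
```

On this made-up data, the stricter threshold splits the reads into more OTUs and leaves a larger share of zero cells, which is the kind of variability the abstract refers to.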
Files
10.1109@BIBE.2019.00156.pdf (718.5 kB, md5:f4a997f9492047eaeb1666ccc61ef09e)