Clustering of Microbiome Data: Evaluation of Ensemble Design Approaches

doi:10.1109/BIBE.2019.00156

Published December 26, 2019 | Version v1

Conference paper Open

1. BioSense Institute, University of Novi Sad
2. Faculty of Technical Sciences, University of Novi Sad
3. Biotechnical Faculty, University of Ljubljana

Abstract: Microbiome studies are attracting increasing interest, especially in human health applications, where their use for disease prognostics, diagnostics and treatment can have immense effects on quality of life. The settings chosen in the microbiome data preprocessing stage can lead to great variability in the generated operational taxonomic unit (OTU) tables, reflected in the size and sparseness of this data matrix. As there are still no solid guidelines on best practices, it is valuable to assess which machine learning algorithms provide more stable results under variable preprocessing settings. In this study, we generated OTU tables from the data of the Moving pictures of human microbiome study using two different reference databases (Greengenes and Silva) and four levels of the similarity threshold (ranging from 90 to 99%), processed in the QIIME bioinformatics package. The results of the best-performing classification and clustering algorithms are presented in detail: the Random Forest classifier (RF) and Spectral clustering (SC), respectively. The Random Forest classifier outperformed spectral clustering in terms of accuracy. As the rate of data generation increases while the cost of labeling remains high, further improvement of clustering performance and ensemble approaches should be explored.
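The abstract refers to two quantities that are easy to illustrate: the sparseness of an OTU table, and an accuracy score that lets a clustering result be compared with a classifier. The sketch below is only an illustration under stated assumptions. The abstract does not say how clustering accuracy was computed, so the best one-to-one mapping of cluster ids to classes used here is an assumption, and the function names, the toy table and the labels are all hypothetical, not taken from the study.

```python
from itertools import permutations


def sparsity(otu_table):
    """Fraction of zero entries in a samples-by-OTUs count matrix."""
    total = sum(len(row) for row in otu_table)
    zeros = sum(1 for row in otu_table for count in row if count == 0)
    return zeros / total


def clustering_accuracy(true_labels, cluster_ids):
    """Accuracy under the best one-to-one mapping of cluster ids to classes.

    Brute force over permutations, so only suitable for a handful of classes.
    Assumes there are no more clusters than classes.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_ids))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes")
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(1 for t, c in zip(true_labels, cluster_ids) if mapping[c] == t)
        best = max(best, hits)
    return best / len(true_labels)


# Toy, made-up OTU table: 4 samples x 5 OTUs (not data from the study).
toy_otu_table = [
    [12, 0, 0, 3, 0],
    [9, 0, 1, 0, 0],
    [0, 7, 0, 0, 4],
    [0, 5, 0, 0, 6],
]
toy_body_sites = ["gut", "gut", "skin", "skin"]  # hypothetical sample labels
toy_clusters = [1, 1, 0, 1]                      # hypothetical clustering output

print(sparsity(toy_otu_table))                             # 12 zeros out of 20 entries
print(clustering_accuracy(toy_body_sites, toy_clusters))   # 3 of 4 samples matched
```

Because cluster ids are arbitrary, the mapping step is what makes the score comparable to classifier accuracy. For more than a few classes, the brute-force search would normally be replaced by an assignment solver.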

Notes

This research was supported in part by grants III43002 and III44006 of the Ministry of Education, Science and Technological Development of the Republic of Serbia. It is based upon work done within the COST Action CA18131: Statistical and machine learning techniques in human microbiome studies, which is supported by the COST Association (European Cooperation in Science and Technology).

