Consensus Clustering for Cancer Gene Expression Data Large-Scale Analysis using Evidence Accumulation Approach

doi:10.5220/0006174501760183

Published February 21, 2017 | Version v1

Conference paper Open

1. Faculty of Technical Sciences, University of Novi Sad, Serbia
2. BioSense Institute, University of Novi Sad, Serbia
3. Instituto de Telecomunicações, Instituto Superior Técnico, Lisbon, Portugal

Clustering algorithms are widely applied to patient tissue samples to group and visualize microarray data. High dimensionality and probe-specific noise make selecting an appropriate clustering algorithm difficult. This study presents a large-scale analysis of three clustering algorithms, k-means, hierarchical clustering (HC), and evidence accumulation clustering (EAC), on thirty-five cancer gene expression data sets selected to benchmark clustering performance. Performance was analyzed separately for data sets from the Affymetrix and cDNA chip platforms to examine the possible influence of the microarray technology. No consistent algorithm ranking could be inferred, although in general EAC offered the best compromise between adjusted Rand index (ARI) and variance. The results also indicated that the variance of ARI under repeated k-means initializations provides useful information on whether more complex clustering techniques are needed. If repeated k-means converges to the same partition, and HC confirms that partition, there is no need to run EAC. Under moderately or highly variable ARI in repeated k-means, however, EAC should be used to reduce the uncertainty of clustering and unveil the data structure.
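
The decision rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes synthetic two-dimensional data in place of expression profiles, a plain Lloyd's k-means, and a simplified consensus step that links samples whose co-association exceeds a fixed threshold (equivalent to cutting a single-link dendrogram at one level). The function names, the threshold of 0.5, and the toy data are all hypothetical choices made for illustration.

```python
import random
import statistics
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index of two labelings of the same samples (needs n >= 2)."""
    pairs = Counter(zip(a, b))
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd's k-means from random initial centroids; returns labels."""
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])))
            for p in points
        ]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def coassociation(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_labels(co, threshold=0.5):
    """Simplified consensus: connected components of the graph co > threshold."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Toy data (assumption): two well-separated groups of 15 samples each.
rng = random.Random(0)
truth = [0] * 15 + [1] * 15
points = [(rng.gauss(5 * t, 0.5), rng.gauss(5 * t, 0.5)) for t in truth]

# Repeated k-means: the spread of ARI across initializations is the diagnostic.
partitions = [kmeans(points, 2, rng) for _ in range(20)]
ari_runs = [adjusted_rand_index(truth, p) for p in partitions]
ari_variance = statistics.pvariance(ari_runs)

# Evidence accumulation: combine the same runs into one consensus partition.
consensus = consensus_labels(coassociation(partitions))
ari_consensus = adjusted_rand_index(truth, consensus)
print(f"ARI variance: {ari_variance:.4f}, consensus ARI: {ari_consensus:.4f}")
```

In this sketch a near-zero `ari_variance` would correspond to the case where repeated k-means already agrees and the consensus step adds little, while a larger variance would be the signal to rely on the accumulated partition instead.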

Notes

The work was financed in part by the COST Action TD1405 ENJECT grant awarded to Tatjana Lončar-Turukalo for a short-term scientific mission hosted by Prof. Ana Fred at the Institute for Telecommunications, Instituto Superior Técnico, Portugal; by the Serbian Ministry of Education and Science (Projects III 43002, TR32040); and by the Portuguese Foundation for Science and Technology, scholarship number SFRH/BPD/103127/2014 and grant PTDC/EEISII/7092/2014.

Files

Name	Size	Checksum
61745 (1).pdf	1.3 MB	md5:76035a85beb99c8751751b9271ed000c
