Published January 23, 2009
| Version 13604
Journal article
Open
BIDENS: Iterative Density Based Biclustering Algorithm With Application to Gene Expression Analysis
Creators
Description
Biclustering is a data mining technique used in gene expression
analysis to identify patterns in which different genes are correlated
over a subset of conditions. Association rule mining is an efficient
approach to biclustering, as in the BIMODULE algorithm, but it is
sensitive to the values given to its input parameters and to the
discretization procedure used in the preprocessing step. In addition,
when noise is present, classical association rule miners discover
multiple small fragments of the true bicluster but miss the true
bicluster itself. This paper formally presents a generalized
noise-tolerant bicluster model, termed μBicluster. An iterative
algorithm based on the proposed model, termed BIDENS, is introduced
that can discover a set of k possibly overlapping biclusters
simultaneously. Our model uses a more flexible method to partition the
dimensions in order to preserve meaningful and significant biclusters.
The proposed algorithm can discover biclusters that are hard for
BIMODULE to discover. An experimental study on yeast and human gene
expression data and on several artificial datasets shows that our
algorithm offers substantial improvements over several previously
proposed biclustering algorithms.
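The fragmentation problem mentioned in the description can be illustrated with a small sketch. It is not the paper's μBicluster definition, which the abstract does not spell out. It assumes a binarized expression matrix and a generic density threshold, and the names `density`, `is_exact_bicluster`, `is_tolerant_bicluster`, and `mu` are hypothetical.

```python
# Illustrative only: a generic density-threshold notion of a noise-tolerant
# bicluster on a binarized expression matrix. All names here are assumptions
# made for this sketch and are not taken from the paper.

def density(matrix, rows, cols):
    """Fraction of cells equal to 1 in the submatrix rows x cols."""
    cells = [matrix[r][c] for r in rows for c in cols]
    return sum(cells) / len(cells)

def is_exact_bicluster(matrix, rows, cols):
    """Classical itemset-style criterion: every cell must be 1."""
    return density(matrix, rows, cols) == 1.0

def is_tolerant_bicluster(matrix, rows, cols, mu):
    """Relaxed criterion: at least a fraction mu of the cells must be 1."""
    return density(matrix, rows, cols) >= mu

# 4 genes x 4 conditions. Genes 0-2 share conditions 0-2, but noise has
# flipped cell (1, 1) from 1 to 0.
expr = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
rows, cols = [0, 1, 2], [0, 1, 2]

# The full 3x3 block fails the exact test because of the single noisy cell.
exact = is_exact_bicluster(expr, rows, cols)
# With a tolerance threshold, the same block is accepted as one bicluster.
tolerant = is_tolerant_bicluster(expr, rows, cols, mu=0.8)
# An exact miner can only report a smaller fragment, such as genes 0 and 2.
fragment = is_exact_bicluster(expr, [0, 2], [0, 1, 2])
```

Under the exact criterion only fragments of the planted block survive, while the relaxed criterion recovers the whole block. The threshold value 0.8 is arbitrary and chosen only for the example.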
Files
Name | Size | Checksum |
---|---|---|
13604.pdf | 386.2 kB | md5:7d3edef9f69f3eafe426c786a05b0480 |