Published February 2, 2015 | Version 10000619
Journal article Open

Imputation Technique for Feature Selection in Microarray Data Set

Description

Analyzing DNA microarray data sets is a great challenge for
bioinformaticians because of the complexity of applying statistical
and machine learning techniques. The challenge is compounded when the
data sets contain missing values, which occur regularly, because
these techniques cannot handle missing data. One of the most
important data analysis processes for microarray data sets is feature
selection, which finds the genes that most strongly affect a given
disease. In this paper, we introduce a technique for imputing the
missing data in microarray data sets while performing feature
selection.

Files

10000619.pdf (183.8 kB)
md5:b658ee42d79cb25f653b2822cfa23870
