A Novel Gene Selection Algorithm based on Sparse Representation and Minimum-redundancy Maximum-relevancy of Maximum Compatibility Center

Author(s): Min Chen, Yi Zhang*, Zejun Li*, Ang Li*, Wenhua Liu, Liubin Liu, Zheng Chen.

Journal Name: Current Proteomics

Volume 16, Issue 5, 2019


Background: Tumor classification is important for accurate diagnosis and personalized treatment and has recently received great attention. Analysis of gene expression profiles has shown biological significance and has become both a research hotspot and a new challenge for biological data mining. Among existing methods, some algorithms identify a small set of genes but have high time complexity, while others have low time complexity but yield unsatisfactory classification accuracy. This article proposes a new feature extraction method for gene expression profiles.

Methods: In this paper, we propose a classification method for tumor subtypes based on the Minimum-Redundancy Maximum-Relevancy (MRMR) of the maximum compatibility center. First, we performed fuzzy clustering of gene expression profiles based on the compatibility relation. Next, we used the sparse representation coefficient to assess the importance of each gene for the category, extracted the top-ranked genes, and removed the uncorrelated genes. Finally, we used the MRMR search strategy to select the characteristic genes, reject the redundant genes, and obtain the final subset of characteristic genes.
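The final MRMR step can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only a generic greedy MRMR search in the difference form (relevance minus mean redundancy), and it leaves out the fuzzy clustering and sparse representation stages. The function names (`mutual_information`, `discretize`, `mrmr_select`), the three-level equal-frequency binning, and the use of mutual information on binned values are all assumptions made for illustration.

```python
import numpy as np


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    x_vals, x_idx = np.unique(np.asarray(x).ravel(), return_inverse=True)
    y_vals, y_idx = np.unique(np.asarray(y).ravel(), return_inverse=True)
    # Joint frequency table of the two variables.
    joint = np.zeros((x_vals.size, y_vals.size))
    np.add.at(joint, (x_idx, y_idx), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)  # marginal of x, shape (nx, 1)
    py = joint.sum(axis=0, keepdims=True)  # marginal of y, shape (1, ny)
    nz = joint > 0                         # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def discretize(X, n_bins=3):
    """Bin each gene (column) at its interior quantiles (assumed preprocessing)."""
    X = np.asarray(X, dtype=float)
    out = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        out[:, j] = np.digitize(X[:, j], edges)
    return out


def mrmr_select(X, y, k):
    """Greedy MRMR: repeatedly add the gene maximizing
    relevance(gene, class) - mean redundancy(gene, already selected genes)."""
    Xd = discretize(X)
    n_genes = Xd.shape[1]
    relevance = np.array(
        [mutual_information(Xd[:, j], y) for j in range(n_genes)]
    )
    selected = [int(np.argmax(relevance))]  # start with the most relevant gene
    redundancy_sum = np.zeros(n_genes)
    while len(selected) < min(k, n_genes):
        last = selected[-1]
        # Accumulate redundancy against the most recently selected gene.
        redundancy_sum += np.array(
            [mutual_information(Xd[:, j], Xd[:, last]) for j in range(n_genes)]
        )
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf           # never pick the same gene twice
        selected.append(int(np.argmax(score)))
    return selected
```

Here `X` is assumed to be a samples-by-genes expression matrix and `y` a vector of class labels; `mrmr_select(X, y, k)` returns the column indices of `k` genes. Under this criterion, a gene that duplicates an already selected gene scores near zero and is passed over in favor of a gene that carries different class information.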

Results: To verify its effectiveness, we tested our method against four other methods on four different datasets. The results show that the classification accuracy and standard deviation of our method are better than those of the other methods.

Conclusion: Our proposed method is robust, adaptable, and superior in classification. It can help discover the susceptibility genes associated with complex diseases and clarify the interactions between these genes. Our technique provides a new perspective and is important for understanding the pathogenesis of complex diseases and for their prevention, diagnosis, and treatment.

Keywords: Algorithm, bioinformatics, biomarkers, tumorigenesis, accuracy, spectrum.



Article Details

Year: 2019
Page: [374 - 382]
Pages: 9
DOI: 10.2174/1570164616666190123144020
