Performance Improvement of Gene Selection Methods using Outlier Modification Rule

Author(s): Md. Shahjaman*, Nishith Kumar, Md. Nurul Haque Mollah

Journal Name: Current Bioinformatics

Volume 14, Issue 6, 2019

Background: DNA microarray technology allows researchers to measure the expression levels of thousands of genes simultaneously. The main objective of microarray gene expression (GE) data analysis is to detect biomarker genes that are Differentially Expressed (DE) between two or more experimental groups/conditions.

Objective: Several popular statistical methods for selecting biomarker genes exist in the literature. However, most of them often produce misleading results in the presence of outliers. Therefore, in this study, we introduce a robust approach to overcome the problems of classical methods.

Methods: Our robust procedure uses the median and the median absolute deviation (MAD). A gene is considered an outlying gene if at least one of its expression values falls outside the interval defined by the proposed outlier detection rule; otherwise, it is considered a non-outlying gene.
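The abstract does not state the exact interval or how flagged values are modified, so the following is only a minimal sketch of a median/MAD rule under stated assumptions: the interval is taken to be median ± k × MAD with k = 3, the rule is applied to the expression values of one gene within one condition, and values outside the interval are replaced by the median. The function names, the cutoff `k`, and the replacement step are illustrative choices, not the authors' published procedure.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation of a list of numbers."""
    m = median(values)
    return median([abs(v - m) for v in values])


def outlier_interval(values, k=3.0):
    """Assumed interval: median +/- k * MAD (k = 3 is an illustrative cutoff)."""
    m = median(values)
    spread = mad(values)
    return m - k * spread, m + k * spread


def modify_outliers(values, k=3.0):
    """Flag values outside the interval and replace them with the median.

    Returns (modified_values, is_outlying_gene). The gene is flagged as
    outlying if at least one expression value falls outside the interval.
    Replacing by the median is an assumption made for this sketch.
    Note: if MAD is 0 the interval collapses to the median itself, so any
    value different from the median is flagged.
    """
    lo, hi = outlier_interval(values, k)
    m = median(values)
    flagged = [not (lo <= v <= hi) for v in values]
    modified = [m if f else v for v, f in zip(values, flagged)]
    return modified, any(flagged)


if __name__ == "__main__":
    # Hypothetical expression values for one gene in one condition.
    gene = [5.1, 4.9, 5.0, 5.2, 4.8, 12.0]
    modified, is_outlying = modify_outliers(gene)
    print(modified, is_outlying)
```

In this sketch the modified values would then be passed to a classical gene selection method in place of the raw values, so that a single extreme measurement no longer dominates the test statistic.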

Results: We compared the performance of the proposed method with that of the traditional methods using both simulated and real gene expression data. In a real colon cancer gene expression dataset, the proposed method detected fourteen (14) additional DE genes that were not detected by the traditional methods. Using Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, we observed that these 14 additional DE genes are involved in three important metabolic pathways of cancer. The proposed method also detected nine (9) additional DE genes in a head-and-neck cancer gene expression dataset; these genes are involved in the top ten metabolic pathways obtained from the KEGG pathway database.

Conclusion: On both simulated and real cancer gene expression datasets, the proposed procedure performed better than the traditional methods. The additional genes it detected require further wet lab validation.

Keywords: Gene expression data, outlier detection and modification, DE gene, MAD and robustness, KEGG.



Article Details

Year: 2019
Page: [491 - 503]
Pages: 13
DOI: 10.2174/1574893614666181126110008