Performance Improvement of Gene Selection Methods using Outlier Modification Rule
Background: DNA microarray technology allows researchers to measure the expression
levels of thousands of genes simultaneously. The main objective of microarray gene expression
(GE) data analysis is to detect biomarker genes that are Differentially Expressed (DE) between two
or more experimental groups/conditions.
Objective: There are some popular statistical methods in the literature for the selection of biomarker
genes. However, most of them often produce misleading results in the presence of outliers.
Therefore, in this study, we introduce a robust approach to overcome the problems of classical methods.
Methods: We use the median and the median absolute deviation (MAD) in our robust procedure. In this
procedure, a gene is considered an outlying gene if at least one of its expression values
falls outside a certain interval defined by the proposed outlier detection rule. Otherwise, the gene
is considered a non-outlying gene.
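
The detection step described above can be illustrated with a minimal sketch. The abstract does not state the exact interval, so the form median ± k·MAD, the cutoff k = 3, the unscaled MAD, and the function names `mad` and `is_outlying_gene` are all assumptions made for illustration, not the authors' published rule. The sketch also applies the rule to whatever vector of expression values it is given; whether the original procedure works per group or across all samples is not specified here.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation of a sequence of numbers."""
    center = median(values)
    return median(abs(v - center) for v in values)


def is_outlying_gene(expressions, k=3.0):
    """Return True if any expression value lies outside median +/- k * MAD.

    The interval form and the default k are hypothetical; the source only
    says that a gene is outlying when at least one value leaves the interval.
    """
    center = median(expressions)
    spread = mad(expressions)
    lower, upper = center - k * spread, center + k * spread
    return any(x < lower or x > upper for x in expressions)


# Toy expression values (made up for illustration only).
expression_matrix = {
    "gene_a": [5.1, 5.3, 4.9, 5.0, 5.2, 5.1],
    "gene_b": [5.1, 5.3, 4.9, 5.0, 5.2, 12.0],
}

# Map each gene name to its outlying / non-outlying flag.
flags = {name: is_outlying_gene(vals) for name, vals in expression_matrix.items()}
```

With these toy values, `gene_a` stays inside its interval and `gene_b` is flagged because of the single value 12.0. Note that when the MAD is zero the interval collapses to the median, so any deviating value is flagged; a practical implementation would need to decide how to handle that case.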
Results: We investigate the performance of the proposed method in comparison with the traditional
methods using both simulated and real gene expression data. In the analysis of a real colon cancer gene
expression dataset, the proposed method detected an additional fourteen (14) DE genes that
were not detected by the traditional methods. Using the Kyoto Encyclopedia of Genes and Genomes
(KEGG) pathway enrichment analysis, we observed that these additional 14 DE genes are
involved in three important metabolic pathways of cancer. The proposed method also detected
nine (9) additional DE genes in the analysis of a head-and-neck cancer gene expression dataset;
these genes are involved in the top ten metabolic pathways obtained from the KEGG pathway database.
Conclusion: Results on both simulated and real cancer gene expression datasets show better
performance with our proposed procedure. Therefore, the additional genes detected by the proposed
procedure require further wet lab validation.