Efficient Gene Selection for Cancer Prognostic Biomarkers Using Swarm Optimization and Survival Analysis

Author(s): Raul Aguirre-Gamboa, Emmanuel Martinez-Ledesma, Hugo Gomez-Rueda, Rebeca Palacios, Isabel Fuentes-Hernandez, Emilio Sánchez-Canales, Rafael Chacolla-Huaringa, Servando Cardona-Huerta, Luis Villela, Sean-Patrick Scott, Jose Tamez-Pena, Victor Trevino

Journal Name: Current Bioinformatics

Volume 11, Issue 3, 2016

The discovery of molecular prognostic cancer biomarkers is still a major scientific challenge. Several methodologies have been proposed to generate novel biomarker models of clinical outcome using gene expression as predictors, but they have drawbacks. For example, they (i) depend heavily on an initial ranking of genes by their univariate relation to survival times, (ii) are unable to generate compact multivariate predictors, (iii) are based on survival models other than Cox, or (iv) use aggregations and transformations of expression values instead of the gene expression values directly. These issues complicate the evaluation of biomarkers in clinical trials and their implementation in medical practice, and they obscure the biological association of the biomarkers with cancer.

We propose a particle swarm optimization search engine coupled to multivariate Cox survival model fitting, which constrains the number of genes while minimizing deviance residuals, to identify prognostic cancer biomarkers. By evaluating the concordance index, the log-rank test, correlation, the integrated discrimination improvement per feature, and the number of variables significantly associated with survival times, we show that many compact and highly predictive models can be found for six cancer datasets and a simulated cohort. We also show that our algorithm generates a competitive population of multivariate models with a wide variety of gene combinations, including genes that could not be found by a univariate methodology. In comparisons with other methods such as LASSO, Ridge, and Elastic Net, our algorithm shows similar or better results.
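As an illustration of one of the evaluation measures named above, the following is a minimal sketch of Harrell's concordance index in plain Python. It is not the authors' R or Java implementation. The function name `concordance_index` and its arguments are hypothetical, and pairs with tied survival times are skipped for simplicity, which is an assumption of this sketch rather than something stated in the article.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index (simplified sketch; hypothetical helper).

    times       -- observed follow-up time per patient
    events      -- 1/True if the event was observed, 0/False if censored
    risk_scores -- model output; higher means higher predicted risk

    Returns the fraction of comparable patient pairs in which the patient
    with the shorter survival time also has the higher predicted risk.
    Ties in risk score count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: tied times are ignored in this sketch
        # a is the patient with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): four patients, one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.4, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risk)
```

With the toy data, every comparable pair is ordered correctly, so `toy_c` is 1.0; a model that assigns the same score to every patient would score 0.5. In a search such as the one described above, a measure like this would be computed from the linear predictor of each candidate Cox model.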

We conclude that our algorithm generates highly predictive and compact models for clinical outcomes with unique gene content, and with prediction performance superior or comparable to that of other current feature selection methods. R and Java code are available in the Supplementary Information and at http://bioinformatica.mty.itesm.mx/?q=coxswarm.

Keywords: Clinical outcome, microarrays, gene expression, feature selection, biomarkers.

Article Details

Year: 2016
Page: [310 - 323]
Pages: 14
DOI: 10.2174/1574893611999160610125628