Background: Various regularization methods have been proposed to improve prediction
accuracy in cancer diagnosis. Elastic net regularized logistic regression has been widely
adopted for cancer classification and gene selection in genetics and molecular biology, but it is commonly
applied to binary classification and regression. In practice, however, there are usually more than two cancer subtypes,
and they often cannot be determined precisely.
Objective: Besides the multi-class issue, feature selection is also a critical problem in
cancer subtype classification.
Methods: We propose an Elastic Net Regularized Softmax Regression (ENRSR) method to tackle
multi-class classification. As an extension of elastic net regularized logistic
regression, ENRSR enforces structural sparsity and a ‘grouping effect’ for gene selection from
gene expression data, whose features may be highly correlated. The sparsity structure and the ‘grouping
effect’ help select more appropriate discriminative features for multi-class classification.
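To make the Methods description concrete, the following is a minimal sketch of an elastic-net penalized softmax regression in Python with NumPy. It assumes a plain element-wise elastic-net penalty on the weight matrix, an unpenalized intercept, and a proximal gradient (ISTA) solver; the function names, `lam`, `alpha`, the step-size rule and the synthetic data are hypothetical illustrations, not the authors' ENRSR implementation, whose exact penalty structure and optimizer are not given in this abstract.

```python
import numpy as np


def softmax(Z):
    # Subtract the row-wise max for numerical stability, then normalize.
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def soft_threshold(W, t):
    # Proximal operator of t * ||W||_1: shrinks every entry toward zero by t.
    return np.sign(W) * np.maximum(np.abs(W) - t, 0.0)


def fit_enet_softmax(X, y, n_classes, lam=0.05, alpha=0.5, n_iter=3000):
    """Hypothetical sketch of elastic-net penalized softmax regression.

    Minimizes (assumed objective):
        -(1/n) * sum_i log softmax(x_i W + b)[y_i]
        + lam * (alpha * ||W||_1 + (1 - alpha) / 2 * ||W||_F^2)
    with proximal gradient descent (ISTA). The intercept b is unpenalized.
    """
    n, p = X.shape
    Y = np.eye(n_classes)[y]  # one-hot labels, shape (n, K)
    W = np.zeros((p, n_classes))
    b = np.zeros(n_classes)
    # Conservative Lipschitz bound for the smooth part (loss + ridge term);
    # the "+ n" accounts for the intercept column of ones.
    L = (np.linalg.norm(X, 2) ** 2 + n) / n + lam * (1 - alpha)
    step = 1.0 / L
    for _ in range(n_iter):
        R = softmax(X @ W + b) - Y  # residuals P - Y, shape (n, K)
        grad_W = X.T @ R / n + lam * (1 - alpha) * W
        grad_b = R.mean(axis=0)
        # Gradient step on the smooth part, then the L1 proximal step.
        W = soft_threshold(W - step * grad_W, step * lam * alpha)
        b = b - step * grad_b
    return W, b


def predict(X, W, b):
    # Assign each sample to the class with the highest linear score.
    return np.argmax(X @ W + b, axis=1)


# Usage on synthetic data (hypothetical): 3 subtypes, 50 "genes",
# of which only the first 5 carry class signal and are mutually correlated.
rng = np.random.default_rng(0)
n, p, K = 300, 50, 3
y = rng.integers(0, K, size=n)
X = rng.normal(size=(n, p))
X[:, :5] += 2.0 * y[:, None]
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize features

W, b = fit_enet_softmax(X, y, K)
row_norms = np.linalg.norm(W, axis=1)  # one norm per gene across classes
print("training accuracy:", np.mean(predict(X, W, b) == y))
print("genes with nonzero weights:", np.flatnonzero(row_norms))
```

In this sketch the L1 term sets individual entries of `W` (and often whole rows for uninformative genes) exactly to zero, while the ridge term spreads weight across correlated informative genes, which is the intuition behind the ‘grouping effect’. A group penalty on the rows of `W` would be another plausible way to obtain row-level sparsity; which form ENRSR uses is not stated here.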
Result: ENRSR is shown to achieve more accurate and robust performance than six
competing algorithms (K-means, Hierarchical Clustering, Expectation Maximization,
Nonnegative Matrix Factorization, Support Vector Machine and Random Forest) in predicting
cancer subtypes, on both simulated data and real cancer gene expression data, as measured by the F-measure.
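The abstract reports performance in terms of the F-measure without defining it. As one hedged illustration, a common multi-class variant is the macro-averaged F-measure (the unweighted mean of per-class F1 scores), sketched below from scratch with NumPy. This definition, the function name and the toy labels are assumptions; the paper may use a different variant, particularly for the clustering baselines, whose cluster labels must first be matched to subtypes.

```python
import numpy as np


def macro_f_measure(y_true, y_pred):
    """Macro-averaged F-measure: the unweighted mean of per-class F1 scores.

    Averages over the classes present in y_true and assumes y_pred uses the
    same label set (clustering output would first need cluster-to-class
    matching, which is not done here).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom > 0 else 0.0)
    return float(np.mean(scores))


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(macro_f_measure(y_true, y_pred))  # per-class F1: 1, 2/3, 4/5
```

Macro averaging weights every subtype equally, so a method that ignores a rare subtype is penalized even if its overall accuracy is high.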
Conclusion: Our proposed ENRSR method is a reliable regularized softmax regression for multisubtype