Title:Identification and Analysis of Cancer Diagnosis Using Probabilistic Classification Vector Machines with Feature Selection
VOLUME: 13 ISSUE: 6
Author(s):Xiuquan Du*, Xinrui Li, Wen Li, Yuanting Yan and Yanping Zhang
Affiliation:Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Anhui, School of Computer Science and Technology, Anhui University, Anhui, School of Computer Science and Technology, Anhui University, Anhui, Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Anhui, Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Anhui
Keywords:Probabilistic classification vector, feature selection, tumor classification, DX, machine learning, kernel function.
Abstract:Background: Accurate classification of tumor types is important for the
treatment of cancer. With advances in microarray expression profiling, many methods have been
proposed to handle these data. However, because the feature dimension of tumor gene expression
profiles is very high, many machine learning algorithms fail.
Objective & Methods: In this paper, a novel method named probabilistic classification vector
machines (PCVM) with feature selection is proposed for tumor type detection using gene expression
data. PCVM adopts a signed and truncated Gaussian prior to address the unstable solutions
caused by using the same prior for different classes, and the truncated Gaussian prior also
controls the complexity of the model. The performance of PCVM is evaluated on two datasets
using four metrics.
Results: This method achieves 84.21% accuracy and 95.24% accuracy on the leukemia and prostate
datasets, respectively. Compared with other methods, PCVM obtains much higher performance than
Support Vector Machines (SVM), Naïve Bayes (NB), RBF Neural Networks (RBF), K-nearest
Neighbor (KNN), and Random Forest (RF), except for SVM on the prostate dataset. To reduce
computational time, we adopt a feature selection method (DX) to rank the features and search for the
optimal feature combination based on PCVM. PCVM with the DX method (PCVM-DX) achieves 94.74%
accuracy, 100% sensitivity, 85.71% specificity and 92.31% precision on the leukemia dataset.
PCVM-DX obtains the same result as PCVM on the prostate dataset. We also compare DX with
other feature selection methods; the results show that PCVM-DX is efficient for tumor
classification in terms of performance.
Conclusion: PCVM-DX is observed to perform better than the other methods on the two datasets. The novelty
of this approach lies in applying PCVM to tackle the problem that using the same prior for
different classes may lead to unstable solutions in RVMs, and in exploring the important feature
subset in the microarray expression profile with feature selection.
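
Illustrative note: the four metrics named above (accuracy, sensitivity, specificity, precision) can all be derived from a binary confusion matrix. The sketch below is a minimal example and not the authors' evaluation code. The function name `classification_metrics` and the counts passed to it are assumptions: the counts are hypothetical values chosen only because they reproduce the percentages reported for PCVM-DX on the leukemia dataset, and the abstract does not state the actual confusion matrix or test-set size.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and precision as fractions.

    tp, fp, tn, fn are the counts of true positives, false positives,
    true negatives and false negatives of a binary classifier.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # correct predictions over all samples
        "sensitivity": tp / (tp + fn),     # recall on the positive class
        "specificity": tn / (tn + fp),     # recall on the negative class
        "precision": tp / (tp + fp),       # correct share of positive predictions
    }


# Hypothetical counts (an assumption, not taken from the paper), chosen to be
# consistent with the percentages reported for PCVM-DX on the leukemia dataset.
metrics = classification_metrics(tp=12, fp=1, tn=6, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these assumed counts, the computed values round to 94.74% accuracy, 100% sensitivity, 85.71% specificity and 92.31% precision. Other confusion matrices could yield the same rounded figures, so the counts should not be read as the study's data.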