Background: The subcellular localization of a protein is closely related to its function and
helps maintain the highly ordered organization that a cell needs to operate normally. Studying
protein subcellular localization helps to clarify the properties and functions of proteins, the
interactions between proteins and their regulatory mechanisms, and the pathogenesis of some diseases,
and it supports the development of new drugs. However, traditional biological experiments are both
time-consuming and costly. Fast and effective machine learning methods for predicting protein
subcellular localization are therefore needed.
Method: We propose a new feature extraction method based on pseudo amino acid composition, called
the λ-order factor method, and combine it with principal component analysis. The resulting features
take into account not only the physicochemical properties of protein sequences but also the
sequence-order information of their sub-sequences. Meanwhile, the combination removes redundant
information and reduces the dimension of the feature vectors. Finally, a support vector machine (SVM)
and the 10-fold cross-validation test are used to predict and evaluate the method on three benchmark
datasets: ZD98, ZW225 and CL317.
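As a rough illustration of the kind of feature vector that pseudo amino acid composition produces, the sketch below computes the classic (20 + λ)-dimensional PseAAC from a single sequence. It is not the λ-order factor method proposed here, whose details are not given in this abstract. The Kyte-Doolittle hydropathy scale, the weight of 0.05, and the function names `standardize` and `pseaac` are assumptions chosen for the example.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index. Using this single property is an
# assumption made for illustration; other property scales could be used.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale):
    """Rescale a property table to zero mean and unit standard deviation."""
    values = list(scale.values())
    mu, sd = mean(values), pstdev(values)
    return {aa: (v - mu) / sd for aa, v in scale.items()}


def pseaac(sequence, lam=3, weight=0.05):
    """Return a (20 + lam)-dimensional classic PseAAC vector (hypothetical helper)."""
    # Keep only the 20 standard residues; anything else is skipped.
    seq = [aa for aa in sequence.upper() if aa in HYDROPATHY]
    if len(seq) <= lam:
        raise ValueError("sequence must be longer than lam")
    h = standardize(HYDROPATHY)

    # First 20 components: plain amino acid composition.
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

    # Next lam components: k-th tier sequence-order correlation factors,
    # the mean squared property difference between residues k apart.
    thetas = []
    for k in range(1, lam + 1):
        diffs = [(h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(len(seq) - k)]
        thetas.append(mean(diffs))

    # Joint normalization so that all components sum to 1.
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


if __name__ == "__main__":
    vector = pseaac("MKVLAAGIVGLCLAQW", lam=3)
    print(len(vector))
```

In a pipeline like the one described above, vectors of this kind would then be reduced with principal component analysis and passed to an SVM under 10-fold cross-validation; those steps are not shown in this sketch.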
Results: In a comprehensive comparison with current state-of-the-art methods, the proposed method
achieves superior performance. The overall success rates on the ZD98, ZW225 and CL317 datasets are
90.8%, 85.3% and 89.6%, respectively. These results show that our method has better classification
performance than the compared methods.
Conclusion: The numerical results show that our model successfully extracts the physicochemical
information and sequence-order information of protein sequences based on pseudo amino acid
composition (PseAAC), and provides a reliable PseAAC-based method as a potential candidate for
predicting the subcellular localization of apoptosis proteins.