Protein Subcellular Localization Prediction based on PSI-BLAST Profile and Principal Component Analysis

Author(s): Yuhua Yao*, Manzhi Li, Huimin Xu, Shoujiang Yan, Pingan He, Qi Dai, Zhaohui Qi, Bo Liao.

Journal Name: Current Proteomics

Volume 16, Issue 5, 2019

Abstract:

Background: Prediction of protein subcellular location is a meaningful task that has attracted much attention in recent years. In particular, the number of new protein sequences produced by high-throughput sequencing technology in the post-genomic era has increased explosively.

Objective: Protein subcellular localization prediction based solely on sequence data remains a challenging problem in computational biology.

Methods: In this paper, three sets of evolutionary features are derived from the position-specific scoring matrix, which has shown great potential in other bioinformatics problems. A fusion model is built from the optimal combination of parameters. Finally, principal component analysis and a support vector machine classifier are applied to predict protein subcellular localization on the NNPSL dataset and the Cell-PLoc 2.0 dataset.
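
The abstract does not say which three feature sets are derived from the position-specific scoring matrix (PSSM). As a purely illustrative sketch, and not the authors' method, the snippet below shows one common way to turn a variable-length L x 20 PSSM into a fixed-length vector: each score is squashed with a logistic function and the columns are then averaged. The function names and the toy matrices are assumptions made for this example. Dimensionality reduction and classification would be applied to such vectors afterwards and are not shown here.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function; maps a raw PSSM score into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pssm_column_means(pssm: list[list[float]]) -> list[float]:
    """Collapse an L x N PSSM into an N-dimensional vector of column means.

    Hypothetical helper for illustration only: the feature definitions
    actually used in the paper are not given in the abstract.
    """
    if not pssm:
        raise ValueError("PSSM must contain at least one row")
    n_cols = len(pssm[0])
    if any(len(row) != n_cols for row in pssm):
        raise ValueError("all PSSM rows must have the same number of columns")
    # Scale every score, then average each column over the sequence length.
    scaled = [[sigmoid(value) for value in row] for row in pssm]
    n_rows = len(scaled)
    return [sum(row[j] for row in scaled) / n_rows for j in range(n_cols)]


# Toy PSSMs (made-up values) for two "proteins" of different length.
short_pssm = [[0.0] * 20 for _ in range(3)]
long_pssm = [[0.0] * 20 for _ in range(7)]

short_vec = pssm_column_means(short_pssm)
long_vec = pssm_column_means(long_pssm)
print(len(short_vec), len(long_vec))
```

Because the averaging runs over the sequence positions, proteins of different lengths yield vectors of the same dimension, which is what a fixed-input classifier such as a support vector machine requires.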

Results: Our experimental results show that the proposed method remarkably improves prediction accuracy, and that features derived solely from the PSI-BLAST profile are appropriate for protein subcellular localization prediction.

Keywords: Evolutionary information, feature representation, principal component analysis, support vector machine, homology, database.




Article Details

VOLUME: 16
ISSUE: 5
Year: 2019
Page: [402 - 414]
Pages: 13
DOI: 10.2174/1570164616666190126155744