
Combinatorial Chemistry & High Throughput Screening


ISSN (Print): 1386-2073
ISSN (Online): 1875-5402

Research Article

iSP-RAAC: Identify Secretory Proteins of Malaria Parasite Using Reduced Amino Acid Composition

Author(s): Haoyue Zhang, Qilemuge Xi, Shenghui Huang, Lei Zheng, Wuritu Yang* and Yongchun Zuo*

Volume 23, Issue 6, 2020

Page: [536 - 545] Pages: 10

DOI: 10.2174/1386207323666200402084518



Background: The malaria parasite, the pathogen that causes malaria, secretes a variety of proteins for its growth and reproduction.

Objective: Identifying the secretory proteins of the malaria parasite provides an important reference for the development of antimalarial vaccines and drugs.

Methods: In this study, a computational classification method was developed to identify the secreted proteins of Plasmodium. Amino acid composition, dipeptide composition, and tripeptide composition, together with reduced amino acid alphabets, were used to represent protein sequences. A support vector machine (SVM) was then trained and used for prediction on each feature set, and the features were optimized.
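As a rough illustration of the kind of sequence encoding the Methods describe, the following minimal Python sketch maps a protein sequence onto a reduced alphabet and computes its dipeptide composition. The five-cluster grouping `EXAMPLE_CLUSTERS`, the function names, and the example sequence are hypothetical and chosen only for illustration; they are not taken from the paper and are not one of its 74 alphabets.

```python
from itertools import product

# Hypothetical 5-cluster reduction of the 20 amino acids, for illustration only.
EXAMPLE_CLUSTERS = ["AVLIMC", "FWYH", "STNQ", "KR", "DEGP"]


def build_reduction_map(clusters):
    """Map each residue to the first letter of the cluster it belongs to."""
    return {aa: cluster[0] for cluster in clusters for aa in cluster}


def reduce_sequence(sequence, reduction_map):
    """Rewrite a sequence in the reduced alphabet, skipping unknown residues."""
    return "".join(reduction_map[aa] for aa in sequence.upper() if aa in reduction_map)


def dipeptide_composition(reduced_seq, alphabet):
    """Return the frequency of every ordered pair of reduced letters."""
    pairs = ["".join(p) for p in product(alphabet, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(reduced_seq) - 1):
        counts[reduced_seq[i:i + 2]] += 1
    total = max(len(reduced_seq) - 1, 1)  # avoid division by zero on short input
    return [counts[p] / total for p in pairs]


reduction_map = build_reduction_map(EXAMPLE_CLUSTERS)
alphabet = sorted(set(reduction_map.values()))          # 5 representative letters
reduced = reduce_sequence("MKVLAAGIW", reduction_map)   # made-up example sequence
features = dipeptide_composition(reduced, alphabet)     # 5 * 5 = 25 features
```

With a reduced alphabet of n letters, the dipeptide vector has n² entries instead of 400, which is what makes reduced alphabets attractive as compact input to a classifier such as an SVM.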

Results: A total of 74 types of reduced amino acid alphabets were employed to predict secretory proteins. With dipeptide composition, the accuracy improved to 91.67% with a Matthews correlation coefficient (MCC) of 0.84, and the highest prediction accuracy reached 92.26% after feature selection, indicating that the method is reliable for predicting secreted proteins of the malaria parasite.
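For readers unfamiliar with the two metrics reported above, the sketch below computes accuracy and the Matthews correlation coefficient from the four cells of a binary confusion matrix, using their standard definitions. The counts in the usage lines are hypothetical and are not the paper's data.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts, for illustration only.
acc = accuracy(tp=45, tn=47, fp=3, fn=5)
m = mcc(tp=45, tn=47, fp=3, fn=5)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when the positive and negative classes differ in size.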

Conclusion: An intuitive web server, iSP-RAAC, was established for the convenience of most experimental scientists.

Keywords: Secretory proteins, reduced amino acids alphabets, dipeptide composition, prediction, malaria parasite, antimalarial vaccines.


© 2022 Bentham Science Publishers