The Development of Machine Learning Methods in Cell-Penetrating Peptides Identification: A Brief Review

Author(s): Huan-Huan Wei, Wuritu Yang*, Hua Tang, Hao Lin*.

Journal Name: Current Drug Metabolism

Volume 20, Issue 3, 2019

Background: Cell-penetrating peptides (CPPs) are important short peptides that facilitate the cellular uptake of various molecules. CPPs can carry drug molecules across the plasma membrane and deliver them to different cellular organelles. Consequently, CPP identification and the related penetration mechanisms have been explored extensively. To reveal the penetration mechanisms of a large number of CPPs, convenient and fast methods for identifying CPPs are needed.

Methods: Biochemical experiments can identify CPPs accurately and in precise detail, but they are expensive and laborious. To overcome these disadvantages, several computational methods have been developed to identify CPPs. We reviewed the development of machine learning methods for CPP identification to provide insight into this task.
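Classical machine learning classifiers need each peptide turned into a fixed-length numeric vector before they can be trained. As a minimal sketch, assuming plain amino acid composition (AAC) and dipeptide composition as the encoding (neither is attributed here to any particular reviewed method), the hypothetical helpers `amino_acid_composition` and `dipeptide_composition` below show what such a sequence-based feature step can look like:

```python
from typing import List

# The 20 standard amino acids, in alphabetical one-letter-code order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = [a + b for a in AMINO_ACIDS for b in AMINO_ACIDS]


def _clean(sequence: str) -> str:
    """Upper-case the sequence and reject non-standard residues."""
    seq = sequence.strip().upper()
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return seq


def amino_acid_composition(sequence: str) -> List[float]:
    """20-dimensional vector: fraction of the peptide made of each residue."""
    seq = _clean(sequence)
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(sequence: str) -> List[float]:
    """400-dimensional vector: fraction of each overlapping dipeptide."""
    seq = _clean(sequence)
    if len(seq) < 2:
        raise ValueError("need at least two residues")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    index = {dp: i for i, dp in enumerate(DIPEPTIDES)}
    counts = [0.0] * len(DIPEPTIDES)
    for dp in pairs:
        counts[index[dp]] += 1.0
    # Normalise by the number of overlapping pairs (length - 1).
    return [c / len(pairs) for c in counts]


if __name__ == "__main__":
    # Arbitrary arginine-rich input, used only to exercise the functions.
    aac = amino_acid_composition("GRKKRRQRRR")
    dpc = dipeptide_composition("GRKKRRQRRR")
    print(len(aac) + len(dpc))  # 420 features in total
```

For the input `GRKKRRQRRR`, the AAC vector assigns 0.6 to R and 0.2 to K. Concatenated vectors like these would then be handed to a classifier such as a support vector machine or a random forest; richer encodings (for example physicochemical properties or pseudo-amino acid composition) slot into the same place in the pipeline.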

Results: We summarized machine learning-based CPP identification methods and compared the construction strategies of 11 computational methods. Furthermore, we pointed out the limitations and difficulties of predicting CPPs.
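Comparisons between CPP predictors are commonly expressed with standard binary-classification metrics: sensitivity (Sn), specificity (Sp), accuracy (Acc) and the Matthews correlation coefficient (MCC). The sketch below is a hypothetical helper, `binary_metrics`, assuming labels 1 = CPP and 0 = non-CPP; it is not taken from any reviewed tool. MCC is often preferred when the positive and negative sets are imbalanced, because it stays informative even when one class dominates.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return Sn, Sp, Acc and MCC for CPP (1) / non-CPP (0) labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = set(y_true) | set(y_pred)
    if not labels.issubset({0, 1}):
        raise ValueError("labels must be 0 (non-CPP) or 1 (CPP)")

    # Confusion-matrix counts.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # Guard against empty classes so the ratios never divide by zero.
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report MCC as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical labels and predictions, for illustration only.
    print(binary_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1]))
```

With those hypothetical inputs (three true positives, three true negatives, one false positive, one false negative), the function returns Sn = Sp = Acc = 0.75 and MCC = 0.5.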

Conclusion: This review reports the latest studies on CPP identification using machine learning methods. We also discuss future directions for recognizing CPPs with computational methods.

Keywords: Cell-penetrating peptide, machine learning method, prediction, membrane, modeling, uptake efficiency.

Article Details

Year: 2019
Pages: 217-223
DOI: 10.2174/1389200219666181010114750