
Current Medicinal Chemistry


ISSN (Print): 0929-8673
ISSN (Online): 1875-533X

Review Article

Better Performance with Transformer: CPPFormer in the Precise Prediction of Cell-penetrating Peptides

Author(s): Yuyang Xue, Xiucai Ye*, Lesong Wei, Xin Zhang, Tetsuya Sakurai and Leyi Wei*

Volume 29, Issue 5, 2022

Published on: 14 January, 2022

Page: [881 - 893] Pages: 13

DOI: 10.2174/0929867328666210920103140



Owing to its superior performance, the Transformer model, based on the 'Encoder-Decoder' paradigm, has become the mainstream model in natural language processing. Meanwhile, bioinformatics has embraced machine learning, which has led to remarkable progress in drug design and protein property prediction. Cell-penetrating peptides (CPPs) are a type of permeable protein that serves as a convenient 'postman' in drug penetration tasks. However, only a few CPPs have been discovered, limiting their practical applications in drug permeability. CPPs have led to a new approach that enables the uptake of only macromolecules into cells (i.e., without other potentially harmful materials found in the drug).

Most previous studies have utilized trivial machine learning techniques and hand-crafted features to construct a simple classifier. CPPFormer was constructed by implementing the attention structure of the Transformer, rebuilding the network around the short length that characterizes CPPs, and combining an automatic feature extractor with a few manually engineered features to co-direct the predicted results. Compared with all previous methods and other classic text classification models, the empirical results show that our proposed deep model-based method achieves the best performance, with an accuracy of 92.16% on the CPP924 dataset, and passes various index tests.
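The idea of pairing attention-derived features with hand-crafted ones can be illustrated with a minimal sketch. This is not the CPPFormer implementation: the one-hot encoding, the identity query/key/value projections (no trained weights), the mean pooling, the choice of amino acid composition as the hand-crafted feature, and all function names are assumptions made purely for illustration.

```python
import math

# The 20 standard amino acids, used as the token vocabulary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(peptide):
    """Encode each residue as a 20-dimensional one-hot vector."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    return [
        [1.0 if index[aa] == j else 0.0 for j in range(len(AMINO_ACIDS))]
        for aa in peptide
    ]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention.

    Assumption: queries, keys, and values are the inputs themselves
    (identity projections), so nothing here is learned.
    """
    d = len(x[0])
    attended = []
    for query in x:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * value[j] for w, value in zip(weights, x)) for j in range(d)]
        )
    return attended


def composition(peptide):
    """Hand-crafted feature: fraction of each amino acid in the peptide."""
    return [peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]


def combined_features(peptide):
    """Concatenate mean-pooled attention output with the hand-crafted vector."""
    attended = self_attention(one_hot(peptide))
    pooled = [sum(column) / len(attended) for column in zip(*attended)]
    return pooled + composition(peptide)


# Example peptide string (hypothetical input for the sketch).
features = combined_features("GRKKRRQRRRPPQ")
print(len(features))
```

In a trained model the projections would be learned and the combined vector would feed a classification layer; here the output is only a 40-dimensional feature vector showing how the two feature sources could be joined.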

Keywords: Cell-penetrating peptides, deep learning, drug penetration, transformer, feature extractor, classification.


© 2023 Bentham Science Publishers