Predicting Protein Phosphorylation Sites Based on Deep Learning

Haixia Long; Zhao Sun; Manzhi Li; Hai Yan Fu; Ming Cai Lin

Abstract

Background: Protein phosphorylation is one of the most important post-translational modifications (PTMs) and occurs at serine (S), threonine (T), and tyrosine (Y) residues. It plays critical roles in protein structure and function. With the development of novel high-throughput sequencing technologies, large numbers of protein sequences are being generated and stored in databases.

Objective: Quickly and accurately predicting which S, T, or Y residues can be phosphorylated is of great importance in both basic research and drug development.
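To make the prediction task concrete, the following minimal sketch enumerates every S, T, or Y residue in a sequence and extracts a fixed-length window around it, which is a common way to frame site prediction as classification. The function name `candidate_windows`, the padding symbol `-`, and the default half-width of 16 are illustrative assumptions; the abstract does not state how candidate sites are represented.

```python
from typing import Iterator, Tuple

TARGET_RESIDUES = {"S", "T", "Y"}  # residues that can be phosphorylated
PAD = "-"                          # hypothetical padding symbol for sequence ends


def candidate_windows(sequence: str, half_width: int = 16) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, residue, window) for every S/T/Y in the sequence.

    half_width is an illustrative assumption, not a value from the paper.
    Positions are 1-based, as is conventional for protein sequences.
    """
    seq = sequence.upper()
    padded = PAD * half_width + seq + PAD * half_width
    for i, residue in enumerate(seq):
        if residue in TARGET_RESIDUES:
            # In the padded string the residue sits at index i + half_width,
            # so the slice below is centred on it.
            window = padded[i : i + 2 * half_width + 1]
            yield i + 1, residue, window


# Toy sequence (not a real protein) with a small window for readability.
windows = list(candidate_windows("MKSAYTG", half_width=2))
for position, residue, window in windows:
    print(position, residue, window)
```

Each window then becomes one example for a binary classifier that decides whether the central residue is phosphorylated.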

Methods: To address this problem, a novel hybrid deep learning model combining a convolutional neural network with a bi-directional long short-term memory recurrent neural network (CNN+BLSTM) is proposed for predicting phosphorylation sites in proteins. The model consists of a sequence of layers that transform the input data into an output class: the convolution layer captures higher-level abstract features of the amino acids, while the recurrent layer captures long-term dependencies between amino acids to improve predictions. The joint model learns interactions between the higher-level features derived from the protein sequence to predict the phosphorylated sites.
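The data flow through such a hybrid model can be sketched as a plain NumPy forward pass: one-hot encoding, a 1-D convolution, a forward and a backward LSTM over the convolved features, and a sigmoid output. This is only an illustration of the general CNN+BLSTM pattern, not the authors' implementation. The one-hot encoding, the layer sizes, the mean pooling over time, and every function name here are assumptions, and the weights are random, so the resulting score carries no biological meaning.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(window):
    """Encode a window as (length, 20); padding or unknown symbols stay all-zero."""
    x = np.zeros((len(window), len(AMINO_ACIDS)))
    for pos, aa in enumerate(window):
        if aa in AA_INDEX:
            x[pos, AA_INDEX[aa]] = 1.0
    return x


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv1d_relu(x, kernels, bias):
    """Valid 1-D convolution plus ReLU. x: (L, C), kernels: (K, C, F) -> (L-K+1, F)."""
    k = kernels.shape[0]
    out = np.stack([
        np.tensordot(x[t:t + k], kernels, axes=([0, 1], [0, 1])) + bias
        for t in range(x.shape[0] - k + 1)
    ])
    return np.maximum(out, 0.0)


def lstm(x, W, U, b):
    """Run one LSTM direction over x: (T, D); return all hidden states (T, H)."""
    hidden = U.shape[0]
    h, c = np.zeros(hidden), np.zeros(hidden)
    states = []
    for x_t in x:
        i, f, o, g = np.split(x_t @ W + h @ U + b, 4)  # input, forget, output, candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        states.append(h)
    return np.stack(states)


def init_params(rng, n_filters=8, kernel=3, hidden=4):
    """Random, untrained weights; all sizes are illustrative assumptions."""
    def lstm_params(d):
        return (rng.normal(0, 0.1, (d, 4 * hidden)),
                rng.normal(0, 0.1, (hidden, 4 * hidden)),
                np.zeros(4 * hidden))
    return {
        "conv_w": rng.normal(0, 0.1, (kernel, len(AMINO_ACIDS), n_filters)),
        "conv_b": np.zeros(n_filters),
        "fwd": lstm_params(n_filters),
        "bwd": lstm_params(n_filters),
        "out_w": rng.normal(0, 0.1, 2 * hidden),
        "out_b": 0.0,
    }


def predict(window, p):
    """Return a score in (0, 1) for the central residue of the window."""
    feats = conv1d_relu(one_hot(window), p["conv_w"], p["conv_b"])  # local motifs
    h_fwd = lstm(feats, *p["fwd"])                                  # left-to-right context
    h_bwd = lstm(feats[::-1], *p["bwd"])[::-1]                      # right-to-left context
    pooled = np.concatenate([h_fwd, h_bwd], axis=1).mean(axis=0)    # assumed pooling step
    return float(sigmoid(pooled @ p["out_w"] + p["out_b"]))


params = init_params(np.random.default_rng(0))
score = predict("MKSAY", params)
```

In practice a model like this would be built and trained in a deep learning framework; the NumPy version only makes each layer's role explicit.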

Results: We compared our model with two established methods, iPhos-PseEn and MusiteDeep. Five-fold cross-validation indicated that CNN+BLSTM outperforms both competitors on several evaluation metrics, including the areas under the receiver operating characteristic and precision-recall curves, the Matthews correlation coefficient, F-measure, and accuracy.
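For readers unfamiliar with these metrics, the sketch below computes accuracy, F-measure, the Matthews correlation coefficient, and the area under the ROC curve from scratch using their standard definitions. The labels and scores are made-up toy values, not results from the paper, and the 0.5 decision threshold is an assumption. The area under the precision-recall curve is omitted to keep the example short.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels, with 1 = phosphorylated site."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def f_measure(tp, fp, tn, fn):
    """Harmonic mean of precision and recall (F1); 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 if any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

counts = confusion_counts(y_true, y_pred)
print("accuracy :", accuracy(*counts))
print("F-measure:", f_measure(*counts))
print("MCC      :", mcc(*counts))
print("ROC AUC  :", roc_auc(y_true, scores))
```

Because phosphorylation data sets are typically imbalanced, MCC and the precision-recall curve tend to be more informative than accuracy alone, which is presumably why several metrics are reported together.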

Conclusion: CNN+BLSTM is promising for identifying potential protein phosphorylation sites for further experimental validation.

Keywords: phosphorylation sites, deep learning, convolutional neural network, bi-directional long short-term memory recurrent neural network, ROC curve, precision-recall curve.




Journal: Current Bioinformatics
Publisher Name: Bentham Science Publisher
DOI: https://dx.doi.org/10.2174/1574893614666190902154332
Print ISSN: 1574-8936
Online ISSN: 2212-392X
