DeepSSPred: A Deep Learning Based Sulfenylation Site Predictor Via a Novel nSegmented Optimize Federated Feature Encoder

Author(s): Zaheer Ullah Khan, Dechang Pi*

Journal Name: Protein & Peptide Letters

Volume 28, Issue 6, 2021


Abstract:

Background: S-sulfenylation (also written S-sulphenylation, i.e. sulfenic acid formation on cysteine) is a special kind of post-translational modification that plays an important role in various physiological and pathological processes such as cytokine signaling, transcriptional regulation, and apoptosis. Given this significance, and to complement existing wet-lab methods, several computational models have been developed to predict sulfenylated cysteine sites. However, the performance of these models has been unsatisfactory owing to inefficient feature schemes, severe class imbalance, and the lack of an intelligent learning engine.

Objective: This study aims to establish a robust and novel computational predictor that discriminates sulfenylation sites from non-sulfenylation sites.

Methods: In this study, we report an innovative bioinformatics tool, named DeepSSPred, in which the encoded features are obtained via an nSegmented hybrid feature scheme. The synthetic minority oversampling technique (SMOTE) is then employed to cope with the severe imbalance between SC-sites (minority class) and non-SC-sites (majority class). A state-of-the-art 2D convolutional neural network (2D-CNN) was used as the classifier, and the model was validated with rigorous 10-fold jackknife cross-validation.
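The abstract names two data-handling steps, SMOTE oversampling of the minority SC-site class and 10-fold cross-validation, without giving implementation details. The Python below is only a minimal sketch under stated assumptions: each site is taken to be already encoded as a fixed-length numeric vector (the nSegmented encoding itself is not reproduced), the folds are assumed to be stratified by class, and `smote`, `balance`, and `stratified_folds` are hypothetical helpers, not DeepSSPred's code.

```python
import math
import random

# Hypothetical sketch, not DeepSSPred's implementation. Each sample is assumed
# to be a fixed-length list of floats produced by some feature encoder.


def _euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority-class vectors by SMOTE-style interpolation.

    Each synthetic vector lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # k nearest minority neighbours of every minority sample (brute force).
    neighbours = []
    for i, x in enumerate(minority):
        ranked = sorted(
            (_euclidean(x, other), j)
            for j, other in enumerate(minority)
            if j != i
        )
        neighbours.append([j for _, j in ranked[:k]])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class (e.g. SC-sites) up to the majority count."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = sum(1 for label in y if label != minority_label)
    extra = smote(minority, max(0, n_majority - len(minority)), k=k, seed=seed)
    return X + extra, y + [minority_label] * len(extra)


def stratified_folds(y, n_folds=10, seed=0):
    """Split sample indices into n_folds folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % n_folds].append(i)
    return folds
```

In such a setup, `balance` would typically be applied to the training indices of each fold only, so that no synthetic sample is interpolated from a held-out site; this placement is an assumption, not something the abstract specifies.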

Results: By combining a strongly discriminative representation of the feature space, a machine learning engine, and an unbiased (balanced) representation of the underlying training data, the proposed framework yielded a model that outperforms all existing established studies. Its MCC (Matthews correlation coefficient) is 6% higher than that of the best existing method. On the independent dataset, the best existing study did not provide sufficient details for comparison. Compared with the second-best method, the model improved accuracy (ACC) by 7.5%, sensitivity (Sn) by 1.22%, specificity (Sp) by 12.91%, and MCC by 13.12% on the training data, and ACC by 12.13%, Sn by 27.25%, Sp by 2.25%, and MCC by 30.37% on the independent dataset. These empirical analyses show the superior performance of the proposed model on both the training and independent datasets relative to existing studies in the literature.
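The comparison above is expressed in ACC, Sn, Sp, and MCC. Assuming the standard confusion-matrix definitions (this excerpt does not spell out the paper's exact formulas), a minimal Python sketch of the four metrics is:

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary metrics from confusion-matrix counts (assumed definitions).

    tp/fn: SC-sites predicted correctly/incorrectly;
    tn/fp: non-SC-sites predicted correctly/incorrectly.
    """
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn) if tp + fn else 0.0  # sensitivity: recall on SC-sites
    sp = tn / (tn + fp) if tn + fp else 0.0  # specificity: recall on non-SC-sites
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {"ACC": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Illustrative counts only; these are not results from the paper.
print(binary_metrics(tp=80, fn=20, tn=90, fp=10))
```

MCC draws on all four confusion-matrix cells, which is why it is commonly preferred over accuracy when negative (non-SC) sites greatly outnumber positive ones.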

Conclusion: In this research, we have developed a novel sequence-based automated predictor for SC-sites, called DeepSSPred. Empirical results on a training dataset and an independent validation dataset demonstrate the efficacy of the proposed model. The good performance of DeepSSPred can be attributed to several factors: novel discriminative feature encoding schemes, the SMOTE technique, and careful construction of the prediction model around a tuned 2D-CNN classifier. We believe this work will provide insight for the further prediction of S-sulfenylation characteristics and functions. We therefore hope that the predictor will be significantly helpful for large-scale discrimination of unknown SC-sites in particular and for designing new pharmaceutical drugs in general.

Keywords: S-sulfenylation proteins, cytokine signaling, new feature encoding scheme, nSegmented wrapper feature, 2D-CNN, deep learning.




Article Details

VOLUME: 28
ISSUE: 6
Year: 2021
Published on: 01 December, 2020
Page: [708 - 721]
Pages: 14
DOI: 10.2174/0929866527666201202103411
