Recent Advances of Computational Methods for Identifying Bacteriophage Virion Proteins

Author(s): Wei Chen*, Fulei Nie, Hui Ding*.

Journal Name: Protein & Peptide Letters

Volume 27, Issue 4, 2020

Abstract:

Phage virion proteins (PVPs) are essential components of bacteriophages and participate in a series of biological processes. Accurate identification of PVPs helps clarify the mechanism of interaction between a phage and its host bacteria. Because experimental methods are labor-intensive and time-consuming, many computational approaches for identifying PVPs have been proposed in the past few years. To help researchers select appropriate methods, a comprehensive review and comparison of these approaches is needed. In this review, we summarize the existing computational methods for identifying PVPs and assess their performance on an independent dataset. Finally, we present challenges and future perspectives for PVP identification. We hope that this review will provide useful guidance for research on phage virion proteins.

Keywords: Bacteriophage, phage virion protein, host bacteria, machine learning algorithm, feature selection, web-server.





Article Details

VOLUME: 27
ISSUE: 4
Year: 2020
Page: [259 - 264]
Pages: 6
DOI: 10.2174/0929866526666190410124642