Improving Multi-type Gram-negative Bacterial Secreted Protein Prediction via Protein Evolutionary Information and Feature Ranking

Liang Kong; Lichao Zhang; Shiqian He

Abstract

Background: Gram-negative bacteria interact with their environment by secreting a wide range of particular substrates (such as proteins) across two lipid bilayers from the cytoplasm to the extracellular space. Determining the types of secreted proteins is beneficial for further research on secreted proteins and secretion systems.

Objective: As an essential alternative to experimental methods, this study proposes an accurate machine learning-based method for predicting multiple types of Gram-negative bacterial secreted proteins.

Methods: The main contribution is the combination of auto-cross-correlation analysis and feature ranking to build an effective support vector machine-based predictor of multi-type Gram-negative bacterial secreted proteins. The specifically designed auto-cross-correlation descriptor captures evolutionary correlation information between amino acid pairs along the protein sequence from position-specific scoring matrices. A feature ranking technique was used to analyze and select the most informative features for building the prediction model.
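The idea of extracting pairwise correlation information from a position-specific scoring matrix (PSSM) can be illustrated with the generic auto-cross covariance transform. The sketch below is only an assumption about the general technique, not the authors' specifically designed descriptor: the function name `acc_features`, the `max_lag` parameter, and the normalization by the number of overlapping positions are all illustrative choices.

```python
from typing import List


def acc_features(pssm: List[List[float]], max_lag: int) -> List[float]:
    """Generic auto-cross covariance features of an L x C PSSM.

    Hypothetical helper for illustration; not the descriptor from the paper.
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < sequence length")

    # Mean score of each PSSM column (one column per amino acid type).
    means = [sum(row[c] for row in pssm) / length for c in range(n_cols)]
    # Center every column once so the sums below are covariances.
    centered = [[row[c] - means[c] for c in range(n_cols)] for row in pssm]

    features = []
    for lag in range(1, max_lag + 1):
        span = length - lag  # number of position pairs separated by `lag`
        for a in range(n_cols):
            for b in range(n_cols):
                # a == b gives the auto-covariance term,
                # a != b gives the cross-covariance term.
                total = sum(
                    centered[j][a] * centered[j + lag][b] for j in range(span)
                )
                features.append(total / span)
    return features
```

For a standard 20-column PSSM this yields 400 values per lag, so the vector grows quickly with `max_lag`. That growth is one plausible reason a ranking step is useful for keeping only the most informative features before training a classifier.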

Results: Several prediction accuracy measures, obtained by independent dataset tests, are reported on two benchmark datasets. Compared with state-of-the-art prediction methods, the proposed method improves overall accuracy by 2.91% and 2.25% on the two datasets, respectively.

Conclusion: Our study provides a useful guide for utilizing protein evolutionary information in further research on bacterial secreted proteins.

Keywords: Gram-negative bacteria, secreted proteins, position specific scoring matrix, auto-cross correlation, feature ranking, support vector machine.



Journal: Current Bioinformatics
Publisher: Bentham Science Publisher
DOI: https://dx.doi.org/10.2174/1574893614666190730105629
Print ISSN: 1574-8936
Online ISSN: 2212-392X
