Background: Gram-negative bacteria interact with their environment by secreting a
wide range of particular substrates (such as proteins) across two lipid bilayers from the cytoplasm
to the extracellular space. Determining the types of secreted proteins is beneficial for further
research on secreted proteins and secretion systems.
Objective: As an essential alternative to experimental methods, an accurate machine learning-based
multi-type Gram-negative bacterial secreted protein prediction method was proposed in this study.
Methods: The main contribution is combining auto-cross-correlation analysis and feature ranking
technology to build an effective support vector machine-based multi-type Gram-negative bacterial
secreted protein predictor. The specifically designed auto-cross-correlation descriptor can capture
evolutionary correlation information between amino acid pairs along the protein sequence from
position-specific scoring matrices. A feature ranking technique was used to analyze and select the
most informative features for building the prediction model.
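The abstract does not give the exact form of the descriptor, so the following is only a minimal sketch of a commonly used auto-cross-covariance transform over a position-specific scoring matrix, not the authors' implementation. The function name `acc_descriptor`, the parameter `max_lag`, and the normalization by the number of position pairs are assumptions made for illustration. For each lag, pairs with the same column give the auto terms and pairs with different columns give the cross terms.

```python
from typing import List


def acc_descriptor(pssm: List[List[float]], max_lag: int) -> List[float]:
    """Hypothetical auto-cross-covariance features from an L x C score matrix.

    pssm    : one row per sequence position, one column per amino acid type
    max_lag : largest distance (in residues) between the paired positions
    Returns a fixed-length vector of max_lag * C * C values.
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < sequence length")

    # Center every column on its mean over the whole sequence.
    means = [sum(row[c] for row in pssm) / length for c in range(n_cols)]
    centered = [[row[c] - means[c] for c in range(n_cols)] for row in pssm]

    features: List[float] = []
    for lag in range(1, max_lag + 1):
        span = length - lag  # number of position pairs separated by this lag
        for i in range(n_cols):
            for j in range(n_cols):
                # i == j: auto term; i != j: cross term (assumed convention)
                total = sum(
                    centered[k][i] * centered[k + lag][j] for k in range(span)
                )
                features.append(total / span)
    return features


# Toy 4 x 2 matrix standing in for a real L x 20 PSSM.
toy_pssm = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
toy_features = acc_descriptor(toy_pssm, max_lag=1)
```

Because the output length depends only on `max_lag` and the number of columns, sequences of different lengths map to vectors of the same size, which is what allows them to be ranked feature by feature and passed to a support vector machine.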
Results: Several kinds of prediction accuracy obtained by independent dataset tests are reported
on two benchmark datasets. Compared with the state-of-the-art prediction methods, the proposed
method improves overall accuracies by 2.91% and 2.25%, respectively.
Conclusion: Our study provides an important guide for utilizing protein evolutionary information
for further research on bacterial secreted proteins.