Phages are widely distributed in locations populated by bacterial hosts. Phage proteins can
be divided into two main categories with different functions: virion and non-virion proteins. In
practice, phage virion proteins are mainly used to clarify how phages lyse bacterial cells and to
develop new antibacterial drugs. Accurate identification of phage virion proteins is therefore essential
to understanding the phage lysis mechanism. Although some computational methods have been developed
to identify virion proteins, their performance is not yet satisfactory, leaving room for improvement.
In this study, a new sequence-based method was proposed to identify phage virion proteins using
g-gap tripeptide composition. In this approach, protein features were first extracted from the g-gap
tripeptide composition. Subsequently, we obtained an optimal feature subset by performing incremental
feature selection (IFS) with information gain. Finally, a support vector machine (SVM) was
used as the classifier to discriminate virion proteins from non-virion proteins. In a 10-fold cross-validation
test, the proposed method achieved an accuracy of 97.40% with an AUC of 0.9958, outperforming
state-of-the-art methods. These results suggest that the proposed method is a promising
approach for identifying phage virion proteins.
Keywords: Phage virion proteins, g-gap tripeptide composition, SVM, IFS, information gain, 10-fold cross-validation.
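
The feature-extraction step named in the abstract can be illustrated with a minimal sketch. The abstract does not define the g-gap tripeptide composition, so the code below assumes one common reading: three residues taken at positions i, i + g + 1, and i + 2g + 2, counted over the 20 standard amino acids (8000 possible tripeptides) and normalized by the number of valid windows. The function name, the alphabet constant, and the choice to skip windows containing non-standard residues are illustrative assumptions, not details taken from the study.

```python
from itertools import product

# Assumption: the 20 standard amino acids define the feature space (20^3 = 8000).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_tripeptide_composition(sequence, g=1):
    """Return a dict mapping each tripeptide to its g-gap frequency.

    A g-gap tripeptide is read here as the residues at positions
    i, i + g + 1 and i + 2g + 2 (g residues skipped between picks).
    Windows containing a non-standard residue are ignored.
    """
    sequence = sequence.upper()
    step = g + 1          # distance between consecutive picked residues
    span = 2 * step       # distance from the first to the third residue
    counts = {"".join(t): 0 for t in product(AMINO_ACIDS, repeat=3)}
    total = 0
    for i in range(len(sequence) - span):
        tripeptide = sequence[i] + sequence[i + step] + sequence[i + span]
        if tripeptide in counts:
            counts[tripeptide] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid window): all frequencies are zero.
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}


features = g_gap_tripeptide_composition("ACDEFG", g=1)
nonzero = {k: v for k, v in features.items() if v > 0}
print(nonzero)
```

With g = 0 the same function reduces to the ordinary (adjacent) tripeptide composition. The resulting 8000-dimensional vectors are the kind of input that a feature-ranking step such as information gain, followed by an SVM, would then operate on.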