Background: Protein-peptide interactions are a crucial component of the protein-protein interaction (PPI)
network and are ubiquitous in living cells. These interactions play important roles in
signal transduction and regulation. Because experimental approaches are laborious and time-consuming,
effective computational methods offer a convenient and rapid way to predict protein-peptide
interactions.
Method: This study proposes a novel method for predicting interactions between proteins and
peptides using features extracted from both the protein and the peptide. Each protein-peptide pair was
represented by the traditional amino acid composition, the pseudo-amino acid composition, and features
derived from 205 domains. Predictors were constructed with four machine learning algorithms:
SMO (sequential minimal optimization), IB1 (a nearest neighbor algorithm), dagging, and random
forest (RF). All features were analyzed with feature selection techniques, including the maximum
relevance minimum redundancy method and the incremental feature selection method, to extract an
optimal feature set. An optimal predictor based on IB1 was then constructed from the extracted
optimal features.
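As a rough illustration of the pipeline described above, the following Python sketch encodes a protein-peptide pair by amino acid composition and runs an incremental feature selection loop scored by a nearest neighbor classifier. It is a minimal sketch under stated assumptions, not the study's implementation: the function names, the leave-one-out evaluation, the plain squared Euclidean distance, and the use of the Matthews correlation coefficient as the selection criterion are all assumptions made for the example, and the ranked feature list is taken as given (in the study it would come from the maximum relevance minimum redundancy step). Pseudo-amino acid composition and domain features are omitted.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return the relative frequency of each standard amino acid."""
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def encode_pair(protein_seq, peptide_seq):
    """Hypothetical encoding: concatenate both compositions (40 values)."""
    return amino_acid_composition(protein_seq) + amino_acid_composition(peptide_seq)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def loo_nearest_neighbor(X, y, feature_idx):
    """Leave-one-out predictions of a 1-nearest-neighbor classifier
    that only looks at the features listed in feature_idx."""
    preds = []
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[k] - xj[k]) ** 2 for k in feature_idx)
            if d < best_d:
                best_j, best_d = j, d
        preds.append(y[best_j])
    return preds


def incremental_feature_selection(X, y, ranked_features):
    """Add ranked features one at a time and keep the prefix with the
    highest MCC (the first such prefix if there are ties)."""
    best_score, best_size = -1.0, 0
    for size in range(1, len(ranked_features) + 1):
        preds = loo_nearest_neighbor(X, y, ranked_features[:size])
        score = mcc(y, preds)
        if score > best_score:
            best_score, best_size = score, size
    return ranked_features[:best_size], best_score
```

In this sketch, calling incremental_feature_selection(X, y, ranked) on a list of encoded pairs returns the best-scoring prefix of the ranking together with its MCC, which mirrors the idea of extracting an optimal feature set before building the final nearest neighbor predictor.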
Results: With the IB1 algorithm, the Matthews correlation coefficient (MCC) was 0.4436 in the
cross-validation test on the training set and 0.4444 on the independent test set. Among the encoding
methods compared, the domain-based method outperformed the pseudo-amino acid composition method.
An optimal feature set of 230 features, which contributed most to the prediction of protein-peptide
interactions, was selected.
Conclusion: Several important domains associated with features in the optimal feature set were deemed
to play key roles in determining protein-peptide interactions.