Computationally Probing Drug-Protein Interactions Via Support Vector Machine
The past decades have witnessed extensive efforts to study the relationships between small molecules (drugs, metabolites, or ligands) and proteins, owing to the scale and complexity of their physical and genetic interactions. In particular, computational prediction of drug-protein interactions is fundamentally important for speeding up the development of novel therapeutic agents. Here, we present a supervised learning method based on the support vector machine (SVM) to predict drug-protein interactions, introducing two machine learning ideas. First, the chemical structure similarity among drugs and the genomic sequence similarity among proteins are encoded as a feature vector that represents a given drug-protein pair. Second, we design an automatic procedure to select a gold-standard negative dataset to deal with the imbalance of the training data, i.e., gold-standard positive data are scarce relative to the large amount of unlabeled data. Our SVM-based predictor is validated on four classes of drug target proteins: enzymes, ion channels, G-protein-coupled receptors, and nuclear receptors. We find that our method improves on existing methods in terms of the true positive rate at a given false positive rate. Functional annotation analysis and database searches indicate that our new predictions are worthy of future experimental validation. In addition, follow-up analysis suggests that our method can partly capture the topological features of the drug-protein interaction network. In conclusion, our new method can efficiently identify potential drug-protein bindings and will promote further research in drug discovery.
Keywords: Drug-target interaction, Chemical structure, Protein sequence, Imbalance problem, Support vector machine
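The two ideas named in the abstract, encoding similarities as a feature vector for a drug-protein pair and automatically selecting negative examples from unlabeled pairs, can be illustrated with a minimal sketch. The abstract does not specify the encoding or the selection rule, so everything below is an assumption made for illustration: the concatenated similarity-profile encoding, the product-based pair similarity, the "farthest from known positives" heuristic, and all function names and toy values are hypothetical and should not be read as the authors' procedure.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]              # (drug id, protein id)
Sim = Dict[str, Dict[str, float]]   # symmetric similarity lookup


def pair_features(drug_sim: Sim, prot_sim: Sim,
                  drugs: List[str], proteins: List[str],
                  pair: Pair) -> List[float]:
    """Assumed encoding: the drug's similarity to every drug, followed by
    the protein's similarity to every protein."""
    d, p = pair
    return [drug_sim[d][x] for x in drugs] + [prot_sim[p][y] for y in proteins]


def pair_similarity(drug_sim: Sim, prot_sim: Sim, a: Pair, b: Pair) -> float:
    """Assumed pair similarity: drug similarity times protein similarity."""
    return drug_sim[a[0]][b[0]] * prot_sim[a[1]][b[1]]


def select_negatives(drug_sim: Sim, prot_sim: Sim,
                     positives: List[Pair], unlabeled: List[Pair],
                     k: int) -> List[Pair]:
    """Hypothetical heuristic: keep the k unlabeled pairs that are least
    similar to their closest known positive pair."""
    def closeness(pair: Pair) -> float:
        return max(pair_similarity(drug_sim, prot_sim, pair, pos)
                   for pos in positives)
    return sorted(unlabeled, key=closeness)[:k]


# Toy similarity values, invented purely for illustration.
drugs = ["d1", "d2", "d3"]
proteins = ["p1", "p2"]
drug_sim = {
    "d1": {"d1": 1.0, "d2": 0.8, "d3": 0.1},
    "d2": {"d1": 0.8, "d2": 1.0, "d3": 0.2},
    "d3": {"d1": 0.1, "d2": 0.2, "d3": 1.0},
}
prot_sim = {
    "p1": {"p1": 1.0, "p2": 0.3},
    "p2": {"p1": 0.3, "p2": 1.0},
}
positives = [("d1", "p1")]
unlabeled = [("d2", "p1"), ("d3", "p2"), ("d1", "p2")]

negatives = select_negatives(drug_sim, prot_sim, positives, unlabeled, k=1)
features = pair_features(drug_sim, prot_sim, drugs, proteins, ("d1", "p1"))
```

In this sketch, the feature vectors of the known positive pairs and the selected negative pairs would then be passed, with labels, to any standard SVM implementation for training.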