Background: Drug-Target Interactions (DTI) play a crucial role in discovering new drug
candidates and identifying new protein targets for drug development. Although the number of DTI
detected by high-throughput techniques has been increasing, the number of known DTI is still
limited. Moreover, experimental methods for detecting interactions between drugs and
proteins are costly and inefficient.
Objective: Computational approaches for predicting DTI have therefore drawn increasing attention in
recent years. In this paper, we report a novel computational model for predicting DTI using an extremely
randomized trees model and protein amino acid information.
Method: Each protein sequence is represented as a Pseudo Substitution Matrix Representation
(Pseudo-SMR) descriptor, which retains the influence of biological evolutionary information.
Each drug molecule is represented by a novel fingerprint feature vector that describes
its substructure information. Each DTI pair is then characterized by concatenating the feature
vectors of the protein sequence and the drug substructure. Finally, the proposed method is evaluated
for predicting DTI on four benchmark datasets: Enzyme, Ion Channel, GPCRs and Nuclear Receptor.
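
The pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random vectors stand in for the Pseudo-SMR descriptors and drug fingerprints, the labels are synthetic, and the helper name build_pair_features, the vector dimensions, the number of trees and the 5-fold cross-validation are all hypothetical choices made for the example. Only the concatenation step and the use of an extremely randomized trees classifier (here scikit-learn's ExtraTreesClassifier) follow the text.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score


def build_pair_features(protein_vecs, drug_vecs, pairs):
    """Concatenate protein and drug vectors for each (drug_idx, protein_idx) pair."""
    return np.array(
        [np.concatenate([protein_vecs[p], drug_vecs[d]]) for d, p in pairs]
    )


rng = np.random.default_rng(0)

# ASSUMPTION: random placeholders replace the real Pseudo-SMR protein
# descriptors and drug substructure fingerprints, which are not specified here.
n_proteins, n_drugs = 20, 30
protein_dim, drug_dim = 40, 64
protein_vecs = rng.random((n_proteins, protein_dim))
drug_vecs = rng.integers(0, 2, size=(n_drugs, drug_dim)).astype(float)

# ASSUMPTION: synthetic interaction labels (1 = interacting, 0 = not).
pairs = [(d, p) for d in range(n_drugs) for p in range(n_proteins)]
labels = rng.integers(0, 2, size=len(pairs))

# One row per drug-target pair: [protein descriptor | drug fingerprint].
X = build_pair_features(protein_vecs, drug_vecs, pairs)

# Extremely randomized trees classifier, scored by 5-fold cross-validation
# (the fold count and n_estimators are illustrative, not from the paper).
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, labels, cv=5)
print(X.shape, scores.mean())
```

Because the inputs are random, the printed accuracy is meaningless; the sketch only shows how the concatenated pair vectors feed the classifier.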
Results: The experimental results demonstrate that this method achieves promising prediction accuracies
of 89.85%, 87.87%, 82.99% and 81.67% on the Enzyme, Ion Channel, GPCRs and Nuclear Receptor
datasets, respectively. For further evaluation, we compared the
performance of the Extremely Randomized Trees model with that of the state-of-the-art Support Vector
Machine classifier. We also compared the proposed model with existing computational models
and confirmed 15 potential drug-target interactions by searching existing databases.
Conclusion: The experimental results show that the proposed method, based on sizeable features, is
feasible and promising for predicting drug-target interactions in new drug candidate screening.