Recent Advances in the Machine Learning-Based Drug-Target Interaction Prediction

Author(s): Wen Zhang*, Weiran Lin, Ding Zhang, Siman Wang, Jingwen Shi, Yanqing Niu.

Journal Name: Current Drug Metabolism

Volume 20, Issue 3, 2019

Background: The identification of drug-target interactions is a crucial issue in drug discovery. In recent years, researchers have devoted considerable effort to drug-target interaction prediction and have developed databases, software and computational methods for it.

Results: In this paper, we review recent advances in machine learning-based drug-target interaction prediction. First, we briefly introduce the available datasets and summarize the drug and target features that can be extracted from them. Because drug-drug similarity and target-target similarity are important inputs to many machine learning prediction models, we describe how these similarities are calculated from data or features. Since different machine learning-based prediction methods arise from the use of different features or information, we then summarize, analyze and compare these methods.
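
As an illustration of the similarity calculation mentioned above, the following is a minimal sketch of one common choice, the Tanimoto (Jaccard) coefficient between binary molecular fingerprints. It is not necessarily the measure used by any method in the review. The function names, the toy 8-bit fingerprints and the convention of returning 0.0 for two all-zero fingerprints are assumptions made for this example.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two equal-length binary fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    # Bits set in both fingerprints, and bits set in at least one of them.
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Assumed convention: two all-zero fingerprints get similarity 0.0.
    return both / either if either else 0.0


def similarity_matrix(fingerprints):
    """Pairwise drug-drug similarity matrix as a list of lists."""
    return [[tanimoto(a, b) for b in fingerprints] for a in fingerprints]


# Hypothetical 8-bit fingerprints for three drugs (real fingerprints are much longer).
drug_fps = [
    [1, 1, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 0, 0, 1],
]

sim = similarity_matrix(drug_fps)
for row in sim:
    print(["%.2f" % value for value in row])
```

The resulting matrix is symmetric with ones on the diagonal, which is the form in which drug-drug similarity is typically passed to similarity-based prediction models. Target-target similarity would usually be derived differently, for example from sequence alignment scores.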

Conclusion: This study provides a guide to the development of computational methods for drug-target interaction prediction.

Keywords: Machine learning, drug-target interaction, drug discovery, drug repurposing, molecular fingerprint, similarity measure.


Article Details

Year: 2019
Page: [194 - 202]
Pages: 9
DOI: 10.2174/1389200219666180821094047