PTML Multi-Label Algorithms: Models, Software, and Applications

Author(s): Bernabe Ortega-Tenezaca, Viviana Quevedo-Tumailli, Harbil Bediaga, Jon Collados, Sonia Arrasate, Gotzon Madariaga, Cristian R. Munteanu, M. Natália D.S. Cordeiro*, Humbert González-Díaz*

Journal Name: Current Topics in Medicinal Chemistry

Volume 20, Issue 25, 2020



Abstract:

By combining Machine Learning (ML) methods with Perturbation Theory (PT), it is possible to develop predictive models for a variety of response targets. Such a combination, often known as Perturbation Theory Machine Learning (PTML) modeling, comprises a set of techniques that can handle various physical and chemical properties of different organisms and of complex biological or material systems under multiple input conditions. In so doing, these techniques integrate a wide range of chemical and biological data into a single computational framework that can then be used to screen lead chemicals and to find clues for improving the targeted response(s). PTML models have thus been highly useful in drug and material design, and have proven predictive and applicable across a broad range of systems. After a brief outline of the methodology, this work reviews the uses of PTML in medicinal chemistry and in other fields. Finally, we cover the software currently available for building PTML models from large datasets.
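The abstract does not spell out how "multiple input conditions" enter a PTML model. One formulation used in PTML work is the Box-Jenkins moving-average operator: each descriptor D_k of a case is replaced by its deviation from the average of D_k over all cases that share the same condition c_j, that is ΔD_k(c_j) = D_k - <D_k(c_j)>, and a reference term p_ref records the observed probability of a positive outcome under a chosen condition. The sketch below assumes that formulation. The function names (`group_mean`, `ptml_features`), the record layout, and the toy descriptor and condition values (logP, MW, target, assay) are hypothetical and only illustrate the arithmetic; they are not taken from the article or from any PTML software.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Hypothetical record layout: descriptor values, condition labels, and a 0/1 outcome.
Record = Dict[str, object]


def group_mean(records: List[Record], value_key: str, condition: str) -> Dict[object, float]:
    """Average of `value_key` over all records sharing each value of `condition`."""
    groups: Dict[object, List[float]] = defaultdict(list)
    for r in records:
        groups[r[condition]].append(float(r[value_key]))
    return {c: mean(vals) for c, vals in groups.items()}


def ptml_features(records, descriptors, conditions, ref_condition, label="active"):
    """Build PTML-style inputs (assumed formulation): p_ref plus Box-Jenkins deviations."""
    # p_ref: observed fraction of positive outcomes under each value of ref_condition
    p_ref = group_mean(records, label, ref_condition)
    # <D_k(c_j)>: mean of each descriptor over every record sharing condition c_j
    averages = {(d, c): group_mean(records, d, c) for d in descriptors for c in conditions}
    rows = []
    for r in records:
        row = {"p_ref": p_ref[r[ref_condition]]}
        for d in descriptors:
            for c in conditions:
                # Moving-average operator: Delta D_k(c_j) = D_k - <D_k(c_j)>
                row[f"d_{d}_{c}"] = float(r[d]) - averages[(d, c)][r[c]]
        rows.append(row)
    return rows


# Toy, made-up records used only to show the calculation.
toy = [
    {"logP": 1.0, "MW": 200.0, "target": "D2", "assay": "Ki", "active": 1},
    {"logP": 3.0, "MW": 300.0, "target": "D2", "assay": "Ki", "active": 0},
    {"logP": 2.0, "MW": 250.0, "target": "D3", "assay": "IC50", "active": 1},
    {"logP": 4.0, "MW": 350.0, "target": "D3", "assay": "Ki", "active": 1},
]

for row in ptml_features(toy, ["logP", "MW"], ["target", "assay"], ref_condition="target"):
    print(row)
```

Within each condition group the deviations sum to zero, so the operators describe how a case departs from its own experimental context rather than its absolute descriptor value. Under the same assumption, these columns could then be passed to a classifier of the modeler's choice, for example a linear score of the form a0 + a1·p_ref + Σ b_k·ΔD_k(c_j).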

Keywords: Drug discovery, cheminformatics, multi-target models, large data sets, PTML, perturbation theory, machine learning.




Article Details

VOLUME: 20
ISSUE: 25
Year: 2020
Published on: 16 September 2020
Pages: 2326-2337 (12 pages)
DOI: 10.2174/1568026620666200916122616