Bioactivity Prediction Based on Matched Molecular Pair and Matched Molecular Series Methods

Xiaoyu Ding; Chen Cui; Dingyan Wang; Jihui Zhao; Mingyue Zheng; Xiaomin Luo; Hualiang Jiang; Kaixian Chen

Abstract

Background: Enhancing a compound’s biological activity is the central task of lead optimization in small-molecule drug discovery. However, iterative rounds of compound synthesis and bioactivity testing are laborious. High-quality in silico bioactivity prediction approaches are therefore in high demand, both to prioritize the more active compound derivatives and to reduce trial and error.

Methods: Two kinds of bioactivity prediction models were constructed from a large-scale structure-activity relationship (SAR) database. The first kind is based on the similarity of substituents and is implemented through matched molecular pair analysis; it comprises the SA, SA_BR, SR, and SR_BR models. The second kind is based on SAR transferability and is implemented through matched molecular series analysis; it comprises the Single MMS pair, Full MMS series, and Multi single MMS pairs models. The application domain of the models was also defined using a distance-based threshold.
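The abstract does not specify how the distance-based threshold is computed. As a minimal sketch of one common formulation, and not necessarily the authors' procedure, the example below treats a query compound as inside the application domain when its mean distance to its k nearest training compounds does not exceed the training-set mean of that distance plus z standard deviations. The Tanimoto distance on feature sets, the function names, and the parameters `k` and `z` are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def tanimoto_distance(a, b):
    """Return 1 - Tanimoto similarity between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def knn_distance(query, references, k=1):
    """Mean distance from the query to its k nearest reference compounds."""
    dists = sorted(tanimoto_distance(query, ref) for ref in references)
    return mean(dists[:k])


def domain_threshold(training, k=1, z=0.5):
    """Hypothetical threshold: mean + z * std of the k-NN distances
    observed inside the training set (each compound vs. all others)."""
    per_compound = []
    for i, fp in enumerate(training):
        others = training[:i] + training[i + 1:]
        per_compound.append(knn_distance(fp, others, k))
    return mean(per_compound) + z * pstdev(per_compound)


def in_domain(query, training, threshold, k=1):
    """True if the query lies within the distance-based threshold."""
    return knn_distance(query, training, k) <= threshold


# Toy feature sets standing in for substituent fingerprints (illustrative only).
training_set = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}]
threshold = domain_threshold(training_set, k=1, z=0.5)
similar_query = {1, 2, 3, 4}
distant_query = {7, 8, 9}
```

Predictions for queries that fall outside the threshold would be flagged as unreliable rather than reported with the same confidence as in-domain predictions.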

Results: Among the seven individual models, the Multi single MMS pairs model performed best (R2 = 0.828, MAE = 0.406, RMSE = 0.591), and the baseline model (SA) had the lowest prediction accuracy (R2 = 0.798, MAE = 0.446, RMSE = 0.637). Consensus modeling further improved predictive accuracy (R2 = 0.842, MAE = 0.397, RMSE = 0.563).
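For readers unfamiliar with the reported metrics, the sketch below shows how R2, MAE, and RMSE can be computed and how a simple consensus prediction might be formed. It assumes that R2 denotes the coefficient of determination and that the consensus is an unweighted mean of the individual model predictions; the abstract confirms neither, and the toy values are illustrative, not data from the study.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def consensus(predictions):
    """Unweighted mean of several models' predictions, per compound.
    (Assumed scheme; the consensus rule is not given in the abstract.)"""
    return [sum(column) / len(column) for column in zip(*predictions)]


# Toy activity values (e.g. pIC50) and two hypothetical models
# whose errors point in opposite directions.
observed = [5.0, 6.0, 7.0, 8.0]
model_a = [5.5, 6.5, 7.5, 8.5]
model_b = [4.5, 5.5, 6.5, 7.5]
combined = consensus([model_a, model_b])
```

In this toy case the two models' errors cancel exactly, which illustrates why averaging can outperform every individual model when their errors are not fully correlated.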

Conclusion: An accurate bioactivity prediction model was built with a consensus method, which outperformed all individual models. The model should be a valuable tool for lead optimization.

Keywords: Matched molecular pair, matched molecular series, bioactivity prediction, SAR transfer, application domain, lead optimization.




DOI https://dx.doi.org/10.2174/1381612826666200427111309	Print ISSN 1381-6128
Publisher Name Bentham Science Publisher	Online ISSN 1873-4286

Current Pharmaceutical Design
