Bioactivity Prediction Based on Matched Molecular Pair and Matched Molecular Series Methods

Author(s): Xiaoyu Ding, Chen Cui, Dingyan Wang, Jihui Zhao, Mingyue Zheng*, Xiaomin Luo*, Hualiang Jiang, Kaixian Chen

Journal Name: Current Pharmaceutical Design

Volume 26, Issue 33, 2020

Background: Enhancing a compound’s biological activity is the central task of lead optimization in small-molecule drug discovery. However, performing many iterative rounds of compound synthesis and bioactivity testing is laborious. There is therefore a strong need for high-quality in silico bioactivity prediction approaches that prioritize the more active derivatives and reduce the trial-and-error process.

Methods: Two kinds of bioactivity prediction models were constructed from a large-scale structure-activity relationship (SAR) database. The first kind is based on substituent similarity and implemented through matched molecular pair (MMP) analysis; it comprises the SA, SA_BR, SR, and SR_BR models. The second kind is based on SAR transferability and implemented through matched molecular series (MMS) analysis; it comprises the Single MMS pair, Full MMS series, and Multi single MMS pairs models. In addition, the applicability domain of the models was defined using a distance-based threshold.
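The MMP-based idea can be illustrated with a minimal sketch of the generic "transformation delta" approach: the mean pActivity change observed for a substituent swap across the database is added to the measured activity of the query compound's matched partner. This is an illustration under stated assumptions, not the paper's SA, SA_BR, SR, or SR_BR definitions, which the abstract does not spell out; the record layout `(assay_id, core, substituent, pActivity)` and the function names `learn_transform_deltas` and `predict_by_mmp` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def learn_transform_deltas(records):
    """Mean pActivity change for every substituent swap a -> b.

    records: iterable of (assay_id, core, substituent, pActivity) tuples
    (a hypothetical layout). Two compounds measured in the same assay that
    share a core and differ only in one substituent form a matched pair.
    """
    groups = defaultdict(dict)
    for assay_id, core, sub, p_act in records:
        groups[(assay_id, core)][sub] = p_act
    deltas = defaultdict(list)
    for subs in groups.values():
        for a, p_a in subs.items():
            for b, p_b in subs.items():
                if a != b:
                    deltas[(a, b)].append(p_b - p_a)
    return {swap: mean(values) for swap, values in deltas.items()}


def predict_by_mmp(known_sub, known_p, target_sub, transform_deltas):
    """Predict the target analog's pActivity from its measured matched partner."""
    delta = transform_deltas.get((known_sub, target_sub))
    if delta is None:
        return None  # swap never observed in the database: abstain
    return known_p + delta


records = [
    ("A1", "coreX", "H", 6.0), ("A1", "coreX", "Cl", 6.8),
    ("A2", "coreY", "H", 5.0), ("A2", "coreY", "Cl", 5.4),
]
deltas = learn_transform_deltas(records)
print(predict_by_mmp("H", 7.0, "Cl", deltas))  # roughly 7.6 (7.0 + mean of +0.8 and +0.4)
print(predict_by_mmp("H", 7.0, "Br", deltas))  # None
```

Returning `None` for unseen swaps reflects the fact that an MMP model can only predict when a matching transformation exists in the database.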
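SAR transfer with matched molecular series can be sketched in the same hedged way. Here a query series (substituents with measured pActivity on one core) is aligned to a reference series from another core or assay through their shared substituents, assuming the two series differ by a roughly constant offset; the reference activity of the missing substituent plus that offset is the prediction. Averaging over several reference series is only one plausible way to combine pairs and is not necessarily how the paper's Single MMS pair, Full MMS series, or Multi single MMS pairs models work; all names below are hypothetical.

```python
from statistics import mean


def predict_from_series_pair(query, reference, target):
    """Transfer SAR from one reference series to the query series.

    query, reference: dicts mapping substituent -> pActivity, each a matched
    molecular series (one core, one assay). Assumes the two series differ by
    a roughly constant offset over their shared substituents.
    """
    shared = (query.keys() & reference.keys()) - {target}
    if target not in reference or not shared:
        return None  # nothing to align on, or target absent from reference
    offset = mean(query[s] - reference[s] for s in shared)
    return reference[target] + offset


def predict_from_many_series(query, references, target):
    """Average the single-pair predictions over every usable reference series."""
    preds = [predict_from_series_pair(query, ref, target) for ref in references]
    preds = [p for p in preds if p is not None]
    return mean(preds) if preds else None


query = {"H": 6.0, "F": 6.3}                # target "OMe" not yet measured
ref_1 = {"H": 5.0, "F": 5.3, "OMe": 5.9}    # offset about +1.0, predicts about 6.9
ref_2 = {"H": 7.0, "OMe": 7.5}              # offset -1.0, predicts 6.5
print(predict_from_many_series(query, [ref_1, ref_2], "OMe"))  # roughly 6.7
```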
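A distance-based applicability domain can be sketched with one common formulation from kNN-QSAR work: the threshold is D_T = ⟨d⟩ + Zσ, where ⟨d⟩ and σ are the mean and standard deviation of the distances between each training compound and its k nearest training neighbors, and a query is in domain when its own kNN distance does not exceed D_T. The Tanimoto distance on fingerprint bit sets, k = 1, and Z = 0.5 are assumptions; the abstract does not state which distance or cutoff the authors used. In practice the bit sets would come from a fingerprint generator in a toolkit such as RDKit; the function names are hypothetical.

```python
from statistics import mean, pstdev


def tanimoto_distance(a, b):
    """1 - Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def knn_distance(query, pool, k):
    """Mean distance from query to its k nearest neighbors in pool."""
    dists = sorted(tanimoto_distance(query, fp) for fp in pool)
    return mean(dists[:k])


def ad_threshold(train, k=1, z=0.5):
    """D_T = mean + z * std of leave-one-out kNN distances within the training set."""
    d = [knn_distance(fp, train[:i] + train[i + 1:], k) for i, fp in enumerate(train)]
    return mean(d) + z * pstdev(d)


def in_domain(query, train, threshold, k=1):
    """True when the query is no farther from the training set than D_T."""
    return knn_distance(query, train, k) <= threshold


train = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6}]
d_t = ad_threshold(train)                    # every pair is 0.4 apart, so D_T = 0.4
print(in_domain({1, 2, 3, 7}, train, d_t))   # True
print(in_domain({8, 9}, train, d_t))         # False
```

Predictions for out-of-domain compounds can then be flagged or withheld, trading coverage for reliability.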

Results: Among the seven individual models, the Multi single MMS pairs model showed the best performance (R² = 0.828, MAE = 0.406, RMSE = 0.591), whereas the baseline SA model gave the lowest prediction accuracy (R² = 0.798, MAE = 0.446, RMSE = 0.637). Consensus modeling further improved the predictive accuracy (R² = 0.842, MAE = 0.397, RMSE = 0.563).
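The three reported metrics can be computed as follows. This sketch takes R² to be the coefficient of determination, 1 - SS_res/SS_tot; some QSAR studies report the squared Pearson correlation instead, and the abstract does not say which definition was used. The function name `regression_metrics` is hypothetical.

```python
from math import sqrt
from statistics import mean


def regression_metrics(y_true, y_pred):
    """R^2 (coefficient of determination), MAE and RMSE for paired values."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    y_bar = mean(y_true)
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)    # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": mean(abs(r) for r in residuals),
        "RMSE": sqrt(ss_res / len(residuals)),
    }


metrics = regression_metrics([5.0, 6.0, 7.0, 8.0], [5.5, 6.0, 6.5, 8.0])
print(metrics)  # R2 = 0.9, MAE = 0.25, RMSE = sqrt(0.125), about 0.354
```

Higher R² and lower MAE and RMSE indicate better agreement; RMSE weights large errors more heavily than MAE, so it is always at least as large.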

Conclusion: A consensus method yielded an accurate bioactivity prediction model that outperformed all of the individual models. The model should be a valuable tool for lead optimization.
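A consensus prediction in its simplest form is the unweighted mean of the individual models that can make a prediction for a given compound. The equal weighting and the rule of skipping models that abstain are assumptions for illustration; the abstract does not describe how the authors combined their models, and the model labels below are hypothetical.

```python
from statistics import mean


def consensus_predict(model_predictions):
    """Unweighted mean of the individual models that returned a prediction.

    model_predictions maps a model label -> predicted pActivity, or None when
    that model abstains (no matched pair or series, or outside its domain).
    """
    available = [p for p in model_predictions.values() if p is not None]
    return mean(available) if available else None


# Hypothetical model labels and values, for illustration only.
print(consensus_predict({"mmp_model": 6.2, "mms_single": None, "mms_multi": 6.6}))  # roughly 6.4
print(consensus_predict({"mmp_model": None, "mms_single": None}))  # None
```

Averaging tends to help when the individual models' errors are not strongly correlated, since uncorrelated errors partly cancel.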

Keywords: Matched molecular pair, matched molecular series, bioactivity prediction, SAR transfer, application domain, lead optimization.

Article Details

Year: 2020
Published on: 23 September 2020
Pages: 4195-4205 (11 pages)
DOI: 10.2174/1381612826666200427111309