Most drug discovery programs today originate by selection of ‘hit’ molecules resulting from assays against
large compound screening libraries. The chemical space in which these hits reside has implications for their biological activity
in vivo and likelihood of progression to a drug candidate. We have created a database of commercially available
screening compounds and natural products in order to analyse the drug- and lead-likeness of commercial screening compounds
and compare them with i) orally administered drugs, ii) non-orally administered drugs, and iii) compounds with
significant biological activity but unspecified or not yet determined route of administration from the public databases
DrugBank and ChEMBL. The data set contained 15.5 million entries from 102 vendors, which resulted in just over 8 million
unique chemical structures. We review these data for current drug/lead-likeness, then utilise substructure-based filters
for promiscuity and unwanted groups, and finally compare chemical properties for structures within the different sub-sets.
While the majority of the commercial compounds satisfy various drug-likeness rules, they show a larger molecular weight
and higher hydrophobicity compared to orally available drugs, with generally higher aromaticity and lower solubility.
This ‘right shift’ of chemical properties has also been found in the majority of the compounds with significant biological
activity in ChEMBL, reflecting a common trend in current drug discovery towards larger, more hydrophobic compounds
and fewer drug-like compounds. In particular, successful drugs were found to possess much lower median logD values
than those found for compound collections. In addition, commercial compounds show a quite narrow distribution in molecular
weight, with a median absolute deviation of only 78 Da around a median of 387 Da. For high-throughput screening
a highly stringent combination of several lead-likeness and substructure filters against unwanted groups could be applied,
resulting in 2 million lead-like structures. For fragment-based screening approaches, the rule of three (Ro3) would select
around 400,000 structures.
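
As an illustration of the two quantities mentioned above, the following minimal Python sketch computes a median absolute deviation for molecular weights and applies a rule-of-three style filter. It assumes the commonly quoted Ro3 thresholds (molecular weight below 300 Da, calculated logP of at most 3, and at most 3 hydrogen-bond donors and 3 acceptors); the exact criteria and descriptor software used in the study are not stated here. The `Compound` class, its field names, and the example values are hypothetical and serve only to make the sketch runnable.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Compound:
    """Hypothetical record holding precomputed descriptors for one structure."""
    name: str
    mol_weight: float   # molecular weight in Da
    clogp: float        # calculated logP
    h_donors: int       # hydrogen-bond donor count
    h_acceptors: int    # hydrogen-bond acceptor count


def median_absolute_deviation(values):
    """Return the median of absolute deviations from the median."""
    values = list(values)
    centre = median(values)
    return median([abs(v - centre) for v in values])


def passes_ro3(c):
    """Rule-of-three check with commonly quoted thresholds (an assumption)."""
    return (
        c.mol_weight < 300
        and c.clogp <= 3
        and c.h_donors <= 3
        and c.h_acceptors <= 3
    )


# Invented example values, not taken from the data set described above.
library = [
    Compound("frag_a", 150.0, 1.2, 1, 2),
    Compound("frag_b", 280.0, 2.9, 3, 3),
    Compound("lead_c", 387.0, 3.8, 2, 5),
    Compound("lead_d", 420.0, 4.5, 1, 6),
    Compound("big_e", 510.0, 5.1, 4, 8),
]

weights = [c.mol_weight for c in library]
mw_median = median(weights)
mw_mad = median_absolute_deviation(weights)
fragments = [c.name for c in library if passes_ro3(c)]

print(f"median MW = {mw_median} Da, MAD = {mw_mad} Da")
print("Ro3-compliant:", fragments)
```

In practice the descriptors would be calculated from the structures with a cheminformatics toolkit; they are supplied by hand here so that the sketch depends only on the standard library.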
Keywords: Drug discovery, drug-likeness, screening libraries, physicochemical properties, non-orally administered drugs, current drug discovery, hydrophobic compounds, small molecules, pharmaceutical research, patent strategy, organic compounds, hydrophobicity, natural products