Let's consider the growing popularity of phenotypic screens within the drug development community.
Biochemical assays once dominated the field of high throughput chemical screening, but phenotypic
experiments are more adept at modeling real physiological behavior, and can simultaneously mimic drug
delivery factors including cell membrane permeability, intracellular localization, and aspects of transport and
metabolism. Efficiency and cost of phenotypic screens have improved substantially over the years. For all of
these reasons, the technology is well suited to fostering lead optimization.
The technology has disadvantages, however, in that data complexity hinders de novo SAR rationalization. For
preliminary screens over diverse chemotypes, the simpler biochemical construct affords a good platform for
classifying hits according to target-specific modulation, thus facilitating systematic pharmacophore perception.
With phenotypic screens, activity trends across different chemotypes may actually reflect modulation of
different biochemical targets or modulation via distinct interaction modes. Even within the same chemotype family, pharmacophore perception
may be occluded by imprecise partitioning of observed bioactivity measurements between fundamental biochemical modulation
and variation across ADME-like factors.
So the question arises: can phenotypic screens ultimately support systematic SAR-driven pharmacophore perception and thus fully
supplant biochemical screens? The answer may be computational in nature: given a large, accurate phenotypic data set, one should
theoretically be able to sort through the various influences to distill a reliable chemotype-specific SAR that distinguishes target-specific
trends from deliverability issues. This can be achieved via data mining.
Unfortunately, many people who realize that data mining is designed for such challenges may be missing background insight that is
key to exploiting such methods. Most often overlooked is the fact that excellent calculations will rarely rescue weak data. Grasping
pharmacophore effects from a screening study requires data sets that are strategically sensitive to variations in molecular effects that
dictate physiology. In order for a given chemical to exert specific biochemical activity, it must have the right solubility profile to be
available to those biomolecules that must collectively admit, transport and bind the modulator in order to effect applicable bioactivity.
Ligand solubility is determined by chemical substructures. Furthermore, every relevant intermolecular interaction is directly
influenced by ligand chemical composition. Thus, if one knows which chemical substructures balance appropriate solubility with the
interactions required to reach and bind the biomolecular target, one should be able to predict ligand activity. This is the essence of
rational drug design and is, furthermore, the type of insight that careful mining of a well crafted data set should yield.
A thorough discourse on 'careful' data mining would take up much more than the space available to an editorial like this, but of more
immediate interest to non-informaticians who design chemical screens would be a quick synopsis of what well-crafted data sets might
look like.
To a significant extent, the data set will depend on whether one is trying to refine one's knowledge of the target specific SAR in a
system, or if one is searching de novo for promising new chemotypes with activity toward a given phenotype. Ultimately, the strategy
lies in giving the informatician the right spread of data to partition the global bioactivity measurement into contributing factors.
In the simpler case where a chemotype family of interest has already been identified, the set of compounds screened should focus on
chemicals with the desired core scaffold (or close variants thereof) so that subsequent data analysis will emphasize SAR within that
family and avoid informational contamination from other mechanisms of action. However, it can help to retain a small population of
compounds derived from chemical families that likely act via distinct mechanisms; this enables computational differentiation not only
between chemical properties that influence whether a given compound is active, but also perception of attributes that push modulators
toward one mechanism over another. Secondly, although it may seem counter-intuitive to populate your screen with compounds
known to have marginal solubility, membrane permeability or transport efficacy, a data set that includes them will support
analysis that can distinguish molecular properties that favor target-specific interactions from those that amplify bioactivity simply by
ensuring greater compound availability to the target.
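To make the partitioning argument concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that measured activity is a linear sum of one target-interaction descriptor and one deliverability descriptor; the function name, the descriptors and all numbers are hypothetical toy values, not data from any real screen. It shows that an ordinary least-squares fit can separate the two contributions only when the screened set actually varies in deliverability.

```python
import numpy as np

def fit_partition(target_desc, delivery_desc, activity):
    """Least-squares fit of activity ~ a*target_desc + b*delivery_desc + c.

    Returns the coefficients (a, b, c) and the rank of the design matrix.
    A rank below 3 means the contributions cannot be uniquely separated.
    """
    X = np.column_stack([target_desc, delivery_desc, np.ones(len(activity))])
    coef, _, rank, _ = np.linalg.lstsq(X, activity, rcond=None)
    return coef, rank

# Hypothetical toy descriptors (made up for illustration only).
target = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0])
delivery = np.array([1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0])  # 1 = well delivered
activity = 2.0 * target + 3.0 * delivery + 1.0  # assumed additive model

# Screen that includes poorly delivered compounds: both effects are recoverable.
coef_mixed, rank_mixed = fit_partition(target, delivery, activity)

# Screen restricted to well-delivered compounds: the delivery column is
# constant, so it is indistinguishable from the intercept.
well = delivery == 1.0
coef_narrow, rank_narrow = fit_partition(target[well], delivery[well], activity[well])

print("mixed set  rank:", rank_mixed, "coefficients:", np.round(coef_mixed, 3))
print("narrow set rank:", rank_narrow)
```

In the restricted set the deliverability column never changes, so the fit has no way to assign it a separate coefficient. Real phenotypic data would of course be noisier and nonlinear; the linear model is only an assumption chosen to keep the example short.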
For speculative preliminary studies, systematic screening set selection reflects other considerations. Unlike targeted screening, the
useful information from general preliminary screens is enhanced by embracing broad chemical functionality to reduce the number of
potentially relevant chemotypes that are overlooked. Preliminary screens should also rigorously eschew compounds with poor ADME
properties, since at this early stage there is little benefit in degrading chemotype-specific assessments with inactivity arising from
factors other than target-compatibility. The resulting SAR analysis may blend target-specific effects with deliverability factors,
but it will at least lay a solid foundation for subsequent targeted lead discovery and refinement studies.
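One way to picture the selection strategy for a preliminary screen is sketched below in Python. It is an illustration under stated assumptions, not a prescribed method: each compound is represented by a hypothetical set of substructure labels plus a flag saying whether it passes some ADME filter, and a greedy MaxMin pick on Tanimoto distance stands in for "embracing broad chemical functionality". The compound names, feature labels and the helper `pick_diverse` are all invented for the example.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of substructure labels."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def pick_diverse(compounds, n_pick):
    """Greedy MaxMin selection over compounds that pass the ADME filter.

    compounds: dict mapping name -> (set_of_features, passes_adme)
    Returns up to n_pick names, each chosen to maximize its minimum
    Tanimoto distance to the compounds already picked.
    """
    # Drop compounds with poor ADME properties before picking.
    pool = {name: feats for name, (feats, ok) in compounds.items() if ok}
    if not pool or n_pick <= 0:
        return []
    names = sorted(pool)
    picked = [names[0]]  # deterministic seed: first name alphabetically
    while len(picked) < min(n_pick, len(names)):
        candidates = [n for n in names if n not in picked]
        best = max(
            candidates,
            key=lambda n: min(1.0 - tanimoto(pool[n], pool[p]) for p in picked),
        )
        picked.append(best)
    return picked

# Hypothetical toy library; feature labels are illustrative only.
library = {
    "A": ({"amide", "phenyl"}, True),
    "B": ({"amide", "phenyl", "fluoro"}, True),
    "C": ({"sulfonamide", "pyridine"}, True),
    "D": ({"indole", "ester"}, False),  # fails the assumed ADME filter
    "E": ({"amide", "pyridine"}, True),
}

selection = pick_diverse(library, 3)
print(selection)
```

With this toy library the near-duplicate of the seed compound is passed over in favor of structurally distinct compounds, and the compound flagged as failing the ADME filter is never considered. In practice the feature sets would come from a real fingerprinting scheme and the ADME flag from measured or predicted properties.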