Increasingly, chemical libraries are being produced that are focused on a biological target or a group of related targets, rather than simply being constructed in a combinatorial fashion. A screening collection compiled from such libraries will contain multiple analogues of a number of discrete series of compounds. This raises the question of how many analogues are needed to represent each series in order to ensure that an active series will be identified. Based on a simple probabilistic argument and supported by in-house screening data, guidelines are given for the number of compounds necessary to achieve a “hit”, or a series of hits, at various levels of certainty. Obtaining more than one hit from the same series is useful, since this gives early acquisition of SAR (structure-activity relationship) and confirms that a hit is not a singleton. We show that screening collections composed of only small numbers of analogues of each series are suboptimal for SAR acquisition. Based on these studies, we recommend a minimum series size of about 200 compounds, which gives a high probability of confirmatory SAR (i.e. at least two hits from the same series). More substantial early SAR (at least five hits from the same series) can be gained by using series of about 650 compounds each. With this level of information, a more accurate assessment of the likely success of the series in hit-to-lead and later-stage development becomes possible.
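The abstract does not spell out its probabilistic argument. One plausible minimal model, assumed here rather than taken from the paper, treats each compound in an active series as independently active with probability p, so the number of hits from a series of n analogues follows a binomial distribution. The sketch below computes the chance of at least k hits from such a series and the smallest n that reaches a chosen confidence. The function names and any hit rates passed to them are hypothetical, and the sketch is not calibrated to reproduce the 200 and 650 figures, which depend on the hit rates the authors assumed.

```python
from math import comb
from typing import Optional


def prob_at_least_k_hits(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p).

    Assumed model (not from the source): each of the n analogues in an
    active series is independently active with probability p.
    """
    if k <= 0:
        return 1.0
    # Sum the lower tail P(X < k) = sum_{i<k} C(n, i) p^i (1-p)^(n-i),
    # then take its complement.
    lower_tail = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - lower_tail


def min_series_size(
    p: float, k: int, confidence: float, n_max: int = 100_000
) -> Optional[int]:
    """Smallest series size n with P(X >= k) >= confidence.

    Returns None if the target confidence is not reached by n_max
    (e.g. when the assumed hit rate p is zero).
    """
    for n in range(max(k, 1), n_max + 1):
        if prob_at_least_k_hits(n, p, k) >= confidence:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical within-series hit rate of 1%, chosen only for illustration.
    p = 0.01
    for k in (1, 2, 5):
        print(k, min_series_size(p, k, confidence=0.95))
```

For example, `min_series_size(0.01, 2, 0.95)` gives the series size needed for 95% confidence of confirmatory SAR (at least two hits) under an assumed 1% within-series hit rate. Raising k to 5 shows why substantially larger series are needed for more substantial early SAR under the same assumption.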
Keywords: Hit rate, diversity, library size, chemotype, series, parallel array, combinatorial chemistry, structure-activity relationship