The magnitude of the challenges in preclinical drug discovery is evident in the large amount of capital invested
in pursuit of a small and largely static number of successfully marketed therapeutics. An explosion in the
availability of potentially drug-like compounds, and of chemical biology data on these molecules, can provide the
means to improve the eventual success rates of compounds being considered at the preclinical level, but only if the
community is able to access this information efficiently and meaningfully. Chemical database resources are therefore
critical to any serious drug discovery effort. This paper explores the basic principles underlying the development and
implementation of chemical databases, and examines how molecular information may be encoded within
these databases so as to improve the likelihood that users will be able to extract meaningful information from data queries.
In addition to a broad survey of conventional data representation and query strategies, key enabling technologies such as
new context-sensitive chemical similarity measures and chemical cartridges are examined, with recommendations on how
such resources may be integrated into a practical database environment.
Keywords: Chemical cartridge, chemical database, fingerprints, frequent subgraphs, hashing, molecular structure
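As a point of reference for the fingerprint, hashing, and similarity topics named above, the following is a minimal sketch of a conventional hashed path fingerprint compared with the Tanimoto coefficient. It is a generic baseline only, not the context-sensitive similarity measure the paper examines. The molecule representation (atom symbols plus an undirected bond list), the function names, the path length of 3 bonds, the 1024-bit width, and the use of MD5 as the hash are all illustrative assumptions.

```python
import hashlib

# Hypothetical minimal molecule model: a list of atom symbols plus an
# undirected bond list of (i, j) atom-index pairs. Not a real toolkit API.

def linear_paths(atoms, bonds, max_len=3):
    """Enumerate simple paths of up to max_len bonds as atom-symbol strings."""
    adj = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    paths = set()

    def walk(path):
        labels = [atoms[k] for k in path]
        forward, backward = "-".join(labels), "-".join(reversed(labels))
        # Canonicalize direction so that C-O and O-C count as one fragment.
        paths.add(min(forward, backward))
        if len(path) - 1 < max_len:  # len(path) - 1 == number of bonds
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return paths

def fingerprint(atoms, bonds, n_bits=1024, max_len=3):
    """Hash each path fragment into a fixed-width bit vector (stored as on-bit indices)."""
    bits = set()
    for frag in linear_paths(atoms, bonds, max_len):
        # A stable hash (MD5 here, an arbitrary choice) keeps bits reproducible across runs.
        digest = hashlib.md5(frag.encode("utf-8")).digest()
        bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return bits

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Usage with heavy-atom skeletons: ethanol (C-C-O) and methanol (C-O).
ethanol = fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
methanol = fingerprint(["C", "O"], [(0, 1)])
print(tanimoto(ethanol, methanol))
```

Because unrelated fragments can hash to the same bit, a fixed-width fingerprint trades some specificity for compact storage and fast bitwise comparison. This trade-off is one reason that richer, context-aware encodings are worth considering.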