Fingerprint representations of molecular structure and properties were among the first computational tools for similarity searching and are still widely applied today, in part because of their computational efficiency and ease of use. Despite their simplicity, 2D molecular fingerprints have been surprisingly successful in identifying novel active compounds. However, properly applying 2D fingerprint representations and similarity metrics in the search for active molecules is considerably more complicated than one might assume. For example, there are no generally applicable similarity functions or threshold values that reliably indicate the level of molecular similarity at which compounds can be expected to share biological activity. Furthermore, fingerprint search calculations are known to be biased by molecular size and complexity effects, which often lead to a substantial increase in false-positive rates. In this contribution, we describe known caveats associated with fingerprint similarity searching and strategies to overcome these difficulties, including the design of complexity-independent fingerprint representations and similarity metrics.
Keywords: Molecular similarity, molecular fingerprints, similarity metrics, similarity searching, biological activity, molecular complexity effects
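Two of the caveats summarized in this abstract, the lack of transferable similarity thresholds and the size/complexity bias, can be illustrated with a small sketch. It assumes fingerprints are represented as sets of on-bit indices. The function names are hypothetical, and `expected_random_tanimoto` is a ratio-of-expectations approximation for two fingerprints whose bits are set independently with probability p. It is not a model of real chemical fingerprints. It only shows why denser fingerprints, which larger and more complex molecules tend to produce, yield higher baseline similarity values even between unrelated compounds.

```python
from typing import Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) coefficient on on-bit sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a: Set[int], b: Set[int]) -> float:
    """Dice coefficient on on-bit sets: 2|A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def expected_random_tanimoto(density: float) -> float:
    """Approximate Tanimoto value between two independent random fingerprints
    of length n in which each bit is set with probability `density` (p).

    Assumption: ratio of expectations, E|A & B| ~ n*p^2 and
    E|A | B| ~ n*(2p - p^2), so n cancels and T ~ p / (2 - p).
    """
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    return density / (2.0 - density)


if __name__ == "__main__":
    # The same pair of fingerprints scored by two different similarity metrics.
    a = {1, 4, 9, 16, 25}
    b = {1, 4, 9, 36, 49}
    print("Tanimoto:", round(tanimoto(a, b), 3))  # 3 shared bits / 7 in union
    print("Dice:    ", round(dice(a, b), 3))      # 2 * 3 / (5 + 5)

    # Baseline similarity of unrelated fingerprints grows with bit density.
    for p in (0.05, 0.1, 0.2, 0.4):
        print(f"density {p:.2f} -> expected random Tanimoto {expected_random_tanimoto(p):.3f}")
```

Run as a script, the first pair scores 0.429 with Tanimoto but 0.6 with Dice. Because Dice equals 2T/(1 + T) for a Tanimoto value T, it is never lower than Tanimoto, so a cutoff chosen for one metric does not carry over to the other. The second loop prints baseline values that rise from about 0.026 to 0.25 as bit density increases, which is one simple way to see how bit-dense fingerprints can inflate false-positive rates at a fixed threshold.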