Structural Similarity and Descriptor Spaces for Clustering and Development of QSAR Models
Irene Luque Ruiz,
Gonzalo Cerruela Garcia,
Miguel Angel Gomez-Nieto.
In this paper we analyze the behavior of different representational spaces for clustering and for
building QSAR models. Representational spaces based on fingerprint similarity and on structural similarity, computed
using the maximum common subgraph (MCS) and all maximum common subgraphs (AMCS) approaches, are compared against
representational spaces based on structural fragments and non-isomorphic fragments (NIF), built using different molecular
descriptors. Algorithms for the extraction of MCS, AMCS and NIF are described, and a support vector machine is used to
classify a dataset of 74 1,4-benzoquinone derivatives. Molecular descriptors are
tested in order to build QSAR models for predicting the antifungal activity of the compounds in the dataset. Descriptors
based on graph connectivity and distances are the most appropriate for building QSAR models. Moreover, models
based on approximate similarity improve the statistics of the equations by combining the structural similarity,
non-isomorphic fragment and descriptor approaches, yielding more robust and finer prediction equations.
Keywords: QSAR, approximate similarity, molecular descriptors, MCS, AMCS, molecular fragments.