In the spirit of reporting valid and reliable Quantitative Structure-Activity Relationship (QSAR) models, the
aim of our research was to assess how leverage (analyzed with the hat matrix, hi) and influence (analyzed with Cook’s
distance, Di) in QSAR models reflect the models' reliability and characteristics. The datasets included in this
research were collected from previously published papers. Seven datasets that met the inclusion
criteria were analyzed. Three models were obtained for each dataset (full-model, hi-model and Di-model) and several
statistical validation criteria were applied to the models. In 5 of the 7 datasets the correlation coefficient increased when
compounds with either hi or Di above the threshold were removed. The number of withdrawn compounds ranged from 2 to 4 for hi-models
and from 1 to 13 for Di-models. Validation statistics showed that Di-models had systematically better
agreement than both full-models and hi-models. Removing influential compounds from the training set significantly
improves the model and is recommended when developing quantitative structure-activity relationships.
The Cook’s distance approach should be combined with hat matrix analysis to identify
candidate compounds for removal.
Keywords: Influential points, leverage effect, model sensitivity, model validation, quantitative structure-activity relationship (QSAR), Cook’s distance.
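
As an illustration of the two diagnostics named above, the following is a minimal sketch of how hi (the diagonal of the hat matrix) and Di (Cook's distance) might be computed for an ordinary least-squares model. The function names, the use of NumPy, and the cut-offs (2p/n for hi and 4/n for Di, where p counts the fitted parameters including the intercept) are assumptions made for this example; they are common rules of thumb and are not taken from the study.

```python
import numpy as np

def leverage_and_cooks(X, y):
    """Return (h, D): hat-matrix diagonal and Cook's distance for an
    ordinary least-squares fit of y on X with an intercept term."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])    # design matrix with intercept
    p = A.shape[1]                          # number of fitted parameters
    Q, _ = np.linalg.qr(A)                  # A = QR, so H = Q Q^T
    h = np.sum(Q ** 2, axis=1)              # h_i = diagonal of the hat matrix
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    s2 = resid @ resid / (n - p)            # residual variance estimate
    D = resid ** 2 / (p * s2) * h / (1.0 - h) ** 2
    return h, D

def flag_compounds(X, y):
    """Indices exceeding the assumed thresholds (hypothetical cut-offs:
    2p/n for leverage, 4/n for Cook's distance)."""
    h, D = leverage_and_cooks(X, y)
    n = len(h)
    p = (1 if np.ndim(X) == 1 else np.shape(X)[1]) + 1
    return np.where(h > 2 * p / n)[0], np.where(D > 4 / n)[0]
```

Refitting the model without the flagged indices would give the analogue of an hi-model or a Di-model for comparison with the full model. Other threshold conventions exist, so the cut-offs should be chosen to match the validation protocol in use.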