Background: Proper validation is an important aspect of QSAR modelling. External
validation is one of the most widely used validation methods in QSAR, where the model is built on a subset
of the data and validated on the rest of the samples. However, its effectiveness for datasets with a
small number of samples but a large number of predictors remains suspect.
Objective: Calculating hundreds or thousands of molecular descriptors using currently available
software has become the norm in QSAR research, owing to computational advances in the past few
decades. Thus, for n chemical compounds and p descriptors calculated for each molecule, the typical
chemometric dataset today has a large p but a small n (i.e. n << p). Motivated by recent evidence in
the literature that external validation is inadequate for estimating the true predictive capability of a
statistical model, this paper presents an extensive comparison of this method with several other
validation techniques.
Methodology: We compared four validation methods: leave-one-out (LOO), K-fold, external and multi-split
validation, applied to statistical models built with LASSO regression, which simultaneously performs
variable selection and modelling. We used 300 simulated datasets and one real dataset of 95
congeneric amine mutagens for this evaluation.
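
As an illustration of how the four validation schemes partition n samples, the following is a minimal sketch, not the implementation used in this study. It assumes that multi-split validation means repeating the random external split several times; the function names, the 20% test fraction, the number of folds and the seeds are all hypothetical choices made for the example.

```python
import random


def loo_splits(n):
    """Leave-one-out: every sample serves as the test set exactly once."""
    for i in range(n):
        train = [j for j in range(n) if j != i]
        yield train, [i]


def kfold_splits(n, k, seed=0):
    """K-fold: shuffle once, then hold out each of k near-equal folds in turn."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # fold sizes differ by at most one
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def external_split(n, test_fraction=0.2, seed=0):
    """External validation: one random train/test partition (fraction is assumed)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    return sorted(idx[n_test:]), sorted(idx[:n_test])


def multi_splits(n, n_repeats, test_fraction=0.2, seed=0):
    """Multi-split (assumed meaning): repeat the external split with new seeds."""
    for r in range(n_repeats):
        yield external_split(n, test_fraction, seed + r)


if __name__ == "__main__":
    n = 95  # size of the real dataset mentioned above
    print("LOO splits:", sum(1 for _ in loo_splits(n)))
    print("5-fold test sizes:", [len(te) for _, te in kfold_splits(n, 5)])
    print("external test size:", len(external_split(n)[1]))
    print("multi-split repeats:", sum(1 for _ in multi_splits(n, 10)))
```

In such a setup a model would be fitted on each training index list and scored on the matching test list. A single external split yields one score that depends on the seed, whereas the other three schemes aggregate over many partitions.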
Results: External validation metrics vary widely across different random splits of the data, and
external validation is therefore not recommended for predictive QSAR models. LOO showed the best
overall performance among all validation methods applied in our scenario.
Conclusion: Results from external validation are too unstable for the datasets we analyzed. Based on
our findings, we recommend using the LOO procedure for validating QSAR predictive models built on
high-dimensional small-sample data.