Background: The poly(ADP-ribose) polymerases (PARPs) are a superfamily of nuclear enzymes
present in eukaryotes.
Methods: In the present report, linear and non-linear methods, including multiple linear
regression (MLR), support vector machine (SVM) and artificial neural network (ANN), were
used to develop quantitative structure-activity relationship (QSAR) models
capable of predicting the pEC50 values of tetrahydropyridopyridazinone derivatives as effective PARP
inhibitors. Principal component analysis (PCA) was used to divide the whole data set rationally
into training and test sets. A genetic algorithm (GA) variable selection method was
employed to select, from the large pool of calculated descriptors, the optimal subset of
descriptors that contribute most significantly to the overall inhibitory activity.
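As an illustration of GA-based descriptor selection, the following is a minimal sketch rather than the procedure used in this work. It assumes an MLR model scored by leave-one-out Q2 as the fitness function; the function names (`q2_loo`, `ga_select`), the GA settings and the synthetic data are all hypothetical.

```python
import numpy as np

def q2_loo(X, y):
    """Leave-one-out Q2 of an ordinary least-squares (MLR) model.

    Uses the hat-matrix identity e_loo_i = e_i / (1 - h_ii), so no refitting is needed.
    """
    A = np.column_stack([np.ones(len(y)), X])    # add intercept column
    H = A @ np.linalg.pinv(A)                     # hat matrix
    loo_resid = (y - H @ y) / (1.0 - np.diag(H))  # leave-one-out residuals
    press = np.sum(loo_resid ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def ga_select(X, y, n_keep=3, pop_size=30, n_gen=40, p_mut=0.1, seed=0):
    """Toy GA: each chromosome is a set of n_keep descriptor (column) indices."""
    rng = np.random.default_rng(seed)
    n_desc = X.shape[1]

    def fitness(chrom):
        return q2_loo(X[:, chrom], y)

    # Random initial population of descriptor subsets
    pop = [rng.choice(n_desc, size=n_keep, replace=False) for _ in range(pop_size)]
    for _ in range(n_gen):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]            # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            i, j = rng.choice(len(parents), size=2, replace=False)
            # Crossover: draw the child's descriptors from the parents' union
            pool = np.union1d(parents[i], parents[j])
            child = rng.choice(pool, size=n_keep, replace=False)
            # Mutation: swap one descriptor for one not yet in the child
            if rng.random() < p_mut:
                unused = np.setdiff1d(np.arange(n_desc), child)
                child[rng.integers(n_keep)] = rng.choice(unused)
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return np.sort(best), fitness(best)

# Synthetic stand-in data (NOT the data set of this study): 40 compounds, 20 descriptors
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 20))
y = 1.5 * X[:, 2] - 2.0 * X[:, 7] + X[:, 11] + rng.normal(scale=0.1, size=40)
selected, best_q2 = ga_select(X, y)
print(selected, round(best_q2, 3))
```

Scoring chromosomes by a cross-validated statistic rather than the training R2 is one common way to discourage the GA from picking descriptors that merely overfit; other fitness functions are equally possible.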
Results: The accuracy and predictive ability of the proposed models were further confirmed using
cross-validation, validation through an external test set and Y-randomization (chance correlation)
approaches. Moreover, an exhaustive statistical comparison was performed on the outputs of the
proposed models. The results revealed that the non-linear modeling approaches, SVM and ANN,
provided considerably better predictive capability than the linear MLR approach.
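The Y-randomization check mentioned above can be sketched as follows. This is only an illustrative example on synthetic data with an MLR model; the helper names (`r2_ols`, `y_randomization`) and the number of permutations are assumptions, not details taken from this work.

```python
import numpy as np

def r2_ols(X, y):
    """Training R2 of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_perm=100, seed=0):
    """Refit the model on shuffled responses to estimate chance correlation."""
    rng = np.random.default_rng(seed)
    r2_true = r2_ols(X, y)
    # Shuffling y breaks any real descriptor-activity relationship
    r2_perm = np.array([r2_ols(X, rng.permutation(y)) for _ in range(n_perm)])
    return r2_true, r2_perm

# Synthetic stand-in data (NOT the data set of this study)
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)
r2_true, r2_perm = y_randomization(X, y)
print(f"R2 (real y): {r2_true:.3f}; max R2 (shuffled y): {r2_perm.max():.3f}")
```

If the models fitted to shuffled responses score nearly as well as the real one, the original correlation is likely due to chance; a large gap supports a genuine structure-activity relationship.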
Conclusion: Among the constructed models, the GA-SVM approach showed the best predictive power in
terms of root mean square error of prediction (RMSEP), cross-validation coefficients (Q²_LOO and
Q²_LGO), and the R² and F-statistic values for the training set. However, compared with
MLR and SVM, the GA-ANN model gave better statistical parameters for the test set.
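For reference, RMSEP and the cross-validation coefficient are commonly defined as below. The abstract does not state its exact formulas, so these are the standard forms and the symbols are assumptions: y_i is the observed pEC50, \hat{y}_i the predicted value for a test compound, \hat{y}_{(i)} the prediction for compound i from a model fitted without it (leave-one-out) or without its group (leave-group-out), and \bar{y} the mean observed value of the training set.

```latex
\[
\mathrm{RMSEP} = \sqrt{\frac{1}{n_{\mathrm{test}}}
  \sum_{i=1}^{n_{\mathrm{test}}} \left( y_i - \hat{y}_i \right)^2 },
\qquad
Q^2 = 1 - \frac{\sum_{i=1}^{n_{\mathrm{train}}} \left( y_i - \hat{y}_{(i)} \right)^2}
               {\sum_{i=1}^{n_{\mathrm{train}}} \left( y_i - \bar{y} \right)^2}
\]
```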