Title:Probing the Hypothesis of SAR Continuity Restoration by the Removal of Activity Cliffs Generators in QSAR
VOLUME: 22 ISSUE: 33
Author(s):Maykel Cruz-Monteagudo, José L. Medina-Franco, Yunier Perera-Sardiña, Fernanda Borges, Eduardo Tejera, Cesar Paz-y-Miño, Yunierkis Pérez-Castillo, Aminael Sánchez-Rodríguez, Zuleidys Contreras-Posada and M. Natália D. S. Cordeiro
Affiliation:Departamento de Química e Bioquímica, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, s/n, 4169-007 Porto, Portugal.
Keywords:Activity cliffs, activity cliffs generators, chemoinformatics, machine learning, QSAR, virtual screening.
Abstract:In this work we report the first attempt to study the effect of activity cliffs on the
generalization ability of machine learning (ML) based QSAR classifiers, using as a case study a
previously reported diverse and noisy dataset focused on drug-induced liver injury (DILI) and
more than 40 ML classification algorithms. Here, the hypothesis of structure-activity relationship
(SAR) continuity restoration by activity cliffs removal is tested as a potential solution to
overcome this limitation. Previously, a parallelism was established between activity cliffs
generators (ACGs) and instances that should be misclassified (ISMs), a related concept from
the field of machine learning. Based on this concept, we compared the classification
performance of multiple machine learning classifiers, as well as that of the consensus classifier
derived from predictive classifiers, obtained from training sets including or excluding ACGs.
The influence of removing ACGs from the training set on virtual screening performance was also studied
for the respective consensus classifiers. In general terms, the removal of the ACGs from the training
process slightly decreased the overall accuracy of the ML classifiers and multi-classifiers, improving their sensitivity
(the weakest feature of ML classifiers trained with ACGs) but decreasing their specificity. Although these results
do not support a positive effect of the removal of ACGs on the classification performance of ML classifiers,
the “balancing effect” of ACG removal was shown to positively influence the virtual screening performance of
multi-classifiers based on valid base ML classifiers. In particular, the early recognition ability was significantly favored
after ACG removal. The results presented and discussed in this work represent the first step towards the application
of a remedial solution to the activity cliffs problem in QSAR studies.
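
The trade-off described above (sensitivity rising while specificity falls) can be illustrated with a minimal Python sketch. It assumes binary labels (1 = positive class, 0 = negative class) and a simple majority vote as the consensus rule. The abstract does not state which consensus scheme was used, so the voting rule, the function names, and the toy labels below are all hypothetical and are not data from the study.

```python
from typing import Dict, List, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Sensitivity: fraction of true positives that were recovered.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: fraction of true negatives that were recovered.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def majority_vote(predictions: List[Sequence[int]]) -> List[int]:
    """Assumed consensus rule: label 1 when more than half of the base classifiers vote 1."""
    n_classifiers = len(predictions)
    return [1 if 2 * sum(votes) > n_classifiers else 0 for votes in zip(*predictions)]


if __name__ == "__main__":
    # Toy labels for illustration only; not taken from the DILI dataset.
    y_true = [1, 1, 1, 0, 0, 0]
    base_predictions = [
        [1, 0, 0, 0, 0, 0],
        [1, 1, 0, 0, 0, 1],
        [0, 1, 0, 0, 0, 0],
    ]
    consensus = majority_vote(base_predictions)
    print(classification_metrics(y_true, consensus))
```

Running the same two functions on consensus predictions obtained with and without ACGs in the training set would give the side-by-side comparison of accuracy, sensitivity, and specificity that the abstract summarizes.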