Background: Alzheimer's disease is the most common form of dementia, characterized by loss of neurons and synapses in the cerebral cortex and certain subcortical regions, leading to altered and inappropriate behavior. In this study, we focused on the influence of random selection (RS) and self-organizing map (SOM) data splitting on the external predictivity of quantitative structure-property relationship (QSPR) models of some N-aryl derivatives as butyrylcholinesterase (BChE) inhibitors. A QSPR model, which relates molecular descriptors to a chemical property, can save time and money in drug discovery and development. Model validation is a critical step in QSPR model generation, and it requires splitting the original data set into training and test sets. The genetic algorithm (GA) is well suited to finding global optima in high-dimensional problems (e.g., variable selection in QSPR) when the response surface has many local optima.
Methods: The molecular structures and experimental values for the inhibition constants of BChE catalytic activity were obtained from the literature. A total of 88 compounds were divided into training and test sets by means of the RS and SOM methods. The Chem3D module was used to create the 3D structures of the compounds, and geometry optimization was performed using the Polak-Ribiere algorithm. Using the Dragon package, more than 1145 molecular descriptors, such as 3D-MoRSE, GETAWAY and WHIM descriptors, were derived to properly characterize the structures of the ChE inhibitor derivatives. Constant variables, variables with low correlation with the response, and collinear descriptors were omitted, which reduced the number of descriptors in the data set to 422. The QSAR models were constructed using stepwise multiple linear regression (stepwise-MLR, S-MLR) and GA-MLR.
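The descriptor reduction step described above can be illustrated with a minimal sketch. It assumes a NumPy descriptor matrix `X` (compounds in rows) and a response vector `y`; the function name `prune_descriptors` and the thresholds `min_corr` and `max_collinear` are hypothetical choices for illustration, since the cutoffs actually used are not stated here.

```python
import numpy as np

def prune_descriptors(X, y, min_corr=0.1, max_collinear=0.9):
    """Return indices of descriptor columns that survive three filters.

    The thresholds are illustrative assumptions, not values from the study.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # 1. Drop constant descriptors (zero standard deviation).
    keep = [j for j in range(X.shape[1]) if np.std(X[:, j]) > 0]

    # 2. Drop descriptors weakly correlated with the response.
    r_y = {j: abs(np.corrcoef(X[:, j], y)[0, 1]) for j in keep}
    keep = [j for j in keep if r_y[j] >= min_corr]

    # 3. From each collinear group, retain only the descriptor that is
    #    most correlated with the response (visit candidates in that order).
    keep.sort(key=lambda j: r_y[j], reverse=True)
    selected = []
    for j in keep:
        collinear = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) >= max_collinear
            for k in selected
        )
        if not collinear:
            selected.append(j)
    return sorted(selected)
```

Resolving collinear pairs in favor of the descriptor more strongly correlated with the response is one common convention; other tie-breaking rules would be equally defensible.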
Results: MLR models with four, five and six variables were built to identify the best QSAR model. The best multivariate linear models in both the stepwise-MLR and GA-MLR methods had five variables. The most significant relationships for the logki values of BChE catalytic activity inhibition, selected by comparing the Q2 values of the S-MLR and GA-MLR models, are presented for all of the random sets and the SOM set.
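As a hedged illustration of how a Q2 value can be compared across data splits, the sketch below fits an ordinary least-squares MLR model on a training set and computes an external Q2 on a test set. It assumes the common definition Q2 = 1 - PRESS / SS, with SS taken about the training-set mean; the exact Q2 formula used in the study is not given here, and the function names `fit_mlr`, `predict_mlr` and `q2_external` are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit; returns [intercept, slopes...]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict responses from a coefficient vector returned by fit_mlr."""
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]

def q2_external(X_train, y_train, X_test, y_test):
    """External Q2 = 1 - PRESS / SS (SS about the training mean; assumed)."""
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    coef = fit_mlr(X_train, y_train)
    press = np.sum((y_test - predict_mlr(coef, X_test)) ** 2)
    ss = np.sum((y_test - y_train.mean()) ** 2)
    return 1.0 - press / ss
```

Calling `q2_external` once per split (each random set and the SOM set) with the same selected descriptors gives one Q2 value per split, which is the kind of comparison reported above.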
Conclusion: The results of this study showed that GA-MLR generally performs better than stepwise-MLR. The five-variable models were chosen as the best models after evaluating the other models in both the GA-MLR and S-MLR methods. The Q2 results indicate that the test set consists of compounds that are evenly distributed within the chemical space; hence, in QSPR modeling, rational splitting methods such as SOM should be used rather than random selection. In addition, the Q2 comparison of the GA-MLR and stepwise-MLR methods highlights the power of GA-MLR for feature selection. Interpretation of the QSPR descriptors indicates that the QSAR equation can be useful in designing new N-aryl derivatives as butyrylcholinesterase inhibitors with improved inhibition of catalytic activity.