Predictive QSAR Models for Polyspecific Drug Targets: The Importance of Feature Selection
Michael A. Demel,
Andreas G.K. Janecek,
Gerhard F. Ecker,
Wilfried N. Gansterer.
Since the advent of QSAR (quantitative structure-activity relationship) modeling, molecular structures have been represented quantitatively as information-preserving descriptor values. Today a nearly unlimited variety of potential descriptors is available, and descriptor selection can no longer be done manually. There is thus an increasing need for automated methods that reduce the dimensionality of the descriptor space. Classical feature selection (FS) and dimensionality reduction (DR) methods such as principal component analysis, which favors those descriptors that contribute most to the variance of a data set, often fail to provide the best classification result. More sophisticated methods such as genetic algorithms, self-organizing maps, and stepwise linear discriminant analysis have proven useful for selecting descriptors with significant discriminative power. FS and DR become even more important when predictive models are to describe the QSAR of highly promiscuous target proteins. The ABC-transporter family, the cardiac hERG potassium channel, and the hepatic cytochrome P450 family are classical representatives of such polyspecific proteins. For these targets the interaction pattern is rather complex, and selecting the most predictive descriptors therefore requires advanced methods. This review surveys FS and DR methods that have recently been applied successfully to classify ligands of polyspecific target proteins.
Keywords: Feature selection, dimensionality reduction, machine learning, chemical descriptors, ABC-transporter, nuclear hormone receptors, cytochrome P450, hERG
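The contrast drawn in the abstract between variance-driven and discrimination-driven descriptor selection can be illustrated with a minimal sketch. It is not a method taken from the review: plain variance ranking stands in for the variance criterion underlying principal component analysis, and a two-class Fisher score stands in for a supervised selection criterion. The descriptor names (`bulk_noise`, `weak_signal`, `sharp_signal`), the synthetic data, and the helper functions are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Squared difference of class means divided by summed within-class variance."""
    active = [v for v, y in zip(values, labels) if y == 1]
    inactive = [v for v, y in zip(values, labels) if y == 0]
    within = pvariance(active) + pvariance(inactive)
    return (mean(active) - mean(inactive)) ** 2 / within if within > 0 else 0.0


def rank(scores):
    """Return descriptor names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Synthetic two-class data set (assumption: 200 compounds, alternating labels).
rng = random.Random(0)
labels = [i % 2 for i in range(200)]

# Hypothetical descriptors: one with large variance but no class information,
# and two with small variance that separate the classes to different degrees.
descriptors = {
    "bulk_noise": [rng.gauss(0.0, 10.0) for _ in labels],
    "weak_signal": [rng.gauss(0.5 * y, 1.0) for y in labels],
    "sharp_signal": [rng.gauss(1.0 * y, 0.2) for y in labels],
}

# Unsupervised criterion: total variance of each descriptor.
variance_scores = {name: pvariance(v) for name, v in descriptors.items()}
# Supervised criterion: class separation of each descriptor.
fisher_scores = {name: fisher_score(v, labels) for name, v in descriptors.items()}

variance_ranking = rank(variance_scores)
fisher_ranking = rank(fisher_scores)

print("ranked by variance:    ", variance_ranking)
print("ranked by Fisher score:", fisher_ranking)
```

In this toy setting the variance criterion places the uninformative descriptor first, whereas the class-aware score places the most discriminative descriptor first. Real descriptor sets are correlated and far larger, so the sketch shows only the direction of the effect, not its size.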