Adapting Interrelated Two-Way Clustering Method for Quantitative Structure-Activity Relationship (QSAR) Modeling of Mutagenicity/Non-Mutagenicity of a Diverse Set of Chemicals
Subhash C. Basak,
Gregory D. Grunwald.
Interrelated two-way clustering (ITC) is an unsupervised clustering method developed to divide samples into
two groups in gene expression data obtained from microarrays while simultaneously selecting important genes.
It has been found to perform better than conventional clustering methods such as K-means or self-organizing
maps when the number of samples is much smaller than the number of variables (n ≪ p). In this paper
we used the ITC approach to classify a diverse set of 508 chemicals with respect to mutagenicity. A large number of
topological indices (TIs), 3-dimensional and quantum chemical descriptors, and atom pairs (APs) were used as
explanatory variables. ITC was used only for predictor selection, after which ridge regression was
employed to build the final predictive model. The proper leave-one-out (LOO) cross-validation method in this scenario
holds out each of the 508 compounds in turn before predictor thinning and compares the predicted values with the
experimental data. The ITC-based results obtained here are comparable to those reported earlier.
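The cross-validation scheme described above, in which each compound is held out before predictors are thinned, can be illustrated with a minimal sketch. It is not the authors' implementation: `select_predictors` is a hypothetical correlation filter standing in for ITC-based selection, the response is assumed to be coded 0/1 (non-mutagen/mutagen), and the names `loo_predictions`, `k`, and `lam` are illustrative choices rather than values from the paper.

```python
import numpy as np


def select_predictors(X, y, k):
    """Hypothetical stand-in for predictor thinning (NOT ITC):
    keep the k columns with the largest absolute correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns (or constant y) score zero
    score = np.abs(Xc.T @ yc) / denom
    return np.argsort(score)[-k:]


def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centered data.
    Returns (coefficients, intercept); the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def loo_predictions(X, y, k=10, lam=1.0):
    """Leave-one-out predictions with selection redone inside every fold,
    so the held-out compound never influences which predictors are kept."""
    n = X.shape[0]
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i          # hold out compound i first
        cols = select_predictors(X[train], y[train], k)
        coef, intercept = ridge_fit(X[train][:, cols], y[train], lam)
        preds[i] = X[i, cols] @ coef + intercept
    return preds
```

Calling `loo_predictions(X, y)` on a descriptor matrix `X` (one row per compound) and a 0/1 activity vector `y` returns one out-of-sample prediction per compound, which can then be compared with the experimental labels. Selecting predictors once on the full data set and only then cross-validating the ridge fit would let the held-out compound leak into the selection step, which is the situation the holdout-before-thinning scheme avoids.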
Keywords: Atom pairs, interrelated two-way clustering, mutagenicity, quantum chemical descriptors, ridge regression,