A discriminant analysis model is presented for carcinogenic risk. The data set was obtained from the FDA/CDER two-year rodent study database and was divided into a training set of 1022 organic compounds and an external validation test set of 50 compounds. The model is designed for use as a decision support tool at a defined decision threshold, and thus performs a binary discrimination into “high risk” and “low risk” categories. The carcinogenic risk classification is based on the method for estimating human risk from two-year rodent studies developed at the FDA/CDER/ICSAS. The paradigm chosen for this model allows a straightforward risk analysis based on historic information, as well as the computation of coverage, probability and confidence metrics that can further qualify the computed result. The molecular structures were represented as MDL mol files. The molecular structure information was encoded as topological structure descriptors, including atom-type and group-type E-State and hydrogen E-State indices, molecular connectivity chi indices, topological polarity, and counts of molecular features. All descriptors were computed, and all discriminant analyses were performed, with the MDL®QSAR software. The reported model is based on fifty-three descriptors and uses the nonparametric normal kernel method, with the Mahalanobis distance to determine proximity. The model performed very well on the fifty compounds of the test set, yielding the following statistics: 76% correctly classified as “high risk” (carcinogenic) and 84% correctly classified as “low risk” (non-carcinogenic).
Keywords: carcinogenicity, discriminant analysis, in silico, predictive toxicology, topological structure descriptors, QSAR, E-State, chemoinformatics
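
To make the classification step concrete, the following is a minimal NumPy sketch of a nonparametric normal kernel discriminant rule that uses the Mahalanobis distance to measure proximity. It is not the MDL®QSAR implementation, whose internals are not described here. The function names, the pooled within-class covariance, the equal class priors, the bandwidth `r`, and the synthetic three-descriptor data are all assumptions made for illustration; they are not taken from the study.

```python
import numpy as np

def pooled_covariance(X, y):
    """Pooled within-class covariance (assumed choice of metric matrix)."""
    classes = np.unique(y)
    n, p = X.shape
    S = np.zeros((p, p))
    for c in classes:
        D = X[y == c] - X[y == c].mean(axis=0)
        S += D.T @ D
    return S / (n - len(classes))

def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))

def kernel_posteriors(X_train, y_train, X_new, r=1.0, priors=None):
    """Posterior class probabilities from a normal-kernel density estimate.

    For each class, the density at x is the average over training compounds
    of exp(-0.5 * d2 / r**2), where d2 is the squared Mahalanobis distance.
    With one pooled covariance the kernel's normalising constant is the same
    for every class and cancels in the posterior.
    """
    classes = np.unique(y_train)
    S_inv = np.linalg.pinv(pooled_covariance(X_train, y_train))
    if priors is None:  # assumption: equal priors
        priors = np.full(len(classes), 1.0 / len(classes))
    log_scores = np.empty((X_new.shape[0], len(classes)))
    for j, c in enumerate(classes):
        Xc = X_train[y_train == c]
        diff = X_new[:, None, :] - Xc[None, :, :]           # (m, n_c, p)
        d2 = np.einsum("ijk,kl,ijl->ij", diff, S_inv, diff)  # squared Mahalanobis
        log_kernel_mean = _logsumexp(-0.5 * d2 / r**2, axis=1) - np.log(len(Xc))
        log_scores[:, j] = np.log(priors[j]) + log_kernel_mean
    post = np.exp(log_scores - _logsumexp(log_scores, axis=1)[:, None])
    return classes, post

def per_class_correct_rate(y_true, y_pred):
    """Fraction of compounds in each true class that were assigned to it."""
    return {int(c): float(np.mean(y_pred[y_true == c] == c))
            for c in np.unique(y_true)}

# Synthetic stand-in data: 0 = "low risk", 1 = "high risk" (illustrative only).
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, (40, 3)), rng.normal(4.0, 1.0, (40, 3))])
y_train = np.array([0] * 40 + [1] * 40)
X_new = np.array([[0.0, 0.0, 0.0], [4.0, 4.0, 4.0]])

classes, post = kernel_posteriors(X_train, y_train, X_new, r=1.0)
pred = classes[post.argmax(axis=1)]  # binary decision at the 0.5 threshold
print(post, pred)
```

The posterior probability returned for each compound is the kind of quantity that could feed a decision threshold other than 0.5, and `per_class_correct_rate` shows one way to tabulate the share of correctly classified compounds in each class on an external test set.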