Background: Colorectal cancer (CRC) is the third most common cancer among both women and men in the USA, and recent studies have shown increasing incidence in less developed regions, including Sub-Saharan Africa (SSA). We developed a hybrid signature (DNA mutation and RNA expression) and assessed its ability to predict mutation status and survival in patients with CRC.
Methods: Publicly available microarray and RNASeq data from 54 matched formalin-fixed paraffin-embedded (FFPE) samples, profiled on the Affymetrix GeneChip and RNASeq platforms, were used to identify genes differentially expressed between mutant and wild-type samples. For classification, we applied support vector machines (SVM), artificial neural networks, random forests, k-nearest neighbors, naïve Bayes, negative binomial linear discriminant analysis (NBLDA), and Poisson linear discriminant analysis. A Cox proportional hazards model was used for survival analysis.
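Count-based classifiers such as Poisson linear discriminant analysis model each gene's read count as Poisson, scaled by a per-sample size factor, and assign a sample to the class with the highest log-likelihood plus log prior. The sketch below is a simplified illustration of that idea, not the authors' implementation or the exact published PLDA estimator: the function names, the relative-library-size normalization, the pseudocount `beta`, and the toy counts are all assumptions chosen for clarity.

```python
import math

def size_factors(X):
    # Assumption: size factor = library size relative to the mean library size
    totals = [sum(row) for row in X]
    mean_total = sum(totals) / len(totals)
    return [t / mean_total for t in totals]

def fit_poisson_da(X, y, beta=1.0):
    """Hypothetical helper: estimate class priors and per-class, per-gene Poisson rates."""
    s = size_factors(X)
    n_genes = len(X[0])
    classes = sorted(set(y))
    priors, rates = {}, {}
    for k in classes:
        idx = [i for i, label in enumerate(y) if label == k]
        priors[k] = len(idx) / len(y)
        s_k = sum(s[i] for i in idx)
        # Pseudocount beta keeps every rate strictly positive, even for zero-count genes
        rates[k] = [(sum(X[i][j] for i in idx) + beta) / (s_k + beta)
                    for j in range(n_genes)]
    mean_total = sum(sum(row) for row in X) / len(X)
    return classes, priors, rates, mean_total

def predict(model, x):
    """Return the class with the largest Poisson log-likelihood plus log prior."""
    classes, priors, rates, mean_total = model
    s_new = sum(x) / mean_total
    best, best_score = None, -math.inf
    for k in classes:
        score = math.log(priors[k])
        for xj, lam in zip(x, rates[k]):
            mu = s_new * lam
            # log(x!) is identical for every class, so it is dropped
            score += (xj * math.log(mu) if xj > 0 else 0.0) - mu
        if score > best_score:
            best, best_score = k, score
    return best

# Toy counts (invented): gene 0 is high in mutant samples, gene 1 in wild-type samples
X = [[50, 5, 20], [60, 4, 25], [55, 6, 22],
     [5, 50, 20], [4, 60, 25], [6, 55, 22]]
y = ["mutant"] * 3 + ["wild-type"] * 3
model = fit_poisson_da(X, y)
print(predict(model, [58, 5, 21]))  # mutant
print(predict(model, [4, 57, 23]))  # wild-type
```

Negative binomial linear discriminant analysis follows the same pattern but replaces the Poisson likelihood with a negative binomial one that has a gene-specific dispersion term, which accommodates the overdispersion typical of RNASeq counts.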
Results: Compared with the gene list from either individual platform, the hybrid gene list achieved the highest accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) for mutation status across all classifiers, and it was prognostic for survival in patients with CRC. The NBLDA method performed best on the RNASeq data, whereas SVM was the most suitable classifier for CRC across the two data types. Nine genes were found to be predictive of survival.
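The four performance measures reported for mutation-status prediction can be computed from a classifier's predicted labels and scores. The sketch below is illustrative only: it treats "mutant" as the positive class (an assumption), uses hypothetical function names, and runs on invented labels and scores rather than the study's data. AUC is computed as the probability that a randomly chosen mutant sample scores higher than a randomly chosen wild-type sample, with ties counted as one half.

```python
def confusion_counts(y_true, y_pred, positive="mutant"):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def accuracy_sensitivity_specificity(y_true, y_pred, positive="mutant"):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # fraction of mutant samples called mutant
    specificity = tn / (tn + fp)  # fraction of wild-type samples called wild-type
    return accuracy, sensitivity, specificity

def auc(y_true, scores, positive="mutant"):
    """Rank-based AUC: P(positive score > negative score), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented example: 3 mutant and 3 wild-type samples
y_true = ["mutant", "mutant", "mutant", "wild-type", "wild-type", "wild-type"]
y_pred = ["mutant", "mutant", "wild-type", "wild-type", "wild-type", "mutant"]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]  # predicted probability of "mutant"
print(accuracy_sensitivity_specificity(y_true, y_pred))
print(auc(y_true, scores))
```

Here accuracy, sensitivity, and specificity are each 2/3 and the AUC is 8/9. Because AUC is threshold-free, it can rank classifiers differently from accuracy computed at a single cut-off.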
Conclusion: This signature could be useful in clinical practice, particularly for colorectal cancer diagnosis and therapy. Future studies should assess the effectiveness of data integration in cancer survival analysis, as well as its application to unbalanced data, in which the classes differ in size, and to data with more than two classes.