Background: Colorectal cancer (CRC) is the third most common cancer among women
and men in the USA, and recent studies have shown an increasing incidence in less developed
regions, including Sub-Saharan Africa (SSA). We developed a hybrid (DNA mutation and RNA
expression) signature and assessed its predictive properties for the mutation status and survival of CRC patients.
Methods: Publicly available microarray and RNASeq data from 54 matched formalin-fixed
paraffin-embedded (FFPE) samples, generated on the Affymetrix GeneChip and RNASeq platforms, were
used to identify genes differentially expressed between mutant and wild-type samples. We applied
support vector machine (SVM), artificial neural network, random forest, k-nearest neighbor, naïve
Bayes, negative binomial linear discriminant analysis (NBLDA), and Poisson linear discriminant
analysis algorithms for classification. A Cox proportional hazards model was used for survival analysis.
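As an illustration of one of the classifiers listed above, the following is a minimal sketch of k-nearest neighbor classification of mutation status from expression values. It is not the implementation used in the study: the function name `knn_predict`, the two-gene feature vectors, and all numeric values are hypothetical and chosen only to make the example self-contained.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every training sample, nearest first
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    # Count the labels of the k closest samples and return the most frequent one
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical log-expression values for two genes (not data from the study)
train_x = [(2.1, 0.5), (1.9, 0.7), (2.3, 0.4), (0.4, 2.2), (0.6, 1.8), (0.5, 2.0)]
train_y = ["mutant", "mutant", "mutant", "wild-type", "wild-type", "wild-type"]

prediction = knn_predict(train_x, train_y, (2.0, 0.6), k=3)
print(prediction)
```

In practice the feature vector would hold the differentially expressed genes rather than two values, and k would be tuned by cross-validation.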
Results: Compared with the gene list from either individual platform, the hybrid gene list had
the highest accuracy, sensitivity, specificity, and AUC for mutation status across all classifiers
and was prognostic for survival in patients with CRC. The NBLDA method performed best on the
RNASeq data, while the SVM method was the most suitable classifier for CRC across the two data
types. Nine genes were found to be predictive of survival.
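The four performance measures reported above can be computed from true labels and classifier scores as in the sketch below. This is a generic illustration, not the study's evaluation code: the function name `classification_metrics`, the 0.5 decision threshold, and the example labels and scores are assumptions made for the example. AUC is computed here as the proportion of mutant/wild-type pairs in which the mutant sample receives the higher score, with ties counted as one half.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity and AUC for binary labels (1 = mutant)."""
    # Turn continuous scores into hard predictions at the chosen threshold
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # AUC: fraction of (positive, negative) pairs ranked correctly, ties count 0.5
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": wins / (len(pos) * len(neg)),
    }


# Hypothetical labels and classifier scores (not results from the study)
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
metrics = classification_metrics(y_true, y_score)
print(metrics)
```

Unlike the other three measures, AUC does not depend on the threshold, which is one reason it is commonly reported alongside them when comparing classifiers.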
Conclusion: This signature could be useful in clinical practice, especially for CRC
diagnosis and therapy. Future studies should evaluate the effectiveness of data integration in cancer
survival analysis and its application to unbalanced data, where the classes are of different sizes,
as well as to data with multiple classes.