In this review, we have discussed the class-prediction and class-discovery methods applied to gene expression data, along with the implications of the findings. We attempted to present a unified approach that considers both class-prediction and class-discovery. We devoted a substantial part of the review to an overview of pattern classification/recognition methods and discussed important issues such as preprocessing of gene expression data, the curse of dimensionality, feature extraction/selection, and measuring or estimating classifier performance. To provide a quick and concise reference, we summarized important properties of the different class-predictor design approaches: generalizability (sensitivity to overtraining), built-in feature selection, ability to report prediction strength, and transparency (ease of understanding of the operation). We have also covered in detail the topic of biclustering, an emerging clustering method that processes the entries of the gene expression data matrix in both the gene and sample directions simultaneously.
Keywords: cDNA microarrays, Fisher's Linear Discriminant Analysis (FLDA), Artificial Neural Networks, multidimensional scaling, cross-validation (CV), Super-Paramagnetic Clustering algorithm