Background: Gene set enrichment analyses (GSEA) provide a useful and powerful
approach for identifying differentially expressed gene sets defined by prior biological knowledge. Several
GSEA algorithms have been proposed to perform enrichment analyses on groups of genes.
However, many of these algorithms have focused on the identification of differentially expressed
gene sets in a given phenotype.
Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis
(GSCoA), that simultaneously measures within- and between-gene-set variation to identify sets of
genes enriched for differential expression as well as highly co-related pathways.
Methods: We apply co-inertia analysis to pairwise comparisons of gene sets in gene expression data
to measure the co-structure of expression profiles in each pair of gene sets. Co-inertia analysis (CIA) is
a multivariate method for identifying trends or co-relationships across multiple datasets that share the
same samples. The objective of CIA is to find ordinations (dimension-reduction diagrams) of two
gene sets such that the squared covariance between the projections of the gene sets on successive axes
is maximized. Simulation studies illustrate that CIA offers superior performance in identifying co-relationships
between gene sets in all simulation settings when compared to correlation-based gene
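The CIA objective described in the Methods can be sketched in a few lines of NumPy. This is a minimal illustration, not the GSCoA implementation, and it rests on stated assumptions: each gene set is stored as a samples × genes matrix with rows in the same sample order, every gene is column-centred, every sample has weight 1/n, and no gene-level scaling is applied (other row and column weightings are possible in CIA). Under these assumptions the co-inertia axes are the left and right singular vectors of the cross-covariance matrix C = Xc^T Yc / n, and the squared covariance between the two projections on axis k equals the k-th squared singular value. The function name `coinertia` and its return keys are illustrative only.

```python
import numpy as np


def coinertia(X, Y, n_axes=2):
    """Sketch of co-inertia analysis for two gene sets measured on the same samples.

    X: (n_samples, p) expression matrix for gene set A
    Y: (n_samples, q) expression matrix for gene set B
    Assumption: rows of X and Y are the same samples in the same order.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("X and Y must contain the same samples (rows)")
    n = X.shape[0]

    # Centre each gene; no scaling (covariance-based analysis on each side, an assumption).
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Cross-covariance between the genes of set A and the genes of set B, uniform weights 1/n.
    C = Xc.T @ Yc / n

    # SVD: column k of U and row k of Vt are the k-th pair of co-inertia axes.
    # Singular values come back in decreasing order, so axis 1 carries the largest covariance.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    k = min(n_axes, s.size)
    axes_x, axes_y = U[:, :k], Vt[:k].T

    # Sample scores: projection of each gene set onto its own co-inertia axes.
    scores_x = Xc @ axes_x
    scores_y = Yc @ axes_y

    return {
        "axes_x": axes_x,
        "axes_y": axes_y,
        "scores_x": scores_x,
        "scores_y": scores_y,
        "eig": s[:k] ** 2,                  # squared covariance on each retained axis
        "total_coinertia": np.sum(s ** 2),  # squared Frobenius norm of C
    }
```

For two gene sets A and B taken from one expression matrix, `coinertia(expr[:, cols_A], expr[:, cols_B])["eig"][0]` would give the squared covariance on the first co-inertia axis; `expr`, `cols_A`, and `cols_B` are hypothetical placeholders. Using the SVD means every pair of unit-length weight vectors yields a squared covariance no larger than the first eigenvalue, which is exactly the maximization the CIA objective asks for.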
Results and Conclusion: We also combine between-gene-set CIA and GSEA to discover the
relationships between gene sets significantly associated with phenotypes. In addition, we provide a
graphical technique for simultaneously visualizing and exploring associations within and between
gene sets, together with their interactions and network structure. We then demonstrate the integration
of within- and between-gene-set variation by applying CIA and GSEA to the p53 gene expression data
with the c2 curated gene sets.
Ultimately, the GSCoA approach provides an attractive tool for identifying and visualizing
novel associations between pairs of gene sets by integrating co-relationships between gene sets
into gene set analysis.
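The abstract describes a network view of between-gene-set associations but does not say how edges are defined. One simple possibility, assumed here purely for illustration, is to score every pair of gene sets by the RV coefficient (the total co-inertia of two column-centred blocks, normalized to lie between 0 and 1) and keep the pairs that exceed a chosen threshold. The helper names `rv_coefficient` and `gene_set_network`, the `threshold` default, and the input layout are all hypothetical; a permutation test on the RV coefficient might be a more principled way to decide which pairs to keep.

```python
import itertools

import numpy as np


def rv_coefficient(X, Y):
    """RV coefficient between two blocks measured on the same samples (rows).

    Equals the total co-inertia of the column-centred blocks, normalized to [0, 1].
    Returns nan (with a NumPy warning) if either block has zero variance.
    """
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    Yc = np.asarray(Y, dtype=float) - np.mean(Y, axis=0)
    cross = np.sum((Xc.T @ Yc) ** 2)   # squared Frobenius norm of Xc^T Yc
    norm_x = np.sum((Xc.T @ Xc) ** 2)  # trace((Xc^T Xc)^2)
    norm_y = np.sum((Yc.T @ Yc) ** 2)  # trace((Yc^T Yc)^2)
    return cross / np.sqrt(norm_x * norm_y)


def gene_set_network(expr, gene_index, gene_sets, threshold=0.5):
    """Hypothetical helper: pairwise RV between gene sets, kept as edges above a threshold.

    expr: (n_samples, n_genes) expression matrix
    gene_index: dict mapping gene name -> column of expr
    gene_sets: dict mapping gene set name -> list of gene names
    Returns a list of (set_a, set_b, rv) tuples with rv >= threshold.
    """
    blocks = {}
    for name, genes in gene_sets.items():
        # Genes absent from the expression matrix are silently skipped.
        cols = [gene_index[g] for g in genes if g in gene_index]
        if cols:
            blocks[name] = expr[:, cols]

    edges = []
    for a, b in itertools.combinations(sorted(blocks), 2):
        rv = rv_coefficient(blocks[a], blocks[b])
        if rv >= threshold:
            edges.append((a, b, rv))
    return edges
```

The resulting edge list could be passed to any graph-drawing tool to display which gene sets share expression structure; the threshold trades off network density against the strength of the associations shown.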