Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

Gene Set Correlation Analysis and Visualization Using Gene Expression Data

Author(s): Chen-An Tsai* and James J. Chen

Volume 16, Issue 3, 2021

Published on: 29 June, 2020

Page: [406 - 421] Pages: 16

DOI: 10.2174/1574893615999200629124444

Abstract

Background: Gene set enrichment analyses (GSEA) provide a useful and powerful approach to identifying differentially expressed gene sets using prior biological knowledge. Several GSEA algorithms have been proposed to perform enrichment analyses on groups of genes. However, many of these algorithms focus only on identifying gene sets that are differentially expressed for a given phenotype.

Objective: In this paper, we propose a gene set analytic framework, Gene Set Correlation Analysis (GSCoA), that simultaneously measures within- and between-gene-set variation to identify gene sets enriched for differential expression and highly co-related pathways.

Methods: We apply co-inertia analysis (CIA) to cross-gene-set comparisons in gene expression data to measure the co-structure of expression profiles in pairs of gene sets. CIA is a multivariate method that identifies trends or co-relationships in multiple datasets that contain the same samples. The objective of CIA is to seek ordinations (dimension reduction diagrams) of two gene sets such that the squared covariance between the projections of the gene sets on successive axes is maximized. In simulation studies, CIA outperformed correlation-based gene set methods at identifying co-relationships between gene sets in every simulation setting.
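
The CIA objective described above can be illustrated with a minimal sketch. It assumes the simplest setting (column-centered data, uniform sample weights, unit gene weights), in which the axes that maximize the squared covariance between projections are the singular vectors of the cross-covariance matrix of the two gene sets. The function name `co_inertia`, its arguments, and the synthetic data are hypothetical and chosen for illustration; this is not the authors' GSCoA implementation. The RV coefficient returned here is a commonly used global summary of co-structure and is included as an assumption about how the strength of a pair might be scored.

```python
import numpy as np

def co_inertia(X, Y, n_axes=2):
    """Sketch of co-inertia analysis for two gene sets.

    X: samples x genes matrix for gene set A.
    Y: samples x genes matrix for gene set B (same samples, same row order).
    Assumes uniform sample weights and unit gene weights.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # center each gene
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / n        # cross-covariance matrix (genes_A x genes_B)

    # Singular vectors give paired axes; the covariance between the
    # projections on axis i equals the i-th singular value.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    k = min(n_axes, len(s))
    scores_x = Xc @ U[:, :k]     # sample projections for gene set A
    scores_y = Yc @ Vt[:k].T     # sample projections for gene set B

    # RV coefficient: total co-inertia scaled by the inertia of each set.
    Sxx = Xc.T @ Xc / n
    Syy = Yc.T @ Yc / n
    rv = np.sum(s ** 2) / np.sqrt(np.sum(Sxx ** 2) * np.sum(Syy ** 2))
    return scores_x, scores_y, s[:k] ** 2, rv

# Synthetic example: two gene sets driven by one shared latent factor.
rng = np.random.default_rng(1)
latent = rng.normal(size=(40, 1))
set_a = latent @ rng.normal(size=(1, 6)) + 0.5 * rng.normal(size=(40, 6))
set_b = latent @ rng.normal(size=(1, 8)) + 0.5 * rng.normal(size=(40, 8))
_, _, sq_cov, rv_ab = co_inertia(set_a, set_b)
print("squared covariances on the first axes:", sq_cov)
print("RV coefficient:", rv_ab)
```

Plotting the two sets of sample scores on the same axes gives the kind of ordination diagram the method refers to; samples whose projections lie close together in both gene sets indicate shared structure.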

Results and Conclusion: We combine between-gene-set CIA with GSEA to discover relationships between gene sets significantly associated with phenotypes. We also provide a graphical technique for visualizing and simultaneously exploring the associations between and within gene sets, along with their interactions and network. We demonstrate the integration of within- and between-gene-set variation using CIA and GSEA on the p53 gene expression data with the c2 curated gene sets. The GSCoA approach provides an attractive tool for identifying and visualizing novel associations between pairs of gene sets by integrating co-relationships between gene sets into gene set analysis.

Keywords: Gene set enrichment analyses, gene set correlation analysis, co-inertia analysis, covariance, p53 gene expression data, gene set analysis.

