Data Integration of Hybrid Microarray and Single Cell Expression Data to Enhance Gene Network Inference

Author(s): Wei Zhang, Wenchao Li, Jianming Zhang*, Ning Wang.

Journal Name: Current Bioinformatics

Volume 14, Issue 3, 2019



Abstract:

Background: Gene Regulatory Network (GRN) inference algorithms aim to identify causal interactions between genes and transcription factors. High-throughput transcriptomic data, including DNA microarray and single cell expression data, contain complementary information for network inference.

Objective: To enhance GRN inference, integrating different types of expression data offers an economical and efficient solution.

Method: In this paper, a novel ensemble inference algorithm based on an E-alpha integration rule is proposed to merge complementary information from microarray and single cell expression data. A Gradient Boosting Tree (GBT) inference algorithm is implemented to compute importance scores for candidate gene-gene pairs. The proposed E-alpha rule quantitatively evaluates the credibility level of each information source and determines the final ranked list.
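The abstract does not spell out either the GBT scoring step or the E-alpha rule, so the following is only an illustrative sketch under stated assumptions. It follows the GENIE3-style decomposition used by tree-based inference methods: each gene is regressed on all other genes, and the importance a gradient-boosted model assigns to each candidate regulator becomes the score of that regulator-to-target pair. Here boosting uses depth-1 trees (stumps) under squared loss, and importance is the summed split gain per feature. The `fuse` function is a hypothetical stand-in for the integration step: a fixed weighted sum of per-source scores in which the weights play the role of credibility levels. It is not the paper's E-alpha rule. All function names, weights, and the toy data are hypothetical.

```python
import random


def fit_stump(X, r):
    """Best single split of residuals r: (gain, feature, threshold, left_value, right_value).

    gain is the squared-error reduction L^2/nL + R^2/nR - S^2/n."""
    n, p = len(r), len(X[0])
    total = sum(r)
    best = None
    for j in range(p):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum, left_n = 0.0, 0
        for k in range(n - 1):
            i, nxt = order[k], order[k + 1]
            left_sum += r[i]
            left_n += 1
            if X[i][j] == X[nxt][j]:
                continue  # no threshold separates equal values
            right_sum, right_n = total - left_sum, n - left_n
            gain = left_sum**2 / left_n + right_sum**2 / right_n - total**2 / n
            if best is None or gain > best[0]:
                thr = (X[i][j] + X[nxt][j]) / 2
                best = (gain, j, thr, left_sum / left_n, right_sum / right_n)
    return best


def gbt_importance(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting with stumps under squared loss; returns summed split gain per feature."""
    n = len(y)
    pred = [sum(y) / n] * n
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        resid = [y[i] - pred[i] for i in range(n)]  # negative gradient of squared loss
        stump = fit_stump(X, resid)
        if stump is None:
            break
        gain, j, thr, left_val, right_val = stump
        importance[j] += gain
        for i in range(n):
            pred[i] += learning_rate * (left_val if X[i][j] <= thr else right_val)
    return importance


def score_network(expr, genes):
    """Regress each target gene on all other genes; per-target normalised importances
    become (regulator, target) edge scores. expr[s][g] = expression of gene g in sample s."""
    scores = {}
    for t, target in enumerate(genes):
        regs = [g for g in range(len(genes)) if g != t]
        X = [[row[g] for g in regs] for row in expr]
        y = [row[t] for row in expr]
        imp = gbt_importance(X, y)
        total = sum(imp) or 1.0
        for k, g in enumerate(regs):
            scores[(genes[g], target)] = imp[k] / total
    return scores


def fuse(score_maps, weights):
    """HYPOTHETICAL integration step: weighted sum of per-source edge scores, ranked best first.
    A placeholder for credibility weighting, not the paper's E-alpha rule."""
    edges = set().union(*score_maps)
    fused = {e: sum(w * m.get(e, 0.0) for m, w in zip(score_maps, weights)) for e in edges}
    return sorted(fused, key=fused.get, reverse=True)


# Toy data (hypothetical): gene A drives gene B; gene C is independent noise.
rng = random.Random(0)


def simulate(n_samples, noise):
    rows = []
    for _ in range(n_samples):
        a = rng.gauss(0, 1)
        rows.append([a, 2 * a + rng.gauss(0, noise), rng.gauss(0, 1)])
    return rows


genes = ["A", "B", "C"]
micro_scores = score_network(simulate(40, 0.2), genes)  # stand-in for microarray data
sc_scores = score_network(simulate(40, 1.0), genes)     # noisier stand-in for single cell data
ranked = fuse([micro_scores, sc_scores], weights=[0.6, 0.4])
print(ranked[:2])
```

Normalising importances per target gene keeps scores from different targets on a common scale before fusion. A credibility-based rule such as the one the abstract describes would presumably set the source weights from the data rather than fixing them by hand as done here.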

Results: Two groups of in silico gene networks are used to demonstrate the effectiveness of the proposed E-alpha integration. Experiments on in silico gene networks of size 50 and size 100 suggest that the E-alpha rule significantly improves performance metrics compared with using a single information source.
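The abstract does not name the performance metrics used. In DREAM-style GRN benchmarks, a ranked edge list is usually scored against the gold-standard network by the area under the ROC curve (AUROC) and the area under the precision-recall curve (AUPR). Assuming those metrics, the following minimal sketch shows how a ranked list could be scored. The helper names, the toy ranking, and the gold standard are hypothetical, and AUPR is approximated here by non-interpolated average precision, which benchmark tools may compute differently.

```python
def auroc(ranked_edges, true_edges):
    """AUROC of a complete ranking (best first): the fraction of (true, false)
    edge pairs in which the true edge is ranked higher.
    Assumes at least one true and one false edge, and no tied scores."""
    n_pos = n_neg = correctly_ordered = 0
    for edge in ranked_edges:
        if edge in true_edges:
            n_pos += 1
        else:
            n_neg += 1
            correctly_ordered += n_pos  # every true edge seen so far outranks this false edge
    return correctly_ordered / (n_pos * n_neg)


def average_precision(ranked_edges, true_edges):
    """Mean of precision@k over the ranks k where a true edge occurs
    (a non-interpolated estimate of AUPR)."""
    hits, total = 0, 0.0
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            hits += 1
            total += hits / k
    return total / len(true_edges)


# Hypothetical ranked edge list and gold-standard network
ranked = [("A", "B"), ("C", "B"), ("B", "A"), ("A", "C")]
gold = {("A", "B"), ("B", "A")}
print(auroc(ranked, gold))              # 3 of 4 (true, false) pairs correctly ordered -> 0.75
print(average_precision(ranked, gold))  # (1/1 + 2/3) / 2
```

Comparing these two numbers for rankings from each single data source and from the integrated ranking is one way to quantify the kind of improvement the Results paragraph reports.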

Conclusion: In GRN inference, integrating hybrid expression data with the E-alpha rule provides a more feasible and efficient way to improve performance metrics than simply increasing sample sizes.

Keywords: Gene regulatory network, ensemble inference, gradient boosting tree, data integration.




Article Details

VOLUME: 14
ISSUE: 3
Year: 2019
Pages: 255-268
Page count: 14
DOI: 10.2174/1574893614666190104142228
