Application of a Deep Matrix Factorization Model on Integrated Gene Expression Data

Yong-Jing Hao; Mi-Xiao Hou; Ying-Lian Gao; Jin-Xing Liu; Xiang-Zhen Kong

Abstract

Background: Non-negative Matrix Factorization (NMF) has been widely applied to gene expression data. However, most NMF-based methods have a single-layer structure, which may perform poorly on complex data. Deep learning, with its carefully designed hierarchical structure, has shown significant advantages in learning data features.
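
As a point of reference for the single-layer baseline, the sketch below factorizes a non-negative genes × samples matrix X into a basis matrix W and a coefficient matrix H using the multiplicative update rules of Lee and Seung for the Frobenius loss. It is a minimal NumPy sketch, not the authors' implementation; the function name `nmf`, the random initialization, the iteration count and the toy data are illustrative assumptions.

```python
import numpy as np


def nmf(X, k, n_iter=500, eps=1e-10, seed=0):
    """Single-layer NMF, X ≈ W @ H, via Lee-Seung multiplicative updates (Frobenius loss).

    Illustrative sketch: random initialization and a fixed iteration count are assumptions.
    """
    if np.any(X < 0):
        raise ValueError("NMF requires a non-negative input matrix")
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    W = rng.random((n_genes, k))      # basis matrix (genes x k), non-negative
    H = rng.random((k, n_samples))    # coefficient matrix (k x samples), non-negative
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative; eps avoids division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy "genes x samples" matrix built as an exact rank-2 non-negative product.
rng = np.random.default_rng(1)
X = rng.random((50, 2)) @ rng.random((2, 20))
W, H = nmf(X, k=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

The single pair (W, H) is the one-layer structure referred to in the Background.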

Objective: To identify differentially expressed genes in gene expression data and to improve sample clustering performance, thereby providing a reference for the prevention and treatment of cancer.

Method: In this paper, we apply a deep NMF method, Deep Semi-NMF, to integrated gene expression data. In each layer, the coefficient matrix is directly decomposed into the basis matrix and the coefficient matrix of the next layer. We apply this factorization model to genomic data from The Cancer Genome Atlas (TCGA).
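
The layer-wise decomposition described above can be sketched as follows, assuming the Semi-NMF formulation of Ding, Li and Jordan (mixed-sign basis Z, non-negative coefficients H, so the input may contain negative values) and the greedy pretraining stage of Trigeorgis et al.'s Deep Semi-NMF. This is a minimal NumPy sketch under stated assumptions rather than the authors' code: the helper names (`semi_nmf`, `deep_semi_nmf_pretrain`), the random initialization, the layer sizes and the toy data are hypothetical, and the joint fine-tuning of all layers that follows pretraining in Trigeorgis et al. is omitted.

```python
import numpy as np


def pos(A):
    """Positive part [A]^+ = (|A| + A) / 2."""
    return (np.abs(A) + A) / 2


def neg(A):
    """Negative part [A]^- = (|A| - A) / 2, so that A = [A]^+ - [A]^-."""
    return (np.abs(A) - A) / 2


def semi_nmf(X, k, n_iter=300, eps=1e-10, seed=0):
    """Semi-NMF X ≈ Z @ H with Z unconstrained (mixed sign) and H >= 0."""
    rng = np.random.default_rng(seed)
    H = rng.random((k, X.shape[1]))  # assumption: random non-negative initialization
    for _ in range(n_iter):
        Z = X @ np.linalg.pinv(H)    # least-squares basis for the current H
        ZtX, ZtZ = Z.T @ X, Z.T @ Z
        # Multiplicative update that keeps H non-negative (Ding, Li & Jordan).
        H *= np.sqrt((pos(ZtX) + neg(ZtZ) @ H) / (neg(ZtX) + pos(ZtZ) @ H + eps))
    Z = X @ np.linalg.pinv(H)
    return Z, H


def deep_semi_nmf_pretrain(X, layer_sizes, n_iter=300, seed=0):
    """Greedy stage: X ≈ Z1 H1, then H1 ≈ Z2 H2, ..., so X ≈ Z1 Z2 ... Zm Hm."""
    Zs, H = [], X
    for i, k in enumerate(layer_sizes):
        # The coefficient matrix of layer i becomes the input of layer i + 1.
        Z, H = semi_nmf(H, k, n_iter=n_iter, seed=seed + i)
        Zs.append(Z)
    return Zs, H


# Toy mixed-sign "genes x samples" matrix (100 genes, 40 samples).
rng = np.random.default_rng(2)
X = rng.standard_normal((100, 4)) @ rng.random((4, 40))
Zs, H = deep_semi_nmf_pretrain(X, layer_sizes=[8, 4])
X_hat = np.linalg.multi_dot(Zs + [H])
print("relative reconstruction error:",
      np.linalg.norm(X - X_hat) / np.linalg.norm(X))
```

Computing Z in closed form with the pseudo-inverse and updating only H multiplicatively is what lets the basis take negative values while the coefficients stay non-negative at every layer.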

Results: The experimental results demonstrate the superiority of the Deep Semi-NMF method in identifying differentially expressed genes and clustering samples.
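
The abstract does not specify how genes are scored or how samples are assigned to clusters from the learned factors. The sketch below illustrates two conventions often used with NMF-type models, purely as assumptions: each sample joins the cluster of its largest coefficient in the final-layer coefficient matrix H_m, and each gene is scored by the L1 norm of its row in the combined basis Z_1 Z_2 ⋯ Z_m, with the highest-scoring genes treated as candidates. The helper names and the toy factor matrices are hypothetical.

```python
import numpy as np


def assign_clusters(H):
    """Hypothetical rule: each sample (column of H) joins the cluster with its largest coefficient."""
    return np.argmax(H, axis=0)


def rank_genes(Zs, top=10):
    """Hypothetical rule: score genes by the L1 norm of their rows in Z1 @ Z2 @ ... @ Zm."""
    W = Zs[0] if len(Zs) == 1 else np.linalg.multi_dot(Zs)
    scores = np.abs(W).sum(axis=1)
    return np.argsort(scores)[::-1][:top]  # indices of the highest-scoring genes


# Toy factors: 5 genes, 4 samples, two layers ending in 2 clusters.
Z1 = np.array([[2.0, 0.0, 0.0],
               [0.1, 0.1, 0.0],
               [0.0, -3.0, 0.0],
               [0.0, 0.0, 0.2],
               [1.0, 0.5, 1.0]])
Z2 = np.array([[1.0, 0.0],
               [0.0, 1.0],
               [0.0, 0.0]])
H = np.array([[0.9, 0.1, 0.8, 0.0],
              [0.1, 0.7, 0.2, 0.6]])

print(assign_clusters(H))              # [0 1 0 1]
print(rank_genes([Z1, Z2], top=3))     # [2 0 4]
```

Scoring on the product Z_1 Z_2 rather than on Z_1 alone means a gene only ranks highly if its loadings survive every layer; in the toy data, gene 3 loads only on a first-layer component that the second layer discards.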

Conclusion: The Deep Semi-NMF model factorizes a matrix into multiple factor matrices whose product approximates the original matrix. It can improve sample clustering performance while identifying more accurate key genes for disease treatment.
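
Written in the notation of Trigeorgis et al.'s Deep Semi-NMF (assumed here, since the abstract gives no formulas), with $\pm$ marking a mixed-sign matrix and $+$ a non-negative one, the layer-wise decomposition, the resulting product and the reconstruction cost read:

```latex
$$
\begin{aligned}
\mathbf{X}^{\pm} &\approx \mathbf{Z}_1 \mathbf{H}_1^{+}, \qquad
\mathbf{H}_{i-1}^{+} \approx \mathbf{Z}_i \mathbf{H}_i^{+} \quad (i = 2, \dots, m),\\
\mathbf{X}^{\pm} &\approx \mathbf{Z}_1 \mathbf{Z}_2 \cdots \mathbf{Z}_m \mathbf{H}_m^{+},
\qquad
C = \tfrac{1}{2}\,\bigl\lVert \mathbf{X} - \mathbf{Z}_1 \mathbf{Z}_2 \cdots \mathbf{Z}_m \mathbf{H}_m \bigr\rVert_F^2 .
\end{aligned}
$$
```

Here the basis matrices $\mathbf{Z}_i$ may take either sign and only the coefficient matrices $\mathbf{H}_i$ are constrained to be non-negative; with $m = 1$ the model reduces to ordinary Semi-NMF.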

Keywords: NMF, gene expression data, TCGA, deep semi-NMF, feature selection, clustering.

Current Bioinformatics
Publisher: Bentham Science Publisher
DOI: https://dx.doi.org/10.2174/1574893614666191017094331
Print ISSN: 1574-8936; Online ISSN: 2212-392X
