Background: Hepatocellular carcinoma (HCC) is a malignancy with a high mortality rate, and identifying relevant HCC biomarkers can aid early diagnosis and patient care. Although high-dimensional omic data contain intrinsic biomedical information about HCC, how to integrate and analyze these data effectively to find promising HCC biomarkers remains an important and difficult problem.
Methods: We present a novel biomarker identification approach, named GEDNN, based on multi-omic data and a graph-embedded deep neural network. To achieve a more comprehensive understanding of HCC, we first collected and normalized three types of HCC-related data: DNA methylation, copy number variation (CNV), and gene expression. Analysis of variance (ANOVA) was used to filter out redundant genes. Then, we measured the connectivity between each pair of genes by their Pearson correlation coefficient and used these values to construct a gene graph. Finally, a graph-embedded deep feedforward network (DFN) and back-propagation of a convolutional neural network (CNN) were combined to analyze the three omic data types integratively and obtain an importance score for each candidate gene biomarker.
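The gene-filtering and graph-construction steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the ANOVA cutoff, whether absolute correlations are used, or the correlation threshold, so `top_k`, the use of |r|, and `threshold` are hypothetical choices, as are all function names. `X` is assumed to be a samples-by-genes matrix from one normalized omic layer, and `labels` gives each sample's class (e.g., tumor vs. normal).

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic of one gene's values across sample classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf")  # perfectly separated, noise-free groups
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_genes(X, labels, top_k):
    """Hypothetical filter: keep the top_k gene columns of X ranked by ANOVA F."""
    n_genes = len(X[0])
    scores = [anova_f([row[j] for row in X], labels) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:top_k]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def build_gene_graph(X, genes, threshold):
    """Adjacency matrix over `genes`: edge when |r(i, j)| >= threshold (assumed rule)."""
    cols = [[row[j] for row in X] for j in genes]
    p = len(genes)
    adj = [[0] * p for _ in range(p)]
    for i, j in combinations(range(p), 2):
        if abs(pearson(cols[i], cols[j])) >= threshold:
            adj[i][j] = adj[j][i] = 1
    return adj
```

On a toy matrix where only two of four genes separate tumor from normal samples, `select_genes(X, labels, 2)` keeps those two genes, and if their profiles are strongly correlated, `build_gene_graph` links them with an edge.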
Results: Extensive experiments showed that the biomarkers screened by the proposed method were effective for classifying and predicting HCC. Furthermore, gene analysis showed that these biomarkers were strongly associated with the development of HCC.
Conclusion: In this paper, we propose GEDNN, a method that assesses the importance of genes to identify cancer biomarkers more accurately and thereby supports effective cancer classification. The method is applied to HCC multi-omics data, namely gene expression (RNA-seq), DNA methylation, and CNV, to exploit the complementary information between the different data types. A gene graph built from Pearson correlation coefficients serves as additional information for the DFN, which reduces the importance scores of redundant genes. In addition, the method incorporates CNN back-propagation to further derive feature importance scores.
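To illustrate how a gene graph can act as additional information for a feedforward network, the sketch below masks the first hidden layer so that each gene feeds only its own node and the nodes of its graph neighbours (a graph-embedded layer), then scores each gene by the mean absolute gradient of the output with respect to that gene's input. The gradient-saliency score is a swapped-in, generic back-propagation-based importance measure and is not necessarily the CNN back-propagation scheme used in GEDNN; the single hidden layer, sigmoid output, and all function names are hypothetical.

```python
from math import exp


def mask_with_graph(W, adj):
    """Zero first-layer weight W[i][j] unless i == j or genes i and j are adjacent."""
    p = len(adj)
    return [[W[i][j] if (i == j or adj[i][j]) else 0.0 for j in range(p)]
            for i in range(p)]


def forward(x, Wm, b1, w2, b2):
    """Graph-masked ReLU layer followed by a single sigmoid output unit."""
    p = len(x)
    pre = [sum(x[i] * Wm[i][j] for i in range(p)) + b1[j] for j in range(p)]
    h = [max(0.0, v) for v in pre]
    z = sum(w * v for w, v in zip(w2, h)) + b2
    return pre, 1.0 / (1.0 + exp(-z))


def input_gradient(x, Wm, b1, w2, b2):
    """d(output)/d(x_i), back-propagated by hand through the sigmoid and ReLU."""
    pre, y = forward(x, Wm, b1, w2, b2)
    dz = y * (1.0 - y)  # derivative of the sigmoid at z
    # d(output)/d(pre_j): zero where the ReLU is inactive.
    dpre = [dz * w2[j] if pre[j] > 0 else 0.0 for j in range(len(pre))]
    return [sum(Wm[i][j] * dpre[j] for j in range(len(pre))) for i in range(len(x))]


def gene_importance(samples, W, adj, b1, w2, b2):
    """Mean absolute input gradient over all samples, one score per gene."""
    Wm = mask_with_graph(W, adj)
    totals = [0.0] * len(adj)
    for x in samples:
        for i, g in enumerate(input_gradient(x, Wm, b1, w2, b2)):
            totals[i] += abs(g)
    return [t / len(samples) for t in totals]
```

Because the mask zeros weights between non-adjacent genes, a gene that is isolated in the graph can influence the output only through its own node. With all weights set to 1, one sample of ones, and an edge between genes 0 and 1 only, the two connected genes receive twice the score of the isolated gene 2.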