Background: Feature selection methods have been commonly used in differential expression
analysis. The selected genes can serve as potential biomarkers, and play important roles in disease diagnosis
and prognosis. Recently, many studies have shown that an effective way to enhance the performance
of feature selection is to incorporate data properties, such as the correlation between instances or attributes
in heterogeneous data. Gene expression data is a typical kind of linked data, in which genes are related by
co-regulation, and samples are grouped by similar disease status. However, most of the analysis approaches
for gene expression data are designed for generic data, without consideration of data characteristics.
Objective: In this paper, we aim to identify miRNA biomarkers by using feature selection methods. Because
mRNA-miRNA parallel expression data are abundant, mining this linked data can provide
valuable information for feature selection and biomarker identification.
Method: Using mRNA-miRNA paired data, we infer connections between data samples by mRNA expression
levels, and incorporate the link information into a graph regularization method to achieve feature
selection for miRNAs.
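The abstract does not state the objective function, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes a k-nearest-neighbour sample graph built from Pearson correlation of mRNA profiles, and it swaps in a ridge (L2) penalty with a closed-form solution in place of whatever regularizer the paper actually uses. The function names and the parameters `k`, `alpha`, and `beta` are hypothetical.

```python
import numpy as np


def knn_graph_laplacian(mrna, k=5):
    """Link samples by mRNA similarity; return the unnormalized graph Laplacian.

    mrna: array of shape (n_samples, n_mrna_features).
    Assumption: links are the k most correlated samples (Pearson).
    """
    sim = np.corrcoef(mrna)            # sample-by-sample correlation
    np.fill_diagonal(sim, -np.inf)     # a sample is never its own neighbour
    n = sim.shape[0]
    adj = np.zeros((n, n))
    for i in range(n):
        neighbours = np.argsort(sim[i])[-k:]   # indices of the k largest
        adj[i, neighbours] = 1.0
    adj = np.maximum(adj, adj.T)       # make the graph undirected
    return np.diag(adj.sum(axis=1)) - adj


def graph_regularized_weights(mirna, y, lap, alpha=1.0, beta=1.0):
    """Minimize ||Xw - y||^2 + alpha*||w||^2 + beta*(Xw)^T L (Xw) over w.

    The last term penalizes predictions that differ between linked samples.
    Setting the gradient to zero gives the linear system solved below.
    """
    X = mirna - mirna.mean(axis=0)     # centre features
    yc = y - y.mean()                  # centre labels
    d = X.shape[1]
    A = X.T @ X + alpha * np.eye(d) + beta * (X.T @ lap @ X)
    return np.linalg.solve(A, X.T @ yc)


def rank_features(weights):
    """Order miRNA feature indices by decreasing absolute weight."""
    return np.argsort(-np.abs(weights))
```

In this sketch, a caller would build the Laplacian from the mRNA matrix, fit the weights on the paired miRNA matrix and class labels, and keep the top-ranked miRNAs from `rank_features`. A sparsity-inducing penalty would select features directly rather than by ranking, but it needs an iterative solver.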
Results: The experiments were conducted on three public miRNA-mRNA microarray data sets. The new
method greatly reduces feature dimensionality, and achieves high classification accuracy. Experimental
comparisons show that it outperforms the classic regularization methods and state-of-the-art feature selection methods.
Conclusion: Taking data properties into consideration has been shown to be an effective way to improve
the performance of feature selection. Specifically, link information in gene expression data provides
useful hints to design structured regularization and assists biomarker identification.