Background: A central problem of systems biology is the reconstruction of the topology of gene regulatory networks (GRNs) from high-throughput genomic data such as microarray gene expression data. The main challenge with gene expression data is that the number of genes is large, the number of samples is small, and the data are often noisy.
Objective: In this paper, we present a method for Gene Regulatory Network Inference using Rotation Forest (GENIRF).
Methods: Rotation forest combines the embedded variable ranking mechanism of tree-based ensemble methods with dimension reduction, which addresses the main challenge in gene expression data. GENIRF decomposes the prediction of a gene regulatory network between p genes into p different regression problems. In each regression problem, the expression pattern of one target gene is predicted from the transformed expression patterns of all the remaining genes using rotation forest.
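The decomposition into p regression problems can be illustrated with a minimal sketch. This is not the authors' implementation: scikit-learn's `RandomForestRegressor` stands in for rotation forest, the rotation (dimension reduction) step is omitted, and the names `infer_network` and `n_trees`, as well as the synthetic data, are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def infer_network(expr, n_trees=100, seed=0):
    """Return a p x p matrix W where W[i, j] scores the putative edge i -> j.

    expr is an (n_samples, p) expression matrix. A plain random forest is
    used here as a stand-in for rotation forest (assumption).
    """
    n_samples, p = expr.shape
    weights = np.zeros((p, p))
    for target in range(p):
        # One regression problem per gene: predict the target from all others.
        predictors = [g for g in range(p) if g != target]
        model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        model.fit(expr[:, predictors], expr[:, target])
        # Embedded variable ranking: impurity-based importances of the
        # predictor genes are used as edge scores toward the target.
        weights[predictors, target] = model.feature_importances_
    return weights


# Synthetic example (hypothetical data): gene 0 regulates gene 1 non-linearly.
rng = np.random.default_rng(0)
expr = rng.normal(size=(60, 5))
expr[:, 1] = np.sin(2.0 * expr[:, 0]) + 0.05 * rng.normal(size=60)

weights = infer_network(expr)
top_regulator_of_gene1 = int(np.argmax(weights[:, 1]))
```

Ranking all entries of `weights` from highest to lowest would give a candidate list of regulatory edges; because each of the p regressions is independent, the loop over targets could be run in parallel.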
Results: GENIRF makes no assumptions about the nature of gene regulation, so it can identify combinatorial and non-linear interactions in GRNs. Experimental results on the simulated data of the DREAM4 in silico multifactorial challenge indicate that GENIRF achieves better accuracy and compares favorably with existing well-known algorithms. Furthermore, it is a fast and scalable method.
Conclusion: GENIRF shows high performance on this benchmark across different performance metrics, and its overall score is slightly better than that of the other methods. We have also shown that dimension reduction of the gene expression data can further improve the performance of GENIRF and other methods. In addition, GENIRF is competitive in terms of computational efficiency, especially compared with other ensemble methods, and it can be easily parallelized for large data sets.