The correct identification of differentially expressed genes (DEGs) is central to many areas of genetic research. Since the 1990s, many approaches, methods, algorithms and statistical tools have been developed to analyze the expression levels of thousands of genes.
However, because managing, processing and interpreting sequencing data to obtain reliable results is increasingly complex, there is no consensus on the most appropriate protocols and tools for identifying differentially expressed genes from RNA-Seq data.
Thus, we propose an integrated and comprehensive approach that combines the most widely used algorithms for DEG analysis, starting from the raw count table. The proposed method consists of three main steps: 1) preliminary data analysis and visualization; 2) differential gene expression analysis, using R/Bioconductor packages (DESeq2, edgeR, Limma, SAMSeq, TweeDESeq) and standard ANOVA (ez and afex packages); 3) integration of results into two main graphical outputs, produced with the SuperExactTest, UpSetR and ComplexHeatmap packages.
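As a rough illustration of the integration step, the sketch below (written in Python, not with the R packages named above, and not the authors' implementation) computes three things from per-tool DEG lists: UpSet-style exclusive intersection counts (genes called by exactly a given combination of tools), a simple consensus list (genes called by at least `min_tools` tools), and a pairwise hypergeometric overlap p-value, the two-set analogue of the multi-set intersection test that SuperExactTest provides. The gene IDs, tool labels, gene-universe size and `min_tools` threshold are hypothetical.

```python
from math import comb


def exclusive_intersections(deg_sets):
    """UpSet-style counts: map each combination of tools (as a frozenset)
    to the number of genes called DE by exactly those tools and no others."""
    membership = {}
    for tool, genes in deg_sets.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(tool)
    counts = {}
    for tools in membership.values():
        key = frozenset(tools)
        counts[key] = counts.get(key, 0) + 1
    return counts


def consensus(deg_sets, min_tools):
    """Sorted list of genes called DE by at least `min_tools` tools."""
    votes = {}
    for genes in deg_sets.values():
        for gene in genes:
            votes[gene] = votes.get(gene, 0) + 1
    return sorted(g for g, v in votes.items() if v >= min_tools)


def overlap_pvalue(a, b, universe_size):
    """Hypergeometric upper tail P(X >= |a & b|): probability that two random
    gene sets of sizes |a| and |b|, drawn from `universe_size` genes, share at
    least as many genes as observed."""
    k = len(a & b)
    n_a, n_b = len(a), len(b)
    tail = sum(
        comb(n_a, i) * comb(universe_size - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return tail / comb(universe_size, n_b)


# Hypothetical DEG lists (gene IDs are made up for illustration).
deg_sets = {
    "DESeq2": {"g1", "g2", "g3", "g4"},
    "edgeR": {"g1", "g2", "g3", "g5"},
    "Limma": {"g1", "g2", "g6"},
}

# Print intersections from largest tool combination to smallest.
for tools, n in sorted(exclusive_intersections(deg_sets).items(),
                       key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print(sorted(tools), n)
print("consensus (>=2 tools):", consensus(deg_sets, min_tools=2))
print("DESeq2 vs edgeR overlap p:",
      overlap_pvalue(deg_sets["DESeq2"], deg_sets["edgeR"], universe_size=10))
```

In a real analysis, `universe_size` would be the number of genes actually tested, and p-values computed over many tool pairs would need multiple-testing correction.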
In this way, a more robust result can be obtained simply, without prior bioinformatics expertise.
Keywords: Clustering comparison, Combination-based procedure, Concordance analysis, Differential Expression Analysis, Intersections, Integration of results, Normalization, Overlap proportion, Parametric and non-parametric, Performance evaluation, p-values and Π score, RNA-Seq Data, ROC, Sensitivity, Simulations, Specificity, SuperExactTest, Tools comparison, UpSetR and ComplexHeatmap, Validity of DEG tools.