Abstract
Background: Many bioinformatics pipelines are now available for analyzing transcriptomics data produced by high-throughput RNA sequencing. They implement different workflows that address several analysis tasks with the support of third-party programs. Nevertheless, a proper workflow definition for RNA-seq data analysis is still lacking.
Objective: To properly define what a comprehensive RNA-seq data analysis workflow should be, to compare all available pipelines against that definition and, if none of them provides such a solution, to implement a new pipeline.
Method: We have developed a new pipeline integrating state-of-the-art programs for the different parts of the RNA-seq analysis. We have also used the RUbioSeq libraries to achieve a scalable solution.
Results: We have defined a comprehensive RNA-seq data analysis workflow that covers the most common needs of biologists, and we have implemented it in a new pipeline, nextpresso. We have also validated it in the two case studies presented here.
Conclusion: The nextpresso pipeline is new, freely available, and covers the most common needs of RNA-seq data analysis. It is easy to configure, generates user-friendly results, and scales well to larger studies comprising a high number of samples.
Keywords: RNA-seq, next-generation sequencing, transcriptomics, workflow, pipeline, spike-in controls, concurrent processing.