Background: Many bioinformatics pipelines are available nowadays
to analyze transcriptomics data produced by high-throughput RNA sequencing.
They implement different workflows that address several analysis tasks
and rely on third-party programs. Nevertheless, a proper
workflow definition for RNA-seq data analysis is still lacking.
Objective: To properly define what a comprehensive RNA-seq data analysis
workflow should be, to compare the available pipelines against it and, if
none of them provides such a solution, to implement a new pipeline.
Method: We have developed a new pipeline integrating state-of-the-art
programs for different parts of the RNA-seq analysis. We have also used
RUbioSeq libraries to achieve a scalable solution.
Results: We have defined a comprehensive RNA-seq data analysis workflow
covering the most common needs of biologists, and implemented it
in a new pipeline, nextpresso. We have also validated it in two case studies.
Conclusion: nextpresso is a new, freely available pipeline covering the most
common needs of RNA-seq data analysis. It is easy to configure, generates
user-friendly results and scales well to larger studies comprising a high
number of samples.
Keywords: RNA-seq, NGS, transcriptomics, gene expression, pipeline, spike-in
controls, parallelized execution