Background: Error correction is an important task in the analysis and manipulation of next-generation
sequencing (NGS) data. Its purpose is to facilitate data analysis in large projects such as de novo
assembly. Here we present a new hybrid algorithm that corrects errors in long reads using short
reads. The algorithm can be flexibly adapted to different types of errors. We then perform a de novo
assembly of the corrected long reads.
Objective: We present MiRCA (MinIon Reads Correction Algorithm), a hybrid approach based on
sequence alignment that detects and corrects errors in MinION long reads using Illumina short reads.
Methods: Our approach operates in four steps. First, we perform quality control and data cleaning.
Second, as a pre-error-correction step, we assemble the short reads into contigs. Third, we align the
pre-assembled contigs to the long reads and use this alignment to correct the erroneous long
reads. Finally, we assemble the corrected long reads.
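The third step, correcting a long read from a contig aligned to it, can be illustrated with a toy sketch. This is not MiRCA's implementation: the function names (`fit_contig`, `correct_read`), the scoring values, and the acceptance threshold are all hypothetical, and a simple dynamic-programming "fitting" alignment stands in for whatever aligner the pipeline actually uses. The sketch aligns a whole contig to its best-matching region of a noisy read and, if the alignment is good enough, replaces that region with the contig sequence.

```python
def fit_contig(contig, read, match=2, mismatch=-1, gap=-2):
    """Fitting alignment: align the whole contig to the best substring of read.

    Returns (score, start, end) such that read[start:end] is the aligned region.
    Scoring values are illustrative assumptions, not tuned parameters.
    """
    m = len(read)
    # Each DP cell holds (best score, read position where that path started).
    # Row 0: the contig may start anywhere in the read at no cost.
    prev = [(0, j) for j in range(m + 1)]
    for i in range(1, len(contig) + 1):
        cur = [(i * gap, 0)]
        for j in range(1, m + 1):
            sub = match if contig[i - 1] == read[j - 1] else mismatch
            cur.append(max(
                (prev[j - 1][0] + sub, prev[j - 1][1]),  # match or mismatch
                (prev[j][0] + gap, prev[j][1]),          # base missing from read
                (cur[j - 1][0] + gap, cur[j - 1][1]),    # extra base in read
            ))
        prev = cur
    # The contig may also end anywhere in the read at no cost.
    end = max(range(m + 1), key=lambda j: prev[j][0])
    score, start = prev[end]
    return score, start, end


def correct_read(read, contig, min_score_per_base=1.0):
    """Replace the region of read covered by contig with the contig sequence.

    min_score_per_base is a hypothetical confidence threshold: reads are left
    untouched when the contig does not align well enough.
    """
    score, start, end = fit_contig(contig, read)
    if score < min_score_per_base * len(contig):
        return read
    return read[:start] + contig + read[end:]


# Toy data: the noisy read carries one substitution and one deletion.
truth = "GGATCCTAGCAAGTTCA"
contig = "ATCCTAGCAAGT"      # error-free contig covering truth[2:14]
noisy = "GGATCCTGGCAGTTCA"
print(correct_read(noisy, contig))
```

A real pipeline would rely on a dedicated aligner and would have to handle contigs that only partly overlap a read, reverse-complement matches, and several contigs per read; the sketch covers only the single, fully contained case.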
Results: Mapping results for the S. cerevisiae W303 and E. coli genomes show that our error correction
approach produces high-quality long reads, with a mapping rate of ~99% to the reference genome, in
reasonable time. For de novo assembly, the corrected long reads yield a good assembly in a short running
time compared with other error correction tools.
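For readers who want to reproduce a mapping-rate figure of this kind, the following is a minimal sketch under one common convention: the fraction of primary alignment records in a SAM file that are not flagged as unmapped. The abstract does not state how the rate was computed, so this definition and the helper name `mapping_rate` are assumptions. The flag bits used (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM format specification.

```python
def mapping_rate(sam_lines):
    """Fraction of reads mapped, counting one primary record per read.

    Assumes single-end reads; secondary and supplementary records are
    skipped so that each read is counted once.
    """
    total = mapped = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])
        if flag & 0x100 or flag & 0x800:
            continue  # secondary or supplementary alignment
        total += 1
        if not flag & 0x4:
            mapped += 1
    return mapped / total if total else 0.0


# Toy SAM records (hypothetical reads r1..r3 on a made-up reference "chr1").
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\t*",
    "r1\t2048\tchr1\t500\t60\t4M\t*\t0\t0\tACGT\t*",
    "r2\t16\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",
]
print(f"{mapping_rate(example):.1%}")
```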
Conclusion: MiRCA is a new hybrid approach that detects and corrects errors. It follows an alignment-based
strategy in which pre-assembled short reads serve as the reference for correcting nanopore long reads. The
experimental evaluation of the corrected long reads against the reference genomes of S. cerevisiae and E. coli
shows that MiRCA achieves better error correction than the existing related tools.