Most efficient read mappers build a Ferragina-Manzini index of a genome sequence and
then process reads against it. In order to handle differences between reads and corresponding genome
fragments, approximate read occurrences are searched in the index. This technique is particularly
efficient for mapping reads of length ∼30bp with up to 2-3 errors, as first massive sequencers required.
However, within the last few years, in most popular sequencing technologies read length increased to
75 − 200bp. Since the number of required index queries is exponential with respect to the number of
errors, it is hard to maintain the allowed error rate within this method.
We propose a new approach that overcomes this problem. The main idea is to use the Ferragina-Manzini index to filter
potential approximate read occurrences. Filtering is based on the intermediate partitioning concept, i.e. reads are split into
parts, which are searched in index with reduced number of errors.
We implemented this method in Bmap program. Our experiments show that Bmap outperforms current methods in
efficiency without sacrificing mapping accuracy.