Background: Approximate string matching algorithms are widely used in bioinformatics, among
which the bit-parallel Myers algorithm is a popular approach to compute the Levenshtein distance between
two genome sequences. The bit vector encoding of the Myers algorithm makes it feasible to extend to
modern parallel architectures with wider-than-ever vector registers and many cores.
Objective: Myers algorithm has already been integrated into some NGS all mappers such as RazerS and
GEM for the verification stage. Due to the huge number of NGS reads to be processed, it is demanded to
accelerate the bit-parallel Myers algorithm for higher throughput of NGS all mappers. In this paper, we aim
to design an ultra-fast implementation of Myers algorithm on Intel Xeon Phi based architectures, including
KNL-based processors and KNC-based co-processors.
Method: We designed a two-level framework to fully exploit the computing power of Xeon Phi based
many-core architectures. At the coarse-grained thread level, we used multi-threading to invoke many cores.
At the fine-grained VPU level, we proposed a novel vectorized computing method for the Myers algorithm.
The in-depth analysis for memory access leads to a more cache friendly searching strategy.
Results: Performance evaluation revealed that MyPhi achieved a peak performance of 1.03 and 1.62
TCUPS (Trillion Cell Updates per Second) on KNC-based Xeon Phi 7110 co-processor and KNL-based
Xeon Phi 7210 processor, respectively, which outperformed a multi-threaded scalar implementation on dual
six-core CPUs by an average speed up of 8.95 and 14.08.
Conclusion: We presented the MyPhi to compute the Levenshtein distance between the two strings
efficiently on Xeon Phi based architectures. Performance evaluation has shown good speedups over other
CPU-based and accelerator-enabled works as well as good scalability. MyPhi can be further used as building
blocks for short read aligners, clustering algorithms and potentially other sequence aligning tools.