Background: Approximate string matching algorithms are widely used in bioinformatics, among which the bit-parallel Myers algorithm is a popular approach to compute the Levenshtein distance between two genome sequences. The bit vector encoding of the Myers algorithm makes it feasible to extend to modern parallel architectures with wider-than-ever vector registers and many cores.
Objective: Myers algorithm has already been integrated into some NGS all mappers such as RazerS and GEM for the verification stage. Due to the huge number of NGS reads to be processed, it is demanded to accelerate the bit-parallel Myers algorithm for higher throughput of NGS all mappers. In this paper, we aim to design an ultra-fast implementation of Myers algorithm on Intel Xeon Phi based architectures, including KNL-based processors and KNC-based co-processors.
Method: We designed a two-level framework to fully exploit the computing power of Xeon Phi based many-core architectures. At the coarse-grained thread level, we used multi-threading to invoke many cores. At the fine-grained VPU level, we proposed a novel vectorized computing method for the Myers algorithm. The in-depth analysis for memory access leads to a more cache friendly searching strategy.
Results: Performance evaluation revealed that MyPhi achieved a peak performance of 1.03 and 1.62 TCUPS (Trillion Cell Updates per Second) on KNC-based Xeon Phi 7110 co-processor and KNL-based Xeon Phi 7210 processor, respectively, which outperformed a multi-threaded scalar implementation on dual six-core CPUs by an average speed up of 8.95 and 14.08.
Conclusion: We presented the MyPhi to compute the Levenshtein distance between the two strings efficiently on Xeon Phi based architectures. Performance evaluation has shown good speedups over other CPU-based and accelerator-enabled works as well as good scalability. MyPhi can be further used as building blocks for short read aligners, clustering algorithms and potentially other sequence aligning tools.