Background: Many applications in voice processing exhibit high inherent parallelism. Field-programmable gate arrays (FPGAs) have shown very high performance, despite their low operating frequency, by fully exploiting this parallelism. Nevertheless, recent CPUs and graphics processing units (GPUs) also have an inherent potential for high performance.
Methods: On CPUs, this parallelism can be exploited using multiple cores that support improved single-instruction multiple-data (SIMD) instructions, while recent GPUs provide a large number of cores and thus have the potential for high performance in many applications. Our goals are, first, to compare GPU and FPGA implementations of the linear predictive coding (LPC) algorithm, in order to understand the trade-off between the flexibility but relatively low speed of an FPGA and the high speed but fixed architecture of a GPU; and second, to apply several levels of optimization, from overlapping data transfers to fine-tuning operation sequences.
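For reference, the standard formulation of LPC analysis is the autocorrelation method solved by the Levinson-Durbin recursion. The sketch below is a minimal sequential Python illustration of that textbook algorithm, not the paper's optimized FPGA or GPU kernels; the function name and signature are illustrative.

```python
def lpc_coefficients(signal, order):
    """LPC analysis by the autocorrelation method (Levinson-Durbin).

    Returns (a, err) where a[0] == 1.0 and the predictor is
    s[t] ~= -sum(a[k] * s[t - k] for k in 1..order).
    """
    n = len(signal)
    # Autocorrelation r[k] = sum_t s[t] * s[t + k]
    r = [sum(signal[t] * signal[t + k] for t in range(n - k))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]  # prediction error energy
    for i in range(1, order + 1):
        # Reflection (PARCOR) coefficient for stage i
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Coefficient update: a'[j] = a[j] + k * a[i - j]
        prev = a[:i]
        for j in range(1, i + 1):
            a[j] = a[j] + k * prev[i - j]
        err *= 1.0 - k * k
    return a, err
```

For a first-order analysis of an exponentially decaying signal s[t] = 0.9**t, the recursion recovers a[1] close to -0.9, matching the underlying one-pole model.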
Results: The experimental results highlight the relative strengths and limitations of the two systems.
Conclusion: Our experiments show that, for several samples corresponding to different speech-coding workloads, the GPU achieves speedups of up to 3x over the FPGA and around 35x over a sequential CPU implementation.