The aim of robust speech recognition is to reduce, as far as possible, the environmental mismatch between training and test conditions so that the acoustic models can be used optimally during recognition. Several factors produce such mismatch: inter-speaker variability, intra-speaker variability, and changes in the speaker's environment or in the channel characteristics. Environmental changes represent a challenging area of work and constitute one of the main driving forces of research in voice processing, which nowadays faces application scenarios such as mobile phones, moving cars, spontaneous speech, speech masked by other speech, and speech masked by music or non-stationary noises. This review summarizes the different strategies that combat the effects of additive noise on the voice signal and on the recognition process, focusing on normalization techniques and particularly on non-linear transformations of the MFCC features. Histogram Equalization and Parametric Histogram Equalization, together with their variants and evolutions, are analyzed as the main representatives of this family of non-linear feature transformations.
Keywords: robust speech recognition, feature normalization, histogram equalization, parametric equalization, smoothing filters, temporal information.
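Since the abstract centers on Histogram Equalization of MFCC features, a minimal sketch may help fix ideas: the standard approach maps each feature dimension's empirical distribution onto a reference distribution (here a standard Gaussian) via quantile mapping. The function name and the choice of an utterance-level, per-dimension mapping are illustrative assumptions, not the specific formulation analyzed in the review.

```python
import numpy as np
from scipy.stats import norm

def histogram_equalize(features, eps=1e-6):
    """Map each feature dimension's empirical distribution onto a
    standard Gaussian reference via quantile (rank) mapping.

    features: array of shape (n_frames, n_dims), e.g. MFCCs for one utterance.
    Returns an array of the same shape with equalized features.
    """
    n_frames, n_dims = features.shape
    equalized = np.empty_like(features, dtype=float)
    for d in range(n_dims):
        # Rank of each frame's value within the utterance (0 .. n_frames-1)
        ranks = np.argsort(np.argsort(features[:, d]))
        # Empirical CDF value for each frame, kept strictly inside (0, 1)
        cdf = (ranks + 0.5) / n_frames
        # Invert the reference Gaussian CDF to obtain the equalized value
        equalized[:, d] = norm.ppf(np.clip(cdf, eps, 1 - eps))
    return equalized
```

Because the mapping is monotonic per dimension, the rank order of the frames is preserved while the marginal distribution is forced to match the Gaussian reference, which is what makes the transformation non-linear yet order-preserving.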