Single-ended Speech Quality Evaluation Using Linear Combination of the Quality Score Estimates of Multi-instances Features

Author(s): Rajesh Kumar Dubey*, Arun Kumar.

Journal Name: Recent Advances in Electrical & Electronic Engineering
Formerly Recent Patents on Electrical & Electronic Engineering

Volume 12, Issue 5, 2019

Abstract:

Background: In single-ended speech quality evaluation, the Mean Opinion Score (MOS) is estimated objectively without using clean speech as a reference. In this work, multiple time-instances (multi-instances) features derived from a Multi-Resolution Auditory Model (MRAM), along with other relevant features such as Mel-Frequency Cepstral Coefficients (MFCC) and Line Spectral Frequencies (LSF), are used for single-ended speech quality measurement. A Voice Activity Detection (VAD) algorithm separates the speech regions from silence in a speech signal.

Methods: The multi-instances features are computed using MRAM, MFCC and LSF for different combinations of speech regions. This captures degradations caused by multiple time-localized effects, such as short-time transient distortions like impulsive noise, and distinguishes them from plosive sounds in speech. Each set of multi-instances features is then passed through a Gaussian Mixture Model (GMM) mapping to compute a corresponding objective MOS value.

Results: The overall objective MOS estimate for a speech signal is calculated by averaging the objective MOS values obtained from its different multi-instances features. Pearson's correlation coefficient (PCC) and root mean square error (RMSE) between the subjective MOS and the estimated overall objective MOS are computed and compared with International Telecommunication Union Telecommunication Standardization Sector (ITU-T) Recommendation P.563 and with recently published works on similar types of databases.
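
The averaging and scoring steps described in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the three feature sets per signal, and all numeric values are hypothetical and chosen only for illustration. It assumes each signal already has one objective MOS estimate per multi-instances feature set.

```python
import math

def overall_mos(instance_scores):
    """Average the per-feature-set objective MOS estimates of one signal."""
    return sum(instance_scores) / len(instance_scores)

def pearson_cc(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)

def rmse(x, y):
    """Root mean square error between two equal-length lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical data: three signals, three feature-set estimates each.
per_signal_estimates = [
    [3.9, 4.1, 4.0],
    [2.4, 2.6, 2.5],
    [1.4, 1.6, 1.5],
]
subjective_mos = [4.2, 2.3, 1.6]  # made-up listener scores

objective_mos = [overall_mos(s) for s in per_signal_estimates]
print("PCC :", pearson_cc(subjective_mos, objective_mos))
print("RMSE:", rmse(subjective_mos, objective_mos))
```

A higher PCC and a lower RMSE indicate closer agreement between the objective estimates and the subjective scores.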

Conclusion: The improved PCC and RMSE values between the subjective MOS and the estimated overall objective MOS show the efficacy of the approach.

Keywords: Auditory model, degraded speech, multiple time-instances features, multi-resolution, non-intrusive, speech quality.

Article Details

VOLUME: 12
ISSUE: 5
Year: 2019
Page: [464 - 474]
Pages: 11
DOI: 10.2174/2352096511666180917100208