Single-ended Speech Quality Evaluation Using Linear Combination of the Quality Score Estimates of Multi-instances Features

Rajesh Kumar Dubey; Arun Kumar

Abstract

Background: In single-ended speech quality evaluation, the Mean Opinion Score (MOS) is measured objectively without using clean speech as a reference. In this work, multiple time-instances (multi-instances) features computed with a Multi-Resolution Auditory Model (MRAM) are used for single-ended speech quality measurement, together with other relevant features such as Mel-Frequency Cepstral Coefficients (MFCC) and Line Spectral Frequencies (LSF). A Voice Activity Detection (VAD) algorithm separates the speech regions of the signal from silence.
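The abstract does not say which VAD algorithm is used. The sketch below is a simple energy-threshold VAD and is only an illustration; the paper's detector may differ. It marks a frame as speech when its short-time energy exceeds a fixed threshold and returns runs of consecutive speech frames as speech regions. The 20 ms frame, 10 ms hop at 8 kHz, the -40 dB threshold and the function names are hypothetical.

```python
import math

def frame_energies_db(signal, frame_len, hop):
    """Short-time energy of each frame in dB (signal assumed scaled to [-1, 1])."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        power = sum(s * s for s in frame) / frame_len
        energies.append(10 * math.log10(power + 1e-12))  # small floor avoids log(0) on digital silence
    return energies

def energy_vad(signal, frame_len=160, hop=80, threshold_db=-40.0):
    """Return (first_frame, last_frame) index pairs of contiguous speech regions.

    Hypothetical defaults: 20 ms frames with a 10 ms hop at 8 kHz and a fixed -40 dB threshold.
    """
    flags = [e > threshold_db for e in frame_energies_db(signal, frame_len, hop)]
    regions, start = [], None
    for i, is_speech in enumerate(flags + [False]):  # trailing False closes a final open region
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            regions.append((start, i - 1))
            start = None
    return regions

# A 0.1 s square-wave burst between two 0.1 s stretches of silence (8 kHz samples)
signal = [0.0] * 800 + [0.5 if n % 2 == 0 else -0.5 for n in range(800)] + [0.0] * 800
print(energy_vad(signal))  # [(9, 19)]
```

A practical detector would normally adapt the threshold to the noise floor. A fixed threshold keeps the sketch short.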

Methods: The multi-instances features are computed with MRAM, MFCC and LSF over different combinations of speech regions. This captures degradations caused by multiple time-localized effects, such as the attacks (onsets) of short-time transient distortions like impulsive noise, and distinguishes those distortions from plosive sounds in speech. A Gaussian Mixture Model (GMM) mapping then converts each multi-instances feature into an objective MOS value, giving one estimate per multi-instances feature.
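The abstract does not define how speech regions are combined or the exact form of the GMM mapping. The sketch below rests on two assumptions, both illustrative rather than the authors' method:

1. Each k-combination of speech regions is pooled by averaging the regions' feature vectors.
2. MOS is predicted as the conditional expectation E[MOS | x] under a joint GMM of features and MOS with diagonal covariances, whose parameters are assumed to be already trained (for example with EM).

With diagonal covariances the feature/MOS cross-covariance is zero, so the prediction reduces to the posterior-weighted mean of the per-component MOS means. All names, the pooling rule, k and the GMM parameters are hypothetical.

```python
import math
from itertools import combinations

def multi_instance_features(region_features, k):
    """Pool per-region feature vectors over every k-combination of speech regions.

    Assumption: a combination is represented by the element-wise mean of its regions' vectors.
    """
    dim = len(region_features[0])
    return [[sum(vec[d] for vec in combo) / k for d in range(dim)]
            for combo in combinations(region_features, k)]

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_map_mos(x, weights, means_x, vars_x, means_mos):
    """E[MOS | x] under a joint GMM with diagonal covariances: sum_k p(k | x) * mu_mos_k."""
    log_post = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means_x, vars_x)]
    top = max(log_post)  # log-sum-exp shift for numerical stability
    post = [math.exp(lp - top) for lp in log_post]
    total = sum(post)
    return sum(p / total * mu for p, mu in zip(post, means_mos))

# Hypothetical 2-D per-region features and an already-trained 2-component GMM
regions = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
instances = multi_instance_features(regions, k=2)
gmm = dict(weights=[0.5, 0.5], means_x=[[0.0, 0.0], [2.0, 2.0]],
           vars_x=[[1.0, 1.0], [1.0, 1.0]], means_mos=[1.0, 4.0])
instance_mos = [gmm_map_mos(x, **gmm) for x in instances]
print(instances)  # [[0.5, 0.5], [1.0, 1.0], [1.5, 1.5]]
```

A full-covariance joint GMM would add a linear regression term inside each component. The diagonal version is used here only to keep the example dependency-free.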

Results: The overall objective MOS of a speech signal is calculated by averaging the objective MOS values obtained for its different multi-instances features. Pearson's correlation coefficient (PCC) and the root mean square error (RMSE) between the subjective MOS and the estimated overall objective MOS are computed. These results are compared with International Telecommunication Union Telecommunication Standardization Sector (ITU-T) Recommendation P.563 and with recently published works on similar databases.
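Averaging is the equal-weight case of the linear combination named in the title. The sketch below shows that step and the two evaluation metrics. The per-instance estimates and subjective scores are made-up illustrative numbers, not results from this work, and the function names are hypothetical.

```python
import math

def overall_mos(instance_mos):
    """Equal-weight linear combination (plain average) of per-instance MOS estimates."""
    return sum(instance_mos) / len(instance_mos)

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmse(x, y):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Illustrative numbers only: per-instance estimates for three files and their subjective MOS
estimates_per_file = [[2.0, 2.4, 2.2], [3.1, 3.5, 3.3], [4.0, 3.8, 4.2]]
subjective = [2.0, 3.5, 4.1]
objective = [overall_mos(e) for e in estimates_per_file]
print(pearson(subjective, objective), rmse(subjective, objective))
```

A higher PCC and a lower RMSE both indicate closer agreement with the subjective scores.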

Conclusion: The improved PCC and RMSE values between the subjective and the estimated overall objective MOS, relative to P.563 and the compared works, show the efficacy of the approach.

Keywords: Auditory model, degraded speech, multiple time-instances features, multi-resolution, non-intrusive, speech quality.

Recent Advances in Electrical & Electronic Engineering
Publisher: Bentham Science Publisher
Print ISSN: 2352-0965; Online ISSN: 2352-0973
DOI: https://dx.doi.org/10.2174/2352096511666180917100208