Deep learning for minimum mean-square error approaches to speech enhancement

File version

Accepted Manuscript (AM)

Author(s)
Nicolson, Aaron
Paliwal, Kuldip K
Date
2019
Abstract

Recently, the focus of speech enhancement research has shifted from minimum mean-square error (MMSE) approaches, like the MMSE short-time spectral amplitude (MMSE-STSA) estimator, to state-of-the-art masking- and mapping-based deep learning approaches. We aim to bridge the gap between these two differing speech enhancement approaches. This work investigates deep learning methods for MMSE approaches, with the objective of producing intelligible, high-quality enhanced speech. Since the speech enhancement performance of an MMSE approach improves with the accuracy of the a priori signal-to-noise ratio (SNR) estimator it uses, a residual long short-term memory (ResLSTM) network is utilised here to accurately estimate the a priori SNR. MMSE approaches utilising the ResLSTM a priori SNR estimator are evaluated using subjective and objective measures of speech quality and intelligibility. The tested conditions include real-world non-stationary and coloured noise sources at multiple SNR levels. MMSE approaches utilising the proposed a priori SNR estimator achieve higher enhanced speech quality and intelligibility scores than recent masking- and mapping-based deep learning approaches. The results presented in this work show that the performance of an MMSE approach to speech enhancement improves significantly when deep learning is used. Availability: The proposed a priori SNR estimator is available at: https://github.com/anicolson/DeepXi.
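
To illustrate why an MMSE approach depends on the accuracy of its a priori SNR estimate, the following is a minimal sketch of how such an estimate can drive a gain function. It uses the Wiener filter gain, xi / (1 + xi), as one well-known example of an MMSE-type gain; it is not the DeepXi code. The function names (`db_to_linear`, `wiener_gain`, `enhance_frame`) and the assumption that the estimator outputs one a priori SNR value in dB per frequency bin are hypothetical, chosen only for illustration.

```python
def db_to_linear(snr_db):
    """Convert an SNR in decibels to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def wiener_gain(xi):
    """Wiener filter gain for a linear a priori SNR xi (xi >= 0)."""
    if xi < 0.0:
        raise ValueError("a priori SNR must be non-negative in the linear domain")
    return xi / (1.0 + xi)


def enhance_frame(noisy_magnitudes, xi_db):
    """Scale each noisy spectral magnitude by the gain for its bin.

    noisy_magnitudes: noisy-speech magnitude per frequency bin (one frame).
    xi_db: estimated a priori SNR in dB per bin (assumed estimator output).
    """
    if len(noisy_magnitudes) != len(xi_db):
        raise ValueError("expected one a priori SNR estimate per frequency bin")
    return [
        wiener_gain(db_to_linear(snr)) * magnitude
        for magnitude, snr in zip(noisy_magnitudes, xi_db)
    ]
```

With an estimate of 0 dB the gain is 0.5, so the bin is halved; the gain approaches 1 as the estimated a priori SNR rises and approaches 0 as it falls. An error in the estimate therefore translates directly into too much or too little suppression in that bin.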

Journal Title

Speech Communication

Volume

111

Rights Statement

© 2019 Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Licence (http://creativecommons.org/licenses/by-nc-nd/4.0/) which permits unrestricted, non-commercial use, distribution and reproduction in any medium, providing that the work is properly cited.

Subject

Artificial intelligence

Cognitive and computational psychology

Linguistics
