DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation

File version
Accepted Manuscript (AM)
Author(s)
Zhang, Qiquan
Nicolson, Aaron
Wang, Mingjiang
Paliwal, Kuldip
Wang, Chenxu
Date
2020
Abstract

An accurate noise power spectral density (PSD) tracker is an indispensable component of a single-channel speech enhancement system. Bayesian-motivated minimum mean-square error (MMSE)-based noise PSD estimators have been the most prominent in recent times. However, they lack the ability to track highly non-stationary noise sources due to current methods of a priori signal-to-noise ratio (SNR) estimation. This is caused by the underlying assumption that the noise signal changes at a slower rate than the speech signal. As a result, MMSE-based noise PSD trackers exhibit a large tracking delay and produce noise PSD estimates that require bias compensation. Motivated by this, we propose an MMSE-based noise PSD tracker that employs a temporal convolutional network (TCN) a priori SNR estimator. The proposed noise PSD tracker, called DeepMMSE, makes no assumptions about the characteristics of the noise or the speech, exhibits no tracking delay, and produces an accurate estimate that requires no bias correction. Our extensive experimental investigation shows that the proposed DeepMMSE method outperforms state-of-the-art noise PSD trackers and demonstrates the ability to track abrupt changes in the noise level. Furthermore, when employed in a speech enhancement framework, the proposed DeepMMSE method is able to outperform state-of-the-art noise PSD trackers, as well as multiple deep learning approaches to speech enhancement. Availability: DeepMMSE is available at https://github.com/anicolson/DeepXi.
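
As a rough illustration of how an a priori SNR estimate feeds an MMSE-based noise PSD tracker, the sketch below uses the textbook conditional expectation of the noise periodogram under a complex Gaussian model, E{|D|^2 | Y} = (1/(1+ξ))^2 |Y|^2 + (ξ/(1+ξ)) λ_d. The abstract does not state the exact estimator used by DeepMMSE, so this may differ from the published method. The function name `mmse_noise_periodogram` and its parameter names are hypothetical, and the a priori SNR values are passed in as plain arrays rather than produced by a TCN.

```python
import numpy as np


def mmse_noise_periodogram(noisy_power, xi, noise_psd):
    """Textbook MMSE noise periodogram estimate (illustrative, hypothetical helper).

    noisy_power: |Y|^2, periodogram of the noisy speech per frequency bin.
    xi:          a priori SNR per bin (linear scale, not dB).
    noise_psd:   current noise PSD value lambda_d per bin (assumed available).
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    xi = np.asarray(xi, dtype=float)
    noise_psd = np.asarray(noise_psd, dtype=float)

    gain = 1.0 / (1.0 + xi)  # equals 1 when speech is absent (xi = 0)
    # (1/(1+xi))^2 * |Y|^2 + (xi/(1+xi)) * lambda_d, using xi/(1+xi) = 1 - gain
    return gain**2 * noisy_power + (1.0 - gain) * noise_psd


# Toy values chosen for illustration only.
noisy_power = np.array([4.0, 4.0, 4.0])
xi = np.array([0.0, 1.0, 1e9])
noise_psd = np.array([2.0, 2.0, 2.0])
estimate = mmse_noise_periodogram(noisy_power, xi, noise_psd)
```

When the a priori SNR is zero the estimate equals the noisy periodogram, and when it is very large the estimate falls back to the supplied noise PSD. This is why the quality of the a priori SNR estimate governs how quickly such a tracker follows changes in the noise.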

Journal Title
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume
28
Rights Statement
© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Subject
Electrical engineering
Electronics, sensors and digital hardware
Science & Technology
Acoustics
Engineering
Citation
Zhang, Q; Nicolson, A; Wang, M; Paliwal, K; Wang, C, DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28, pp. 1404-1415