A Deep Learning-Based Kalman Filter for Speech Enhancement
File version
Version of Record (VoR)
Author(s)
Roy, SK
Nicolson, Aaron
Paliwal, Kuldip K
Location
Shanghai, China
Abstract
The conventional Kalman filter (KF) suffers from poor estimates of the noise variance and the linear prediction coefficients (LPCs) in real-world noise conditions, which degrades its speech enhancement performance. In this paper, a deep learning approach is used to estimate the noise variance and LPCs more accurately, enabling the KF to enhance speech in a variety of noise conditions. Specifically, a deep learning approach to MMSE-based noise power spectral density (PSD) estimation, called DeepMMSE, is employed, and the estimated noise PSD is used to compute the noise variance. We also construct a whitening filter whose coefficients are computed from the estimated noise PSD; it is applied to the noisy speech to yield pre-whitened speech from which the LPCs are computed. The improved noise variance and LPC estimates enable the KF to minimise the residual noise and distortion in the enhanced speech. Experimental results show that the proposed method produces enhanced speech of higher quality and intelligibility than the benchmark methods in various noise conditions over a wide range of SNR levels.
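The pipeline the abstract describes (noise variance and LPC estimates feeding a Kalman filter) can be illustrated with a minimal sketch of the classical LPC-based Kalman filter for speech enhancement. This is not the authors' implementation: the DeepMMSE noise-PSD estimator and the pre-whitening step are replaced by an assumed known noise variance, the LPCs are estimated from a synthetic clean AR(2) signal, and all function names are illustrative.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for LPCs a = [1, a_1, ..., a_p].

    r: autocorrelation sequence r[0..order]. Returns (a, err), where err
    is the final prediction-error variance (the KF process noise q).
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                       # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

def lpc(frame, order):
    """LPCs via the autocorrelation method (biased, normalised estimator)."""
    n = len(frame)
    r = np.array([frame[:n - k] @ frame[k:] for k in range(order + 1)]) / n
    return levinson_durbin(r, order)

def kalman_enhance(noisy, a, q, noise_var):
    """AR(p) state-space Kalman filter: observe y_t = s_t + v_t, v_t ~ N(0, noise_var)."""
    p = len(a) - 1
    F = np.zeros((p, p))            # companion-form transition matrix
    F[0] = -a[1:]                   # s_t = -a_1 s_{t-1} - ... - a_p s_{t-p} + w_t
    F[1:, :-1] = np.eye(p - 1)
    x = np.zeros(p)
    P = np.eye(p)
    out = np.empty(len(noisy))
    for t, y in enumerate(noisy):
        x = F @ x                   # predict
        P = F @ P @ F.T
        P[0, 0] += q                # process noise drives the first state only
        s = P[0, 0] + noise_var     # innovation variance (observation h = e_1)
        K = P[:, 0] / s             # Kalman gain
        x = x + K * (y - x[0])      # update
        P = P - np.outer(K, P[0])
        out[t] = x[0]
    return out

# Demo: synthetic AR(2) "speech" in white noise. The noise variance is
# assumed known here; in the paper it comes from the DeepMMSE noise PSD.
rng = np.random.default_rng(0)
n = 4000
clean = np.zeros(n)
w = rng.normal(size=n)
for t in range(2, n):
    clean[t] = 1.3 * clean[t - 1] - 0.5 * clean[t - 2] + w[t]
noise_var = 4.0
noisy = clean + rng.normal(scale=np.sqrt(noise_var), size=n)

a, q = lpc(clean, 2)   # the paper instead estimates LPCs from pre-whitened noisy speech
enhanced = kalman_enhance(noisy, a, q, noise_var)
mse_noisy = np.mean((noisy - clean) ** 2)
mse_enhanced = np.mean((enhanced - clean) ** 2)
```

The Kalman filter suppresses noise only as well as its parameters allow, which is why the paper's improved noise-variance and LPC estimates matter: with a mis-specified noise variance or biased LPCs, the same recursion leaves residual noise or distorts the speech.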
Conference Title
Interspeech 2020
Rights Statement
© 2020 ISCA and the Author(s). The attached file is reproduced here in accordance with the copyright policy of the publisher. For information about this conference please refer to the conference’s website or contact the author(s).
Subject
Deep learning
Neural networks
Citation
Roy, SK; Nicolson, A; Paliwal, KK, A Deep Learning-Based Kalman Filter for Speech Enhancement, Interspeech 2020, 2020