μ-law SGAN for generating spectra with more details in speech enhancement

Author(s)
Li, Hongfeng
Xu, Yanyan
Ke, Dengfeng
Su, Kaile
Date
2020
Abstract

The goal of monaural speech enhancement is to separate clean speech from noisy speech. Recently, many studies have employed generative adversarial networks (GANs) to deal with monaural speech enhancement tasks. When GANs are used for this task, the output of the generator is a speech waveform or a spectrum, such as a magnitude spectrum, a mel-spectrum or a complex-valued spectrum. The spectra generated by current speech enhancement methods in the time-frequency domain usually lack details, such as consonants and harmonics with low energy. In this paper, we propose a new type of adversarial training framework for spectrum generation, named μ-law spectrum generative adversarial networks (μ-law SGAN). We introduce a trainable μ-law spectrum compression layer (USCL) into the proposed discriminator to compress the dynamic range of the spectrum. As a result, the compressed spectrum can display more detailed information. In addition, we use the spectrum transformed by USCL to regularize the generator's training, so that the generator can pay more attention to the details of the spectrum. Experimental results on the open dataset Voice Bank + DEMAND show that μ-law SGAN is an effective generative adversarial architecture for speech enhancement. Moreover, visual spectrogram analysis suggests that μ-law SGAN pays more attention to the enhancement of low-energy harmonics and consonants.
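
To illustrate why compressing the dynamic range can make low-energy details more visible, the following is a minimal sketch of the standard μ-law companding function, f(x) = ln(1 + μx) / ln(1 + μ), applied to a magnitude spectrum. It is not the paper's USCL: the function name `mu_law_compress`, the fixed value μ = 255, and the peak normalization are assumptions made for this example, whereas the abstract describes a layer whose compression is trainable.

```python
import numpy as np


def mu_law_compress(magnitude, mu=255.0):
    """Apply standard mu-law companding to a non-negative magnitude spectrum.

    Hypothetical helper for illustration only; mu is fixed here, while the
    layer described in the abstract is trainable.
    """
    magnitude = np.asarray(magnitude, dtype=np.float64)
    peak = magnitude.max()
    if peak == 0:
        # An all-zero spectrum has nothing to compress.
        return np.zeros_like(magnitude)
    # Scale to [0, 1] so the companding curve is applied over its usual domain.
    normalized = magnitude / peak
    # f(x) = ln(1 + mu * x) / ln(1 + mu)
    return np.log1p(mu * normalized) / np.log1p(mu)


# Toy spectrum spanning three orders of magnitude (made-up values).
spectrum = np.array([1.0, 0.1, 0.01, 0.001])
compressed = mu_law_compress(spectrum)
print(compressed)
```

In this toy example the weakest bin is a thousand times smaller than the strongest before compression, and the gap is far smaller afterwards, which is the sense in which a compressed spectrum exposes low-energy components.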

Journal Title

Neural Networks

Volume

136

Subject

Nanotechnology

μ-law SGAN

Deep neural networks

Generative adversarial networks

Signal processing

Speech enhancement

Citation

Li, H; Xu, Y; Ke, D; Su, K, μ-law SGAN for generating spectra with more details in speech enhancement, Neural Networks, 2020, 136, pp. 17-27
