Temporal convolutional network with frequency dimension adaptive attention for speech enhancement

Author(s)
Zhang, Q
Song, Q
Nicolson, A
Lan, T
Li, H
Date
2021
Location

Brno, Czechia

Abstract

Despite much progress, most temporal convolutional network (TCN) based speech enhancement models focus mainly on modeling the long-term temporal contextual dependencies of speech frames, without taking into account the distribution information of the speech signal in the frequency dimension. In this study, we propose a frequency dimension adaptive attention (FAA) mechanism to improve TCNs, which guides the model to selectively emphasize the frequency-wise features that carry important speech information and also improves the representation capability of the network. Our extensive experimental investigation demonstrates that the proposed FAA mechanism consistently provides significant improvements in terms of speech quality (PESQ), intelligibility (STOI), and three other composite metrics. More promisingly, it generalizes better to real-world noisy environments.
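
The general idea of attention over the frequency dimension can be illustrated with a minimal sketch. This is not the FAA design from the paper, whose details are not given in this record. It assumes a simple squeeze-and-rescale gate in which each frequency bin is summarized by its average over time, passed through one linear layer and a sigmoid, and then used to rescale that bin in every frame. The function name `frequency_attention` and the parameters `weights` and `bias` are hypothetical.

```python
import math


def frequency_attention(features, weights, bias):
    """Rescale each frequency bin of a (time x frequency) feature map.

    Illustrative assumption only, not the paper's FAA module.
    features: list of T frames, each a list of F values.
    weights:  F x F matrix of hypothetical learned parameters.
    bias:     list of F hypothetical learned parameters.
    Returns the rescaled features and the per-bin gates.
    """
    num_frames = len(features)
    num_bins = len(features[0])

    # Squeeze: average over time to get one descriptor per frequency bin.
    descriptor = [
        sum(frame[f] for frame in features) / num_frames
        for f in range(num_bins)
    ]

    # Gate: a linear layer followed by a sigmoid yields a value in [0, 1] per bin.
    gates = []
    for f in range(num_bins):
        z = bias[f] + sum(weights[f][k] * descriptor[k] for k in range(num_bins))
        gates.append(1.0 / (1.0 + math.exp(-z)))

    # Rescale: multiply every frame by the per-frequency gates.
    scaled = [
        [frame[f] * gates[f] for f in range(num_bins)]
        for frame in features
    ]
    return scaled, gates
```

In a trained model the gate parameters would be learned jointly with the TCN, so that bins carrying more speech information receive gates closer to 1 and noise-dominated bins are attenuated.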

Conference Title

Interspeech 2021

Volume

4

Subject

Information systems

temporal convolutional networks

frequency dimension adaptive

representation capability

intelligibility (STOI)

Citation

Zhang, Q; Song, Q; Nicolson, A; Lan, T; Li, H, Temporal convolutional network with frequency dimension adaptive attention for speech enhancement, Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech, 2021, 4, pp. 2763-2767