Temporal convolutional network with frequency dimension adaptive attention for speech enhancement
Author(s)
Zhang, Q
Song, Q
Nicolson, A
Lan, T
Li, H
Location
Brno, Czechia
Abstract
Despite much progress, most temporal convolutional network (TCN) based speech enhancement models focus mainly on modeling the long-term temporal contextual dependencies of speech frames, without taking into account the distribution of the speech signal along the frequency dimension. In this study, we propose a frequency dimension adaptive attention (FAA) mechanism to improve TCNs, which guides the model to selectively emphasize the frequency-wise features that carry important speech information and also improves the representation capability of the network. Our extensive experimental investigation demonstrates that the proposed FAA mechanism consistently provides significant improvements in terms of speech quality (PESQ), intelligibility (STOI) and three other composite metrics. More promisingly, it generalizes better to real-world noisy environments.
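As a rough illustration of what frequency-wise attention can look like, the sketch below re-weights each frequency bin of a time-frequency feature map with a learned gate. It is not the FAA mechanism from the paper, whose details this record does not give. It assumes a generic squeeze-and-excitation style gate, and the function name `frequency_attention`, the weights `w1` and `w2`, and all sizes are hypothetical.

```python
import numpy as np

def frequency_attention(features, w1, w2):
    """Re-weight each frequency bin of a (time, freq) feature map.

    Hypothetical squeeze-and-excitation style gate; an assumption for
    illustration, not the FAA mechanism described in the paper.
    """
    # Squeeze: average over time to get one descriptor per frequency bin.
    descriptor = features.mean(axis=0)            # shape (freq,)
    # Excite: small bottleneck layer with ReLU, then a sigmoid gate in (0, 1).
    hidden = np.maximum(0.0, descriptor @ w1)     # shape (hidden,)
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # shape (freq,)
    # Scale every frame by the per-frequency gate (broadcast over time).
    return features * gate, gate

# Toy input: 100 frames, 257 frequency bins, random (untrained) weights.
rng = np.random.default_rng(0)
T, F, H = 100, 257, 32
features = rng.standard_normal((T, F))
w1 = rng.standard_normal((F, H)) * 0.1
w2 = rng.standard_normal((H, F)) * 0.1
weighted, gate = frequency_attention(features, w1, w2)
```

In a trained model the gate weights would be learned jointly with the enhancement network, so that bins carrying more speech information receive gate values closer to 1.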
Conference Title
Interspeech 2021
Volume
4
Subject
Information systems
temporal convolutional networks
frequency dimension adaptive
representation capability
intelligibility (STOI)
Citation
Zhang, Q; Song, Q; Nicolson, A; Lan, T; Li, H, Temporal convolutional network with frequency dimension adaptive attention for speech enhancement, Proceedings of the Annual Conference of the International Speech Communication Association, Interspeech, 2021, 4, pp. 2763-2767