A Time-Frequency Attention Module for Neural Speech Enhancement

File version

Version of Record (VoR)

Author(s)
Zhang, Qiquan
Qian, Xinyuan
Ni, Zhaoheng
Nicolson, Aaron
Ambikairajah, Eliathamby
Li, Haizhou
Date
2023
Abstract

Speech enhancement plays an essential role in a wide range of speech processing applications. Recent studies on speech enhancement tend to investigate how to effectively capture the long-term contextual dependencies of speech signals to boost performance. However, these studies generally neglect the time-frequency (T-F) distribution information of speech spectral components, which is equally important for speech enhancement. In this paper, we propose a simple yet very effective network module, which we term the T-F attention (TFA) module. It uses two parallel attention branches, i.e., time-frame attention and frequency-channel attention, to explicitly exploit position information and generate a 2-D attention map that characterises the salient T-F speech distribution. We validate our TFA module as part of two widely used backbone networks (residual temporal convolution network and Transformer) and conduct speech enhancement with four of the most popular training objectives. Our extensive experiments demonstrate that our proposed TFA module consistently leads to substantial enhancement performance improvements in terms of the five most widely used objective metrics, with negligible parameter overheads. In addition, we evaluate the efficacy of speech enhancement as a front-end for a downstream speech recognition task. Our evaluation results show that the TFA module significantly improves the robustness of the system to noisy conditions.
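
The two-branch idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how each branch is computed or how the branches are combined, so the average pooling, the sigmoid gate, the scalar weights `w_time` and `w_freq`, the outer-product combination, and the function name `tf_attention` are all assumptions made for illustration, applied to a single-channel feature map in plain Python.

```python
import math


def sigmoid(x):
    """Logistic function, used here as an assumed attention gate."""
    return 1.0 / (1.0 + math.exp(-x))


def tf_attention(feat, w_time=1.0, w_freq=1.0):
    """Apply a toy time-frequency attention to feat (T frames x F bins).

    Assumptions (not taken from the abstract): each branch pools the
    input by averaging, scales the result by a scalar weight, and gates
    it with a sigmoid; the 2-D map is the outer product of the two
    1-D attention vectors.
    """
    n_frames = len(feat)
    n_bins = len(feat[0])

    # Time-frame branch: average over frequency, one value per frame.
    time_desc = [sum(frame) / n_bins for frame in feat]
    # Frequency-channel branch: average over time, one value per bin.
    freq_desc = [
        sum(feat[t][f] for t in range(n_frames)) / n_frames
        for f in range(n_bins)
    ]

    time_att = [sigmoid(w_time * d) for d in time_desc]
    freq_att = [sigmoid(w_freq * d) for d in freq_desc]

    # Combine the two 1-D vectors into a 2-D (T x F) attention map.
    att_map = [[a_t * a_f for a_f in freq_att] for a_t in time_att]

    # Re-weight the input element-wise with the attention map.
    refined = [
        [feat[t][f] * att_map[t][f] for f in range(n_bins)]
        for t in range(n_frames)
    ]
    return refined, att_map


# Toy input: energy concentrated in the middle frame.
feat = [
    [0.0, 0.0, 0.0],
    [2.0, 4.0, 0.0],
    [0.0, 0.0, 0.0],
]
refined, att_map = tf_attention(feat)
```

In this toy input the largest attention weight falls on the middle frame and the second frequency bin, where the energy is concentrated. A trained module would learn the branch transformations rather than use fixed scalar weights.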

Journal Title

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Volume

31

Rights Statement

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Subject

Neural engineering

Speech recognition

Science & Technology

Technology

Acoustics

Engineering, Electrical & Electronic

Engineering

Citation

Zhang, Q; Qian, X; Ni, Z; Nicolson, A; Ambikairajah, E; Li, H, A Time-Frequency Attention Module for Neural Speech Enhancement, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023, 31, pp. 462-475
