Masked multi-head self-attention for causal speech enhancement

Author(s)
Nicolson, Aaron
Paliwal, Kuldip K
Date
2020
Abstract

Accurately modelling the long-term dependencies of noisy speech is critical to the performance of a speech enhancement system. Current deep learning approaches to speech enhancement employ either a recurrent neural network (RNN) or a temporal convolutional network (TCN). However, both RNNs and TCNs demonstrate deficiencies when modelling long-term dependencies. Multi-head attention (MHA), a mechanism that has outperformed both RNNs and TCNs in tasks such as machine translation, offers an alternative: by computing similarity scores over the sequence, MHA can model long-term dependencies more efficiently. Moreover, masking can be employed to ensure that the MHA mechanism remains causal, an attribute critical for real-time processing. Motivated by these points, we investigate a deep neural network (DNN) that utilises masked MHA for causal speech enhancement. The proposed DNN is evaluated under real-world non-stationary and coloured noise sources at multiple SNR levels. Our extensive experimental investigation demonstrates that the proposed DNN produces enhanced speech of higher quality and intelligibility than both RNNs and TCNs. We conclude that deep learning approaches employing masked MHA are better suited to causal speech enhancement than RNNs and TCNs. Availability: MHANet is available at https://github.com/anicolson/DeepXi.
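
The causal masking described in the abstract is straightforward to express in code. Below is a minimal PyTorch sketch of masked (causal) multi-head self-attention; it is not the authors' MHANet implementation (see the DeepXi repository linked above), and the module name, layer layout, and dimensions (d_model = 256, n_heads = 8) are illustrative assumptions only.

```python
# Minimal sketch of masked (causal) multi-head self-attention in PyTorch.
# NOTE: this is NOT the authors' MHANet; names and dimensions here are
# illustrative assumptions. See https://github.com/anicolson/DeepXi for
# the actual implementation.
import torch
import torch.nn as nn


class CausalMHSA(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # joint Q, K, V projection
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, d_model), e.g. a noisy-speech feature sequence.
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape each of Q, K, V to (batch, heads, frames, d_head).
        q, k, v = (z.reshape(b, t, self.n_heads, self.d_head).transpose(1, 2)
                   for z in (q, k, v))
        # Scaled dot-product similarity scores: (batch, heads, frames, frames).
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        # Causal mask: frame i may attend only to frames j <= i. Setting
        # future positions to -inf zeroes their weights after the softmax.
        future = torch.triu(torch.ones(t, t, device=x.device), diagonal=1).bool()
        scores = scores.masked_fill(future, float("-inf"))
        attn = scores.softmax(dim=-1)
        # Concatenate heads and project back to d_model.
        y = (attn @ v).transpose(1, 2).reshape(b, t, d)
        return self.out(y)


# Usage: a batch of 4 utterances, 80 frames of 256-dimensional features each.
x = torch.randn(4, 80, 256)
print(CausalMHSA()(x).shape)  # torch.Size([4, 80, 256])
```

The upper-triangular mask removes the attention that frame i would otherwise place on future frames j > i, which is what keeps the mechanism causal and hence usable frame-by-frame in real time.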

Journal Title

Speech Communication

Volume

125

Subject

Artificial intelligence
Cognitive and computational psychology
Linguistics
Science & Technology
Technology
Acoustics
Computer Science, Interdisciplinary Applications
Computer Science

Citation

Nicolson, A; Paliwal, KK, Masked multi-head self-attention for causal speech enhancement, Speech Communication, 2020, 125, pp. 80-96
