Masked multi-head self-attention for causal speech enhancement

    Author(s)
    Nicolson, Aaron
    Paliwal, Kuldip K
    Griffith University Author(s)
    Paliwal, Kuldip K.
    Nicolson, Aaron M.
    Year published
    2020
    Abstract
    Accurately modelling the long-term dependencies of noisy speech is critical to the performance of a speech enhancement system. Current deep learning approaches to speech enhancement employ either a recurrent neural network (RNN) or a temporal convolutional network (TCN). However, RNNs and TCNs both demonstrate deficiencies when modelling long-term dependencies. Enter multi-head attention (MHA) — a mechanism that has outperformed both RNNs and TCNs in tasks such as machine translation. By using sequence similarity, MHA possesses the ability to more efficiently model long-term dependencies. Moreover, masking can be employed to ensure that the MHA mechanism remains causal — an attribute critical for real-time processing. Motivated by these points, we investigate a deep neural network (DNN) that utilises masked MHA for causal speech enhancement. The conditions used to evaluate the proposed DNN include real-world non-stationary and coloured noise sources at multiple SNR levels. Our extensive experimental investigation demonstrates that the proposed DNN can produce enhanced speech at a higher quality and intelligibility than both RNNs and TCNs. We conclude that deep learning approaches employing masked MHA are more suited for causal speech enhancement than RNNs and TCNs. Availability—MHANet is available at https://github.com/anicolson/DeepXi
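The causal masking the abstract describes can be sketched in a few lines of NumPy: a lower-triangular attention pattern prevents each frame from attending to future frames, which is what makes the mechanism usable for real-time processing. The function name, random stand-in projection weights, and head count below are illustrative assumptions only; this is not the MHANet implementation released by the authors.

```python
import numpy as np

def causal_mha(x, num_heads, seed=0):
    """Masked (causal) multi-head self-attention over a frame sequence.

    x: array of shape (T, d_model), e.g. T spectral frames of noisy speech.
    Random projection matrices stand in for learned weights; this is an
    illustration of the masking mechanism, not the paper's MHANet.
    """
    T, d_model = x.shape
    d_head = d_model // num_heads
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))

    # Project and split into heads: (num_heads, T, d_head).
    def split(w):
        return (x @ w).reshape(T, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(Wq), split(Wk), split(Wv)

    # Scaled dot-product scores, one (T, T) matrix per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)

    # Causal mask: entries above the diagonal are future frames; setting
    # them to -inf zeroes their softmax weight, so frame t attends only
    # to frames 0..t.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Softmax over the unmasked key positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    # Weighted sum of values, merge heads, output projection.
    out = (weights @ v).transpose(1, 0, 2).reshape(T, d_model)
    return out @ Wo, weights
```

Because the mask zeroes all attention weights above the diagonal, perturbing a future frame leaves every earlier output unchanged, which is the causality property the paper relies on for real-time enhancement.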
    Journal Title
    Speech Communication
    Volume
    125
    DOI
    https://doi.org/10.1016/j.specom.2020.10.004
    Subject
    Artificial intelligence
    Cognitive and computational psychology
    Linguistics
    Science & Technology
    Technology
    Acoustics
    Computer Science, Interdisciplinary Applications
    Computer Science
    Publication URI
    http://hdl.handle.net/10072/414499
    Collection
    • Journal articles
