Bidirectional Long-Short Term Memory Network-based Estimation of Reliable Spectral Component Locations
File version
Version of Record (VoR)
Author(s)
Nicolson, Aaron
Paliwal, Kuldip K
Year published
2018
Abstract
An accurate Ideal Binary Mask (IBM) estimate is essential for Missing Feature Theory (MFT)-based speaker identification, as incorrectly labelled spectral components (where a component is either reliable or unreliable) will degrade the performance of an Automatic Speaker Identification (ASI) system adversely in the presence of noise. In this work, a Bidirectional Recurrent Neural Network (BRNN) with Long-Short Term Memory (LSTM) cells is proposed for improved IBM estimation. The proposed system had an average IBM estimate accuracy improvement of 4.5% and an average MFT-based speaker identification accuracy improvement of 3.1% over all tested SNRdB levels, when compared to the previously proposed Multilayer Perceptron (MLP)-IBM estimator. When used for speech enhancement, the proposed system had an average MOS-LQO (objective quality measure) improvement of 0.32 and an average QSTI (objective intelligibility measure) improvement of 0.01 over all tested SNRdB levels, when compared to the MLP-IBM estimator. The results presented in this work highlight the effectiveness of the proposed BRNN-IBM estimator for MFT-based speaker identification and IBM-based speech enhancement.
Conference Title
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6
Volume
2018-September
Copyright Statement
© 2018 ISCA and the Author(s). The attached file is reproduced here in accordance with the copyright policy of the publisher. For information about this conference please refer to the conference’s website or contact the author(s).
Subject
Natural language processing