Multi-task learning deep neural networks for speech feature denoising
MetadataShow full item record
Traditional automatic speech recognition (ASR) systems usually get a sharp performance drop when noise presents in speech. To make a robust ASR, we introduce a new model using the multi-task learning deep neural networks (MTL-DNN) to solve the speech denoising task in feature level. In this model, the networks are initialized by pre-training restricted Boltzmann machines (RBM) and fine-tuned by jointly learning multiple interactive tasks using a shared representation. In multi-task learning, we choose a noisy-clean speech pair fitting task as the primary task and separately explore two constraints as the secondary tasks: phone label and phone cluster. In experiments, the denoised speech is reconstructed by the MTL-DNN using the noisy speech as input and it is respectively evaluated by the DNN-hidden Markov model (HMM) based and the Gaussian Mixture Model (GMM)-HMM based ASR systems. Results show that, using the denoised speech, the word error rate (WER) is respectively reduced by 53.14% and 34.84% compared with baselines. The MTL-DNN model also outperforms the general single-task learning deep neural networks (STL-DNN) model with a performance improvement of 4.93% and 3.88% respectively.
INTERSPEECH 2015. Speech Beyond Speech: Towards a Better Understanding of the Most Important Biosignal
© 2015 ISCA and the Author[s]. The attached file is reproduced here in accordance with the copyright policy of the publisher. For information about this conference please refer to the conference’s website or contact the author[s].
Artificial Intelligence and Image Processing not elsewhere classified