dc.contributor.author | Huang, Bin | |
dc.contributor.author | Ke, Dengfeng | |
dc.contributor.author | Zheng, Hao | |
dc.contributor.author | Xu, Bo | |
dc.contributor.author | Xu, Yanyan | |
dc.contributor.author | Su, Kaile | |
dc.contributor.editor | Sebastian Möller, Hermann Ney, Bernd Möbius, Elmar Nöth, Stefan Steidl | |
dc.date.accessioned | 2017-09-14T06:36:24Z | |
dc.date.available | 2017-09-14T06:36:24Z | |
dc.date.issued | 2015 | |
dc.identifier.uri | http://hdl.handle.net/10072/123537 | |
dc.description.abstract | Traditional automatic speech recognition (ASR) systems usually suffer a sharp performance drop when noise is present in speech. To make ASR robust, we introduce a new model that uses multi-task learning deep neural networks (MTL-DNN) to solve the speech denoising task at the feature level. In this model, the networks are initialized by pre-training restricted Boltzmann machines (RBMs) and fine-tuned by jointly learning multiple interactive tasks using a shared representation. In multi-task learning, we choose a noisy-clean speech pair fitting task as the primary task and separately explore two constraints as the secondary tasks: phone labels and phone clusters. In experiments, the denoised speech is reconstructed by the MTL-DNN from the noisy speech and is evaluated by both a DNN-hidden Markov model (HMM) based and a Gaussian mixture model (GMM)-HMM based ASR system. Results show that, using the denoised speech, the word error rate (WER) is reduced by 53.14% and 34.84% respectively, compared with the baselines. The MTL-DNN model also outperforms the general single-task learning deep neural networks (STL-DNN) model, with performance improvements of 4.93% and 3.88% respectively. | |
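The multi-task setup described in the abstract — a shared representation feeding a primary noisy-to-clean fitting head and a secondary phone-classification head, trained under one joint objective — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the layer sizes, the ReLU activation, the single hidden layer standing in for the pre-trained RBM stack, and the loss weight `alpha` are all assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the paper's actual layer sizes are not given here.
n_feat, n_hidden, n_phones, batch = 40, 128, 48, 8

# Shared representation (one hidden layer stands in for the RBM-pre-trained stack).
W_shared = rng.normal(0, 0.1, (n_feat, n_hidden))
b_shared = np.zeros(n_hidden)

# Primary head: fit noisy features to their clean counterparts (regression).
W_fit = rng.normal(0, 0.1, (n_hidden, n_feat))
b_fit = np.zeros(n_feat)

# Secondary head: predict the phone label of each frame (classification).
W_phone = rng.normal(0, 0.1, (n_hidden, n_phones))
b_phone = np.zeros(n_phones)

def forward(noisy):
    """Run both task heads on top of the shared hidden representation."""
    h = np.maximum(0.0, noisy @ W_shared + b_shared)  # shared layer, ReLU
    denoised = h @ W_fit + b_fit                      # primary: clean-feature estimate
    logits = h @ W_phone + b_phone                    # secondary: phone scores
    return denoised, logits

def mtl_loss(noisy, clean, phone_ids, alpha=0.3):
    """Joint objective: MSE on the fitting task plus weighted phone cross-entropy."""
    denoised, logits = forward(noisy)
    mse = np.mean((denoised - clean) ** 2)            # primary task loss
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    logp = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    ce = -np.mean(logp[np.arange(len(phone_ids)), phone_ids])  # secondary task loss
    return mse + alpha * ce

# Toy frames standing in for noisy/clean feature pairs with phone labels.
noisy = rng.normal(size=(batch, n_feat))
clean = rng.normal(size=(batch, n_feat))
phones = rng.integers(0, n_phones, size=batch)
loss = mtl_loss(noisy, clean, phones)
```

At test time only the primary head is used: the denoised output of `forward` replaces the noisy features before they are passed to the downstream ASR system, while the secondary phone head exists solely to shape the shared representation during training.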
dc.description.peerreviewed | Yes | |
dc.language | English | |
dc.publisher | International Speech Communication Association (ISCA) | |
dc.publisher.place | Australia | |
dc.publisher.uri | http://interspeech2015.org/ | |
dc.relation.ispartofconferencename | INTERSPEECH 2015 | |
dc.relation.ispartofconferencetitle | INTERSPEECH 2015. Speech Beyond Speech: Towards a Better Understanding of the Most Important Biosignal | |
dc.relation.ispartofdatefrom | 2015-09-06 | |
dc.relation.ispartofdateto | 2015-09-10 | |
dc.relation.ispartoflocation | Dresden, Germany | |
dc.subject.fieldofresearch | Artificial intelligence not elsewhere classified | |
dc.subject.fieldofresearchcode | 460299 | |
dc.title | Multi-task learning deep neural networks for speech feature denoising | |
dc.type | Conference output | |
dc.type.description | E1 - Conferences | |
dc.type.code | E - Conference Publications | |
dc.description.version | Version of Record (VoR) | |
gro.rights.copyright | © 2015 ISCA and the Author[s]. The attached file is reproduced here in accordance with the copyright policy of the publisher. For information about this conference please refer to the conference’s website or contact the author[s]. | |
gro.hasfulltext | Full Text | |
gro.griffith.author | Su, Kaile | |