Show simple item record

dc.contributor.author: Huang, Bin
dc.contributor.author: Ke, Dengfeng
dc.contributor.author: Zheng, Hao
dc.contributor.author: Xu, Bo
dc.contributor.author: Xu, Yanyan
dc.contributor.author: Su, Kaile
dc.contributor.editor: Sebastian Möller, Hermann Ney, Bernd Möbius, Elmar Nöth, Stefan Steidl
dc.date.accessioned: 2017-09-14T06:36:24Z
dc.date.available: 2017-09-14T06:36:24Z
dc.date.issued: 2015
dc.identifier.uri: http://hdl.handle.net/10072/123537
dc.description.abstract: Traditional automatic speech recognition (ASR) systems usually suffer a sharp performance drop when noise is present in speech. To make ASR robust, we introduce a new model that uses multi-task learning deep neural networks (MTL-DNN) to solve the speech denoising task at the feature level. In this model, the networks are initialized by pre-training restricted Boltzmann machines (RBM) and fine-tuned by jointly learning multiple interactive tasks using a shared representation. For multi-task learning, we choose a noisy-clean speech pair fitting task as the primary task and separately explore two constraints as secondary tasks: phone label and phone cluster. In experiments, the denoised speech is reconstructed by the MTL-DNN from noisy speech input and is evaluated by both a DNN-hidden Markov model (HMM) based and a Gaussian mixture model (GMM)-HMM based ASR system. Results show that, using the denoised speech, the word error rate (WER) is reduced by 53.14% and 34.84%, respectively, compared with the baselines. The MTL-DNN model also outperforms the general single-task learning deep neural networks (STL-DNN) model, with performance improvements of 4.93% and 3.88%, respectively.
dc.description.peerreviewed: Yes
dc.language: English
dc.publisher: International Speech Communication Association (ISCA)
dc.publisher.place: Australia
dc.publisher.uri: http://interspeech2015.org/
dc.relation.ispartofconferencename: INTERSPEECH 2015
dc.relation.ispartofconferencetitle: INTERSPEECH 2015. Speech Beyond Speech: Towards a Better Understanding of the Most Important Biosignal
dc.relation.ispartofdatefrom: 2015-09-06
dc.relation.ispartofdateto: 2015-09-10
dc.relation.ispartoflocation: Dresden, Germany
dc.subject.fieldofresearch: Artificial intelligence not elsewhere classified
dc.subject.fieldofresearchcode: 460299
dc.title: Multi-task learning deep neural networks for speech feature denoising
dc.type: Conference output
dc.type.description: E1 - Conferences
dc.type.code: E - Conference Publications
dc.description.version: Version of Record (VoR)
gro.rights.copyright: © 2015 ISCA and the Author[s]. The attached file is reproduced here in accordance with the copyright policy of the publisher. For information about this conference please refer to the conference’s website or contact the author[s].
gro.hasfulltext: Full Text
gro.griffith.author: Su, Kaile


This item appears in the following Collection(s)

  • Conference outputs
    Contains papers delivered by Griffith authors at national and international conferences.
