Show simple item record

dc.contributor.author: So, Stephen [en_US]
dc.contributor.author: Paliwal, Kuldip [en_US]
dc.contributor.editor: M.G.J. Swerts [en_US]
dc.date.accessioned: 2017-05-03T13:01:25Z
dc.date.available: 2017-05-03T13:01:25Z
dc.date.issued: 2006 [en_US]
dc.date.modified: 2009-09-21T05:50:51Z
dc.identifier.issn: 01676393 [en_US]
dc.identifier.doi: 10.1016/j.specom.2005.10.002 [en_AU]
dc.identifier.uri: http://hdl.handle.net/10072/14445
dc.description.abstract: In this paper, we investigate the use of block quantisers based on Gaussian mixture models (GMMs) for the coding of Mel frequency-warped cepstral coefficient (MFCC) features in distributed speech recognition (DSR) applications. Specifically, we consider the multi-frame scheme, where temporal correlation across MFCC frames is exploited by the Karhunen-Loeve transform of the block quantiser. Compared with vector quantisers, the GMM-based block quantiser has relatively low computational and memory requirements which are independent of bitrate. More importantly, it is bitrate scalable, which means that the bitrate can be adjusted without the need for re-training. Static parameters such as the GMM and transform matrices are stored at the encoder and decoder and bit allocations are calculated "on-the-fly" without intensive processing. We have evaluated the quantisation scheme on the Aurora-2 database in a DSR framework. We show that jointly quantising more frames and using more mixture components in the GMM leads to higher recognition performance. The multi-frame GMM-based block quantiser achieves a word error rate (WER) of 2.5% at 800 bps, which is less than 1% degradation from the baseline (unquantised) word recognition accuracy, and graceful degradation down to a WER of 7% at 300 bps. [en_US]
dc.description.peerreviewed: Yes [en_US]
dc.description.publicationstatus: Yes [en_AU]
dc.language: English [en_US]
dc.language.iso: en_AU
dc.publisher: Elsevier BV [en_US]
dc.publisher.place: Netherlands [en_US]
dc.publisher.uri: http://www.elsevier.com/wps/find/journaldescription.cws_home/505597/description#description [en_AU]
dc.relation.ispartofstudentpublication: N [en_AU]
dc.relation.ispartofpagefrom: 746 [en_US]
dc.relation.ispartofpageto: 758 [en_US]
dc.relation.ispartofissue: 6 [en_US]
dc.relation.ispartofjournal: Speech Communication [en_US]
dc.relation.ispartofvolume: 48 [en_US]
dc.rights.retention: Y [en_AU]
dc.subject.fieldofresearchcode: 280206 [en_US]
dc.subject.fieldofresearchcode: 280204 [en_US]
dc.title: Scalable distributed speech recognition using Gaussian mixture model-based block quantization [en_US]
dc.type: Journal article [en_US]
dc.type.description: C1 - Peer Reviewed (HERDC) [en_US]
dc.type.code: C - Journal Articles [en_US]
gro.faculty: Griffith Sciences, Griffith School of Engineering [en_US]
gro.rights.copyright: Copyright 2006 Elsevier. Please refer to the journal's website for access to the definitive, published version. [en_AU]
gro.date.issued: 2006
gro.hasfulltext: No Full Text


Files in this item

There are no files associated with this item.

This item appears in the following Collection(s)

  • Journal articles
    Contains articles published by Griffith authors in scholarly journals.
