Speech Recognition with a Generative Factor Analyzed Hidden Markov Model
We present a generative factor analyzed hidden Markov model (GFA-HMM) for automatic speech recognition. In a traditional HMM, observation vectors are modeled by mixtures of Gaussians (MoG) that depend on a discrete-valued hidden state sequence. The GFA-HMM introduces a hierarchy of continuous-valued latent representations of the observation vectors, in which the latent vectors at one level are acoustic-unit dependent and the latent vectors at a higher level are acoustic-unit independent. An expectation maximization (EM) algorithm is derived for maximum likelihood estimation of the model parameters. The GFA-HMM can represent the intra-frame statistics of observation vectors much more compactly than a traditional HMM. An experiment shows that the GFA-HMM can achieve better performance than a traditional HMM with the same amount of training data but a much smaller number of model parameters.
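To illustrate why a factor-analyzed covariance can be more compact than a full covariance, the sketch below counts free covariance parameters for a single Gaussian under the standard factor analysis form (a loading matrix plus diagonal noise). This is a generic illustration and not the paper's exact parameterization; the function names and the dimensions (39 observed, 5 latent) are hypothetical choices made for the example.

```python
def full_cov_params(d):
    """Free parameters in a symmetric d x d full covariance matrix."""
    return d * (d + 1) // 2


def factor_analysis_cov_params(d, k):
    """Free parameters in a factor-analyzed covariance L L^T + Psi.

    L is a d x k loading matrix and Psi is a d x d diagonal noise matrix.
    The count ignores the rotational redundancy in L, so it is an upper bound.
    """
    return d * k + d


# Hypothetical dimensions: 39-dimensional observations, 5 latent factors.
d, k = 39, 5
print("full covariance:", full_cov_params(d))
print("factor analyzed:", factor_analysis_cov_params(d, k))
```

The gap widens as the observation dimension grows, because the full covariance count is quadratic in d while the factor-analyzed count is linear in d for a fixed number of factors.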
Proceedings of the 8th European Conference on Speech Communication and Technology (EUROSPEECH-2003)