Evaluation of the modified group delay feature for isolated word recognition

Author(s)
Alsteris, LD
Paliwal, KK
Griffith University Author(s)
Year published
2005
Abstract
The results of our recent human perception experiments indicate that the short-time phase spectrum can significantly contribute to speech intelligibility over small window durations (i.e., 20-40 ms). This motivates us to investigate the use of the short-time phase spectrum to derive features for automatic speech recognition, which generally uses small window durations of 20-40 ms for spectral analysis. In this paper, we specifically investigate the frequency-derivative of the short-time phase spectrum (i.e., group delay function, GDF) from which to extract features. We demonstrate, with some simple examples, the volatility of the GDF to noise, pitch epochs and windowing effects. We summarise the work by Yegnanarayana and Murthy on the modified GDF (MGDF), which serves to remedy the problems of the GDF. We then implement Murthy and Gadde's MGDF-based features (MODGDF) to determine if they provide an improvement over the popular MFCC representation either by themselves or in combination with MFCCs on an isolated word recognition task.
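As a rough illustration of the group delay function mentioned in the abstract, the sketch below computes the negative frequency-derivative of the phase of a short frame using the standard DFT identity tau(w) = (X_R Y_R + X_I Y_I) / |X|^2, where Y is the DFT of n*x[n]. This is not the authors' implementation and it does not include the MGDF or MODGDF modifications. The function name `group_delay`, the FFT size, the `eps` floor, and both test frames are assumptions chosen for illustration. The second frame places a zero close to the unit circle, which is one commonly cited reason the raw GDF becomes spiky.

```python
import numpy as np

def group_delay(x, n_fft=512, eps=1e-12):
    """Group delay (in samples) of frame x, computed without phase unwrapping.

    Hypothetical helper for illustration only: uses the identity
    tau = (X_R*Y_R + X_I*Y_I) / |X|^2, with Y the DFT of n*x[n].
    """
    x = np.asarray(x, dtype=float)
    n = np.arange(len(x))
    X = np.fft.rfft(x, n_fft)        # spectrum of the frame
    Y = np.fft.rfft(n * x, n_fft)    # spectrum of the time-weighted frame
    power = X.real**2 + X.imag**2
    # eps is an assumed floor that only guards against division by zero
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(power, eps)

# A unit impulse delayed by 5 samples has a constant group delay of 5.
impulse = np.zeros(64)
impulse[5] = 1.0
gd_impulse = group_delay(impulse)

# A frame with a zero at z = 0.99 (just inside the unit circle): |X|^2 is
# tiny near zero frequency, so the group delay shows a large spike there.
gd_spiky = group_delay([1.0, -0.99])
```

The denominator |X|^2 is what makes the raw function sensitive: wherever the magnitude spectrum dips towards zero, small perturbations produce large swings in the result.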
Conference Title
ISSPA 2005: The 8th International Symposium on Signal Processing and its Applications, Vols 1 and 2, Proceedings
Volume
2
Copyright Statement
© 2005 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.