MFCC computation from magnitude spectrum of higher lag autocorrelation coefficients for robust speech recognition
Processing of the speech signal in the autocorrelation domain in the context of robust feature extraction is based on the following two properties: 1) pole preserving property (the poles of a given (original) signal are preserved in its autocorrelation function), and 2) noise separation property (the autocorrelation function of a noise signal is confined to lower lags, while the speech signal contribution is spread over all the lags in the autocorrelation function, thus providing a way to eliminate noise by discarding lower-lag autocorrelation coefficients). In this paper, we use these properties to derive robust features for automatic speech recognition. We compute the magnitude spectrum of the one-sided higher-lag autocorrelation sequence, process it through a Mel filter bank and parameterise it in terms of Mel Frequency Cepstral Coefficients (MFCCs). Since the proposed method combines autocorrelation domain processing with Mel filter bank analysis, we call the resulting MFCCs, Autocorrelation Mel Frequency Cepstral Coefficients (AMFCCs). Recognition experiments are conducted on the Aurora II database and it is found that the AMFCC representation performs as well as the MFCC representation in clean conditions and provides more robust performance in the presence of background noise.
Interspeech 2004 (ICSLP)