ASR on speech reconstructed from short-time Fourier phase spectra
In our earlier papers, we measured the human intelligibility of speech stimuli reconstructed either from the short-time magnitude spectra (magnitude-only stimuli) or from the short-time phase spectra (phase-only stimuli) of a speech signal. We demonstrated that, even for small analysis window durations of 20-40 ms (the range relevant to automatic speech recognition), the short-time phase spectrum can contribute to speech intelligibility as much as the short-time magnitude spectrum. In this paper, we perform automatic speech recognition on magnitude-only and phase-only stimuli. With an MFCC-based front-end, the recognition performance achieved for phase-only stimuli is much worse than that achieved for magnitude-only stimuli at small analysis window durations, which is inconsistent with the corresponding human intelligibility results. This implies that the MFCC feature set does not capture all of the discriminating information present in the speech signal.
Interspeech 2004 (ICSLP)
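
The construction of phase-only and magnitude-only stimuli mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The following choices are assumptions made only for illustration: the Hamming window, the 32 ms window duration, the 87.5% frame overlap, the weighted overlap-add synthesis, and the use of uniformly random phase for the magnitude-only case. The function names (stft, overlap_add, reconstruct) are hypothetical.

```python
import numpy as np


def stft(x, win_len, hop):
    """Short-time Fourier transform using a Hamming analysis window (assumed)."""
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1), window


def overlap_add(spec, window, hop, length):
    """Weighted overlap-add synthesis, normalised by the summed squared window."""
    frames = np.fft.irfft(spec, n=len(window), axis=1)
    y = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        y[start : start + len(window)] += frame * window
        norm[start : start + len(window)] += window ** 2
    # Samples not covered by any frame keep the value zero.
    return y / np.maximum(norm, 1e-8)


def reconstruct(x, fs, win_ms=32.0, overlap=0.875, mode="phase", seed=0):
    """Reconstruct a signal from a modified short-time spectrum.

    mode="phase":     keep the phase, set every magnitude to one.
    mode="magnitude": keep the magnitude, replace the phase with random
                      values (an assumption of this sketch).
    mode="identity":  leave the spectrum unchanged (sanity check).
    """
    win_len = int(round(fs * win_ms / 1000.0))
    hop = max(1, int(round(win_len * (1.0 - overlap))))
    spec, window = stft(x, win_len, hop)

    if mode == "phase":
        modified = np.exp(1j * np.angle(spec))
    elif mode == "magnitude":
        rng = np.random.default_rng(seed)
        phase = rng.uniform(-np.pi, np.pi, size=spec.shape)
        modified = np.abs(spec) * np.exp(1j * phase)
    elif mode == "identity":
        modified = spec
    else:
        raise ValueError(f"unknown mode: {mode}")

    return overlap_add(modified, window, hop, len(x))
```

Calling reconstruct(x, fs, mode="phase") on a speech waveform x sampled at fs Hz would give a phase-only stimulus under these assumptions, and mode="magnitude" a magnitude-only one. Either output could then be passed to an MFCC front-end for comparison.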