Modulation Domain Based Processing for Speech Enhancement
For a long time, the spectral envelope has been accepted as the principal carrier of information important to speech. Therefore, much of the work done for speech applications, such as automatic speech recognition and speech enhancement, has aimed to process this envelope. For speech enhancement, given the quasi-stationarity of speech, many approaches have been based on short-time processing of speech in a Fourier analysis-modification-synthesis (AMS) framework. Within this framework, the magnitude spectrum, the phase spectrum, or both can be modified by a noise suppression or signal estimation approach to achieve enhancement. Most commonly, it is the short-time (acoustic) magnitude spectrum that is modified in order to suppress noise. While there are many methods for enhancement in the literature, it is generally agreed that current methods only succeed in making noise less perceptually annoying while maintaining intelligibility, leaving much room for improvement. In more recent years, the low-frequency temporal modulations of the spectral envelope have received increasing attention. Findings of physiological and psychoacoustic experiments have indicated the importance of these modulations in the human auditory system. This has led to the view that these temporal modulations convey much of the information necessary for speech perception. Many of the efforts to apply modulation processing to the enhancement of speech originated from work in automatic speech recognition, and are based on filtering the trajectories of each acoustic band. However, these filters were typically designed to operate over the entire utterance, without accounting for the properties of speech and noise in the signal. Consequently, processed speech quality is quite poor when the corrupting noise types differ from those used to design the filters.
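As an illustration of the Fourier AMS framework described in the abstract, the following is a minimal sketch and not the method developed in the thesis. It assumes a periodic Hann window, a 256-sample frame, a 64-sample hop, and weighted overlap-add synthesis. The function name `ams_enhance` and the `modify` callback are hypothetical names introduced here. The magnitude spectrum is passed through `modify` while the noisy phase is kept unchanged, which corresponds to the common case of modifying only the short-time magnitude spectrum.

```python
import numpy as np

def ams_enhance(x, frame_len=256, hop=64, modify=None):
    """Sketch of short-time Fourier analysis-modification-synthesis (AMS).

    `modify` is a hypothetical callback that maps a magnitude spectrum to a
    modified magnitude spectrum; the noisy phase is reused at synthesis.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len + 1)[:-1]       # periodic Hann window (assumed)
    n_frames = 1 + (len(x) - frame_len) // hop
    y = np.zeros(len(x))
    norm = np.zeros(len(x))
    for m in range(n_frames):
        start = m * hop
        # Analysis: window the frame and take its short-time spectrum.
        frame = x[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Modification: alter the magnitude only.
        if modify is not None:
            mag = modify(mag)
        # Synthesis: recombine with the noisy phase, invert, overlap-add.
        out = np.fft.irfft(mag * np.exp(1j * phase), n=frame_len)
        y[start:start + frame_len] += out * window
        norm[start:start + frame_len] += window ** 2
    covered = norm > 1e-8                         # skip samples no window reaches
    y[covered] /= norm[covered]
    return y
```

With `modify=None` the sketch reconstructs the input wherever the windows overlap. A noise suppression rule could be supplied as the callback, for example `modify=lambda mag: np.maximum(mag - noise_mag, 0.0)`, where `noise_mag` is an assumed per-bin noise magnitude estimate that the caller would have to provide.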
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
Griffith School of Engineering
Figures 2.1, 2.4, 2.5, 2.6 and 2.10 have been removed from the digital copy to comply with copyright.
Automatic speech recognition