Modulation Domain Based Processing for Speech Enhancement

Author(s)
Primary Supervisor
Paliwal, Kuldip
Other Supervisors
So, Stephen
Year published
2013
Abstract
For a long time, the spectral envelope has been accepted as the principal carrier of information important to speech. Much of the work on speech applications, such as automatic speech recognition and speech enhancement, has therefore aimed to process this envelope. For speech enhancement, given the quasi-stationarity of speech, many approaches have been based on short-time processing of speech in a Fourier analysis-modification-synthesis (AMS) framework. Within this framework, the magnitude spectrum, the phase spectrum, or both can be modified by a noise suppression or signal estimation approach to achieve enhancement. Most commonly, it is the short-time (acoustic) magnitude spectrum that is modified in order to suppress noise. While there are many enhancement methods in the literature, it is generally agreed that current methods succeed only in making noise less perceptually annoying while maintaining intelligibility, leaving much room for improvement. In more recent years, the low-frequency temporal modulations of the spectral envelope have received increasing attention. Findings of physiological and psychoacoustic experiments have indicated the importance of these modulations in the human auditory system. This has led to the view that these temporal modulations convey much of the information necessary for speech perception. Many of the efforts to apply modulation processing to the enhancement of speech originated from work in automatic speech recognition, and are based on filtering the trajectories of each acoustic band. However, these filters were typically designed to operate over the entire utterance, without accounting for the properties of speech and noise in the signal. Consequently, processed speech quality is quite poor where the corrupting noise types are dissimilar to those used to design the filters.
Thesis Type
Thesis (PhD Doctorate)
Degree Program
Doctor of Philosophy (PhD)
School
Griffith School of Engineering
Copyright Statement
The author owns the copyright in this thesis, unless stated otherwise.
Note
Figures 2.1, 2.4, 2.5, 2.6 and 2.10 have been removed from the digital copy to comply with copyright.
Subject
Spectral envelope
Speech enhancement
Automatic speech recognition