Phase Spectrum Based Speech Processing and Spectral Energy Estimation for Robust Speech Recognition
Author(s)
Primary Supervisor
Paliwal, Kuldip
Other Supervisors
So, Stephen
Year published
2011
Abstract
Speech is the dominant mode of communication between humans: it is simple to learn, easy to use and integral to modern life. Given the importance of speech, the development of a human-machine speech interface has been greatly anticipated. This challenging task is encapsulated in the digital speech processing research field. In this dissertation, two specific areas of research are considered: 1) the use of the short-time Fourier spectral phase in digital speech processing and 2) the use of the minimum mean square error spectral energy estimator for environment-robust automatic speech recognition. In speech processing and modelling, the short-time Fourier spectral phase has been considered of minor importance. This is because classic psychoacoustic experiments have shown speech intelligibility to be closely related to the short-time Fourier spectral magnitude. Given this result, it is unsurprising that the majority of the speech processing literature has involved exploitation of the short-time magnitude spectrum. Despite this, recent studies have shown that useful information can be extracted from the spectral phase of speech. As a result, it is now known that spectral phase possesses much of the same intelligibility information as spectral magnitude. It is this avenue of research that is explored in greater detail within this dissertation. In particular, we investigate two phase-derived quantities: the short-time instantaneous frequency spectrum and the short-time group delay spectrum. The properties of both spectra are investigated mathematically and empirically, identifying the relationship between known speech features and the underlying phase spectrum. We continue the investigation by examining two related quantities: the instantaneous frequency deviation and the group delay deviation. As a result of this research, two novel phase-based spectral representations are proposed, both of which show a high degree of information applicable to speech processing.
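For readers unfamiliar with the two phase-derived quantities named in the abstract, the following is a minimal sketch using their textbook definitions: group delay as the negative derivative of phase with respect to frequency, and instantaneous frequency as the rate of change of phase over time. It is an illustration only and is not taken from the thesis; the function names `group_delay` and `instantaneous_frequency`, the one-sample frame offset, and the use of NumPy are all assumptions made here.

```python
import numpy as np


def group_delay(frame):
    """Group delay (in samples) of one frame at each rfft bin.

    Uses the standard identity
        tau(w) = Re{ Y(w) * conj(X(w)) } / |X(w)|^2,
    where X is the DFT of x[n] and Y is the DFT of n * x[n].
    This avoids explicit phase unwrapping.
    """
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame)
    Y = np.fft.rfft(n * frame)
    power = np.abs(X) ** 2
    # Small floor to avoid division by zero at spectral nulls (assumed value).
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(power, 1e-12)


def instantaneous_frequency(frame_a, frame_b, fs):
    """Instantaneous frequency (in Hz) at each rfft bin.

    Assumes frame_b starts exactly one sample after frame_a and that both
    frames were multiplied by the same analysis window. The phase advance
    over one sample is taken as the principal angle of X_b * conj(X_a).
    """
    Xa = np.fft.rfft(np.asarray(frame_a, dtype=float))
    Xb = np.fft.rfft(np.asarray(frame_b, dtype=float))
    dphi = np.angle(Xb * np.conj(Xa))  # radians per sample, in (-pi, pi]
    return dphi * fs / (2.0 * np.pi)


if __name__ == "__main__":
    fs, n_fft, f0 = 8000, 256, 1000.0
    t = np.arange(n_fft + 1) / fs
    x = np.cos(2 * np.pi * f0 * t)
    w = np.hanning(n_fft)
    inst_f = instantaneous_frequency(w * x[:n_fft], w * x[1:n_fft + 1], fs)
    peak_bin = int(round(f0 * n_fft / fs))
    print("IF at the peak bin (Hz):", inst_f[peak_bin])
```

In this sketch, a frame containing a single impulse delayed by d samples gives a group delay of d at every bin, and a windowed sinusoid gives an instantaneous frequency close to its true frequency at the bin nearest that frequency. The deviation quantities and the proposed representations described in the abstract are not reproduced here.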
Thesis Type
Thesis (PhD Doctorate)
Degree Program
Doctor of Philosophy (PhD)
School
Griffith School of Engineering
Copyright Statement
The author owns the copyright in this thesis, unless stated otherwise.
Item Access Status
Public
Subject
Digital speech processing
Speech recognition
Fourier spectral magnitude
Automatic speech recognition