Robust Speech Recognition in Adverse Environments
Primary Supervisor
Paliwal, Kuldip K.
Abstract
The performance of an automatic speech recognition system degrades drastically when there is a mismatch between the training and testing environments. The aim of robust speech recognition is to overcome this mismatch, and numerous methods reported in the literature attempt to do so. This thesis investigates several techniques, applied at different stages of the recognition process, that are suitable for robust speech recognition. All experiments are conducted on the ISOLET database. The TIMIT database is also used to confirm some of the experimental results.

A number of speech enhancement techniques have been used in the past to make speech recognition robust to noise. A speech enhancement system attempts to reduce the noise in a noisy speech signal and is used as a pre-processor to a speech recogniser. In this thesis, a singular value decomposition (SVD) based speech enhancement method is used for robust speech recognition. The speech recognition performance of the SVD method is compared to that of the popular spectral subtraction method.

Speech recognition performance is directly affected by the performance of the feature extraction stage. This thesis provides a comprehensive evaluation of a number of acoustic front-ends for robust speech recognition. It also investigates the use of human auditory properties for robust feature extraction. Two acoustic front-ends, based on simultaneous masking and on variable frequency and temporal resolutions, are proposed, and their performance is investigated for speech distorted by additive noise and channel distortion.

This thesis also investigates the degradation in speech recognition performance caused by speech coding distortion. For this, seven different speech coders operating at different bit rates are simulated, and the speech recogniser is evaluated on speech passed through each of these coders. The MAP adaptation technique is then applied to adapt the model parameters to the speech coding environment. The resulting system is found to perform well in the presence of speech coding distortion.
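As an illustration of the MAP adaptation step mentioned above, the following is a minimal sketch of the textbook MAP update for a single Gaussian mean vector. It is not the thesis's implementation: the abstract does not specify the model structure, the prior weight, or how frames are assigned to model components. The function name `map_adapt_mean`, the prior weight `tau`, and the assumption that every adaptation frame belongs to this one Gaussian are all hypothetical choices made for the example.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """Return the MAP estimate of a Gaussian mean vector.

    prior_mean: mean trained in the original (clean) environment.
    frames:     feature vectors observed in the new environment
                (assumed here to all belong to this Gaussian).
    tau:        hypothetical prior weight; larger values keep the
                estimate closer to prior_mean.
    """
    n = len(frames)
    dim = len(prior_mean)

    # Accumulate the per-dimension sum of the adaptation frames.
    sums = [0.0] * dim
    for frame in frames:
        for d in range(dim):
            sums[d] += frame[d]

    # Weighted combination of the prior mean and the adaptation data:
    # mu_map = (tau * mu_prior + sum of frames) / (tau + n)
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]


if __name__ == "__main__":
    clean_mean = [0.0, 0.0]
    coded_frames = [[1.0, 2.0], [3.0, 4.0]]
    print(map_adapt_mean(clean_mean, coded_frames, tau=2.0))
```

With few adaptation frames the estimate stays near the prior mean, and as more frames arrive it moves toward their sample mean, which is why this kind of update is attractive when only a small amount of data from the new environment is available.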
Thesis Type
Thesis (PhD Doctorate)
Degree Program
Doctor of Philosophy (PhD)
School
School of Microelectronic Engineering
Rights Statement
The author owns the copyright in this thesis, unless stated otherwise.
Subject
Robust speech recognition
Singular value decomposition
Speech enhancement
Spectral subtraction
Speech coding distortion