Robust Speech Recognition in Adverse Environments

Author(s)
Primary Supervisor
Paliwal, Kuldip K.
Year published
2000
Abstract
The performance of an automatic speech recognition system degrades drastically when there is a mismatch between training and testing environments. The aim of robust speech recognition is to overcome this mismatch. Numerous methods have been reported in the literature that attempt to provide robustness to this mismatch. This thesis investigates several techniques, applied at different stages of the recognition process, that are suitable for robust speech recognition. All experiments are conducted on the ISOLET database. The TIMIT database was also used to confirm some of the experimental results. A number of speech enhancement techniques have been used in the past for speech recognition to achieve robustness with respect to noise. A speech enhancement system attempts to remove noise from the noisy speech signal and is used as a pre-processor to a speech recogniser. In this thesis, a singular value decomposition (SVD) based speech enhancement method is used for robust speech recognition. The speech recognition performance of the SVD method is compared to that of the popular spectral subtraction method. Speech recognition performance is directly affected by the performance of the feature extraction stage. This thesis provides a comprehensive evaluation of a number of acoustic front-ends for robust speech recognition. It also investigates the use of human auditory properties for robust feature extraction. Two acoustic front-ends based on simultaneous masking and variable frequency and temporal resolutions are proposed, and their performance is investigated for speech distorted by additive noise and channel distortion. This thesis also investigates the degradation in speech recognition performance due to speech coding distortion. For this, seven different speech coders operating at different bit rates are simulated and the speech recogniser is run on the output of each of these coders.
The maximum a posteriori (MAP) adaptation technique is then applied to adapt the model parameters to the speech coding environment. The resulting system is found to perform well in the presence of speech coding distortion.
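For readers unfamiliar with the spectral subtraction baseline mentioned in the abstract, the following is a minimal sketch of the general idea for a single frame, not the implementation used in the thesis. The function names, the over-subtraction factor `alpha`, and the spectral floor `beta` are illustrative assumptions; the inputs are assumed to be power spectra that have already been computed elsewhere.

```python
def estimate_noise(frames):
    """Average the power spectra of frames assumed to contain no speech.

    `frames` is a list of equal-length lists of power values (hypothetical input).
    """
    n_bins = len(frames[0])
    return [sum(frame[k] for frame in frames) / len(frames) for k in range(n_bins)]


def spectral_subtract(noisy_power, noise_power, alpha=1.0, beta=0.01):
    """Subtract an estimated noise power spectrum from one noisy frame.

    alpha: over-subtraction factor (assumed parameter, 1.0 = plain subtraction)
    beta:  spectral floor as a fraction of the noisy power (assumed parameter),
           which keeps the result from going negative
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    cleaned = []
    for y, n in zip(noisy_power, noise_power):
        estimate = y - alpha * n   # remove the estimated noise power
        floor = beta * y           # lower bound for this frequency bin
        cleaned.append(max(estimate, floor))
    return cleaned
```

In a recognition pipeline of the kind described, the cleaned spectrum would be passed on to feature extraction in place of the noisy one.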
Thesis Type
Thesis (PhD Doctorate)
Degree Program
Doctor of Philosophy (PhD)
School
School of Microelectronic Engineering
Copyright Statement
The author owns the copyright in this thesis, unless stated otherwise.
Subject
Robust speech recognition
Singular value decomposition
Speech enhancement
Spectral subtraction
Speech coding distortion