Speech Recognition and Enhancement using Autocorrelation Domain Processing

Primary Supervisor

Paliwal, Kuldip

Other Supervisors

Vlacic, Ljubo

Date
2007
Abstract

From a young age, humans learn language skills and develop them to the point that they become reflex-like. As a communication modality, speech is efficient, natural and intrinsically understood. By developing spoken language interfaces for machines, the same kinds of benefits can be realised for human-machine interaction. Development of machine-based speech recognition has been in progress for the past 50 years. In this time significant advances have been made, but the performance of current solutions in the presence of ambient acoustic noise is one factor holding the technology back. Contributing to the overall deficiency of the system is the performance of current feature extraction methods. These techniques cannot be described as robust when deployed in the dynamic acoustic environments typically encountered in everyday life. Ambient background noise also affects speech communication between humans. Restoration of a degraded speech signal by a speech enhancement algorithm can help to reduce this effect. Techniques developed for improving the noise robustness of feature extraction algorithms can also find application in speech enhancement algorithms.
Contributions made in this thesis are aimed at improving the performance of automatic speech recognition in the presence of ambient acoustic noise, and the quality of speech perceived by human listeners in the same conditions. The proposed techniques are based on processing the degraded speech signal in the autocorrelation domain. Because speech and noise signals are produced by different mechanisms, transforming them into the autocorrelation domain provides a favourable representation for noise-robust processing. We found that by utilising the higher-lag coefficients of the autocorrelation sequence and discarding the lower-lag coefficients, more noise-robust spectral estimates could be made. This approach was found to be adept at suppressing particular classes of non-stationary noise that conventional methods fail to handle. We also explored phase spectrum estimation for speech enhancement and obtained positive results. The proposed feature extraction and speech enhancement techniques, while performing very well for some non-stationary noises, were less effective against stationary noises. This work highlights the autocorrelation domain as a basis for noise-robust speech processing in the presence of dynamic ambient noises. With improvements in short-time autocorrelation estimation, it is expected that the performance of the techniques for stationary noises can also be improved.
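
The higher-lag idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation. The function names, the biased autocorrelation estimator, the Hann taper, and the parameters `min_lag` and `n_bins` are all assumptions chosen for illustration. The sketch estimates the autocorrelation of one frame, zeroes the coefficients below `min_lag`, tapers the remaining ones, and takes the magnitude of their Fourier transform as a spectral estimate.

```python
import cmath
import math


def autocorrelation(frame):
    """Biased short-time autocorrelation estimate for lags 0..N-1."""
    n = len(frame)
    return [
        sum(frame[t] * frame[t + lag] for t in range(n - lag)) / n
        for lag in range(n)
    ]


def higher_lag_spectrum(frame, min_lag, n_bins):
    """Magnitude spectrum built only from autocorrelation lags >= min_lag.

    Illustrative sketch: the lag threshold and Hann taper are assumptions,
    not values or choices taken from the thesis.
    """
    r = autocorrelation(frame)
    span = len(r) - min_lag
    if span < 2 or n_bins < 2:
        raise ValueError("need at least two kept lags and two frequency bins")

    # Discard the lower lags and taper the kept ones with a Hann window
    # to limit leakage caused by truncating the sequence.
    windowed = [0.0] * len(r)
    for lag in range(min_lag, len(r)):
        w = 0.5 - 0.5 * math.cos(2.0 * math.pi * (lag - min_lag) / (span - 1))
        windowed[lag] = w * r[lag]

    # Direct DFT of the one-sided sequence on n_bins points from 0 to pi.
    spectrum = []
    for k in range(n_bins):
        omega = math.pi * k / (n_bins - 1)
        acc = sum(v * cmath.exp(-1j * omega * lag)
                  for lag, v in enumerate(windowed))
        spectrum.append(abs(acc))
    return spectrum


# Usage: a sinusoid at 0.2*pi rad/sample, analysed from lags 16 and above.
frame = [math.sin(0.2 * math.pi * t) for t in range(128)]
spectrum = higher_lag_spectrum(frame, min_lag=16, n_bins=101)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print("peak frequency (rad/sample):", math.pi * peak_bin / 100)
```

A periodic component keeps its periodicity across all autocorrelation lags, so its spectral peak should still be recoverable after the lower lags are removed. A disturbance whose autocorrelation is confined to the lowest lags contributes nothing to this estimate.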

Thesis Type

Thesis (PhD Doctorate)

Degree Program

Doctor of Philosophy (PhD)

School

Griffith School of Engineering

Rights Statement

The author owns the copyright in this thesis, unless stated otherwise.

Item Access Status

Public

Subject

human-machine interaction

speech recognition

speech enhancement algorithm

Autocorrelation Domain Processing
