Feature Extraction and Dimensionality Reduction in Pattern Recognition and Their Application in Speech Recognition
Files
File version
Author(s)
Primary Supervisor
Paliwal, Kuldip
Other Supervisors
Editor(s)
Date
Size
File type(s)
Location
License
Abstract
Conventional pattern recognition systems have two components: feature analysis and pattern classification. Feature analysis is achieved in two steps: parameter extraction step and feature extraction step. In the parameter extraction step, information relevant for pattern classification is extracted from the input data in the form of parameter vector. In the feature extraction step, the parameter vector is transformed to a feature vector. Feature extraction can be conducted independently or jointly with either parameter extraction or classification. Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) are the two popular independent feature extraction algorithms. Both of them extract features by projecting the parameter vectors into a new feature space through a linear transformation matrix. But they optimize the transformation matrix with different intentions. PCA optimizes the transformation matrix by finding the largest variations in the original feature space. LDA pursues the largest ratio of between-class variation and within-class variation when projecting the original feature space to a subspace. The drawback of independent feature extraction algorithms is that their optimization criteria are different from the classifier’s minimum classification error criterion, which may cause inconsistency between feature extraction and the classification stages of a pattern recognizer and consequently, degrade the performance of classifiers. A direct way to overcome this problem is to conduct feature extraction and classification jointly with a consistent criterion. Minimum classification Error (MCE) training algorithm provides such an integrated framework. MCE algorithm was first proposed for optimizing classifiers. It is a type of discriminative learning algorithm but achieves minimum classification error directly. The flexibility of the framework of MCE algorithm makes it convenient to conduct feature extraction and classification jointly. Conventional feature extraction and pattern classification algorithms, LDA, PCA, MCE training algorithm, minimum distance classifier, likelihood classifier and Bayesian classifier, are linear algorithms. The advantage of linear algorithms is their simplicity and ability to reduce feature dimensionalities. However, they have the limitation that the decision boundaries generated are linear and have little computational flexibility. SVM is a recently developed integrated pattern classification algorithm with non-linear formulation. It is based on the idea that the classification that a.ords dot-products can be computed efficiently in higher dimensional feature spaces. The classes which are not linearly separable in the original parametric space can be linearly separated in the higher dimensional feature space. Because of this, SVM has the advantage that it can handle the classes with complex nonlinear decision boundaries. However, SVM is a highly integrated and closed pattern classification system. It is very difficult to adopt feature extraction into SVM’s framework. Thus SVM is unable to conduct feature extraction tasks. This thesis investigates LDA and PCA for feature extraction and dimensionality reduction and proposes the application of MCE training algorithms for joint feature extraction and classification tasks. A generalized MCE (GMCE) training algorithm is proposed to mend the shortcomings of the MCE training algorithms in joint feature and classification tasks. SVM, as a non-linear pattern classification system is also investigated in this thesis. A reduced-dimensional SVM (RDSVM) is proposed to enable SVM to conduct feature extraction and classification jointly. All of the investigated and proposed algorithms are tested and compared firstly on a number of small databases, such as Deterding Vowels Database, Fisher’s IRIS database and German’s GLASS database. Then they are tested in a large-scale speech recognition experiment based on TIMIT database.
Journal Title
Conference Title
Book Title
Edition
Volume
Issue
Thesis Type
Thesis (PhD Doctorate)
Degree Program
Doctor of Philosophy (PhD)
School
School of Microelectronic Engineering
Publisher link
Patent number
Funder(s)
Grant identifier(s)
Rights Statement
Rights Statement
The author owns the copyright in this thesis, unless stated otherwise.
Item Access Status
Public
Note
Access the data
Related item(s)
Subject
feature extraction
dimensionality
pattern recognition
speech recognition
Linear Discriminant Analysis
LDA
Principal Component Analysis
PCA
Minimum Classification Error algorithm
MCE
Support Vector Machine
SVM