VQ-BASED WRITTEN LANGUAGE IDENTIFICATION

No Thumbnail Available
File version
Author(s)
Pham, T
Tran, D
Griffith University Author(s)
Primary Supervisor
Other Supervisors
Editor(s)

B. Boashash

Date
2003
Size
File type(s)
Location

PARIS, FRANCE

License
Abstract

Humans can recognize different types of written languages by their grammars and vocabularies. However, computers see everything as numbers. We present a computational algorithm for machine classification of written languages using the method of vector quantization. For a language document, each word is converted to a sequence of numbers and forms as a vector of numerical values according to its characters. This collection of vectors is then represented by a codebook that contains a number of template vectors for classification. The proposed method is more effective for machine learning than the n-gram based method, which has been widely used for written language identification. Experimental results of classifying a set of five closely roman-typed scripts show the promising application of the proposed method.

Journal Title
Conference Title

SEVENTH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOL 1, PROCEEDINGS

Book Title
Edition
Volume
Issue
Thesis Type
Degree Program
School
DOI
Patent number
Funder(s)
Grant identifier(s)
Rights Statement
Rights Statement
Item Access Status
Note
Access the data
Related item(s)
Subject
Persistent link to this record
Citation