Multi-lingual text recognition from video frames

Sharma, Nabin; Mandal, Ranju; Sharma, Rabi; Roy, Partha P; Pal, Umapada; Blumenstein, Michael

doi:10.1109/ICDAR.2015.7333902

Author(s)

Sharma, Nabin
Mandal, Ranju
Sharma, Rabi
Roy, Partha P
Pal, Umapada
Blumenstein, Michael

Griffith University Author(s)

Mandal, Ranju

Date

2015

Abstract

Text recognition from video frames is a challenging task due to low resolution, blur, noise, and complex, coloured backgrounds, among other factors. Consequently, traditional approaches to text recognition from scanned documents with simple backgrounds fail when applied to video text. Although various techniques are available for text recognition from handwritten and printed documents with simple backgrounds, text recognition from video frames has not been comprehensively investigated, especially for multi-lingual videos. In this paper, we present a technique for multi-lingual video text recognition that performs script identification in the first stage, followed by word and character recognition, and finally refines the results using a post-processing technique. Considering the inherent problems in videos, a Spatial Pyramid Matching (SPM) based technique, using patch-based SIFT descriptors and an SVM classifier, is employed for script identification. In the next stage, a Hidden Markov Model (HMM) based approach, which utilizes context information, is used for word and character recognition. Finally, a lexicon-based post-processing technique is applied to verify and refine the word recognition results. The proposed method was tested on a dataset comprising 4800 words from three different scripts, namely Roman (English), Hindi and Bengali. The script identification results obtained are encouraging. The word and character recognition results are also encouraging considering the complexity and problems associated with video text processing.

Conference Title

2015 13th IAPR International Conference on Document Analysis and Recognition (ICDAR)

DOI

10.1109/ICDAR.2015.7333902

Subject

Artificial intelligence not elsewhere classified

Persistent link to this record

http://hdl.handle.net/10072/340498

Collections

Conference outputs
