MultiModal Ensemble Approach Leveraging Spatial, Skeletal, and Edge Features for Enhanced Bangla Sign Language Recognition

File version

Accepted Manuscript (AM)

Author(s)
Shams, Khan Abrar
Reaz, Md Rafid
Rafi, Mohammad Ryan Ur
Islam, Sanjida
Rahman, Md Shahriar
Rahman, Rafeed
Reza, Md Tanzim
Parvez, Mohammad Zavid
Chakraborty, Subrata
Pradhan, Biswajeet
Alamri, Abdullah
Date
2024
Abstract

Sign language is the primary form of communication for individuals with auditory impairment. In Bangladesh, Bangla Sign Language (BdSL) is widely used among the hearing-impaired population. However, due to the general public’s limited awareness of sign language, communicating with the hearing-impaired population through BdSL can be challenging. Consequently, there is a growing demand for an automated system capable of efficiently understanding BdSL. For automation, various Deep Learning (DL) architectures can be employed to translate Bangla Sign Language into readable digital text. The automated system incorporates live cameras that continuously capture images, which are then processed by a DL model. However, factors such as lighting, background noise, skin tone, hand orientation, and other imaging conditions may introduce uncertainty. To address this, we propose a procedure that reduces these uncertainties by considering three different modalities: spatial information, skeleton awareness, and edge awareness. We introduce three image pre-processing techniques alongside three CNN models. The CNN models are combined using nine distinct ensemble meta-learning algorithms, five of which are modifications of averaging and voting techniques. In the result analysis, our individual CNN models achieved training accuracies of 99.77%, 98.11%, and 99.30%, respectively, higher than most other state-of-the-art image classification architectures, except ResNet50, which achieved 99.87%. Meanwhile, the ensemble model achieved the highest accuracy of 95.13% on the testing set, outperforming all individual CNN models. This analysis demonstrates that considering multiple modalities can significantly improve the system’s overall performance in hand pattern recognition.
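
This record does not include the paper’s implementation, but the three-modality idea in the abstract can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration only: OpenCV’s Canny detector stands in for the edge-awareness pre-processing, MediaPipe Hands for the skeleton-awareness pre-processing, and simple probability averaging (soft voting) for one of the nine ensemble combiners; the paper’s actual pipelines and meta-learning algorithms may differ.

import cv2
import numpy as np
import mediapipe as mp

# Edge-awareness modality (assumed: Canny edge map; the paper's exact
# edge extractor is not described in this record).
def edge_modality(bgr_image):
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    return cv2.Canny(gray, 100, 200)

# Skeleton-awareness modality (assumed: MediaPipe hand landmarks drawn
# onto a blank canvas of the same size as the input image).
def skeleton_modality(bgr_image):
    mp_hands = mp.solutions.hands
    mp_draw = mp.solutions.drawing_utils
    canvas = np.zeros_like(bgr_image)
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        result = hands.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
        for landmarks in result.multi_hand_landmarks or []:
            mp_draw.draw_landmarks(canvas, landmarks, mp_hands.HAND_CONNECTIONS)
    return canvas

# Soft-voting ensemble: average the per-class probabilities from the
# three modality-specific CNNs and pick the most likely sign class.
def ensemble_predict(models, modality_batches):
    probs = [m.predict(x, verbose=0) for m, x in zip(models, modality_batches)]
    return np.argmax(np.mean(probs, axis=0), axis=1)

A hard-voting variant would instead take the argmax per model and return the majority class; the paper evaluates nine such combiners, five of them modified averaging and voting schemes.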

Journal Title

IEEE Access

Rights Statement

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

Note

This publication has been entered in Griffith Research Online as an advance online version.

Subject

Engineering

Information and computing sciences

Citation

Shams, KA; Reaz, MR; Rafi, MRU; Islam, S; Rahman, MS; Rahman, R; Reza, MT; Parvez, MZ; Chakraborty, S; Pradhan, B; Alamri, A, MultiModal Ensemble Approach Leveraging Spatial, Skeletal, and Edge Features for Enhanced Bangla Sign Language Recognition, IEEE Access, 2024
