MultiModal Ensemble Approach Leveraging Spatial, Skeletal, and Edge Features for Enhanced Bangla Sign Language Recognition
File version
Accepted Manuscript (AM)
Author(s)
Shams, KA
Reaz, Md Rafid
Rafi, Mohammad Ryan Ur
Islam, Sanjida
Rahman, Md Shahriar
Rahman, Rafeed
Reza, Md Tanzim
Parvez, Mohammad Zavid
Chakraborty, Subrata
Pradhan, Biswajeet
Alamri, Abdullah
Abstract
Sign language is the primary form of communication for individuals with auditory impairment. In Bangladesh, Bangla Sign Language (BdSL) is widely used among the hearing-impaired population. However, because the general public has limited awareness of sign language, communicating with BdSL users can be challenging. Consequently, there is a growing demand for an automated system capable of efficiently understanding BdSL. Various Deep Learning (DL) architectures can be employed to translate Bangla Sign Language into readable digital text. Such a system incorporates live cameras that continuously capture images, which are then processed by a DL model. However, factors such as lighting, background noise, skin tone, and hand orientation may introduce uncertainty. To address this, we propose a procedure that reduces these uncertainties by considering three different modalities: spatial information, skeleton awareness, and edge awareness. We introduce three image pre-processing techniques alongside three CNN models. The CNN models are combined using nine distinct ensemble meta-learning algorithms, five of which are modifications of averaging and voting techniques. In our result analysis, the individual CNN models achieved training accuracies of 99.77%, 98.11%, and 99.30%, respectively, higher than most state-of-the-art image classification architectures except ResNet50, which achieved 99.87%. Meanwhile, the ensemble model achieved the highest accuracy of 95.13% on the testing set, outperforming all individual CNN models. This analysis demonstrates that considering multiple modalities can significantly improve the system's overall performance in hand pattern recognition.
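The averaging- and voting-based ensembling described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, array shapes, and uniform weights are assumptions; each `prob_*` array stands in for the per-class probability output of one modality-specific CNN (spatial, skeletal, or edge), shaped `(num_samples, num_classes)`.

```python
import numpy as np

def soft_voting(prob_spatial, prob_skeletal, prob_edge, weights=None):
    """Averaging scheme: combine the three models' class probabilities
    (optionally weighted), then predict the highest-scoring class."""
    probs = np.stack([prob_spatial, prob_skeletal, prob_edge])  # (3, N, C)
    if weights is None:
        weights = np.full(probs.shape[0], 1.0 / probs.shape[0])
    avg = np.tensordot(weights, probs, axes=1)  # weighted mean -> (N, C)
    return avg.argmax(axis=-1)

def hard_voting(prob_spatial, prob_skeletal, prob_edge, num_classes):
    """Voting scheme: each model casts one vote (its argmax class);
    the majority class wins for each sample."""
    preds = np.stack([p.argmax(axis=-1)
                      for p in (prob_spatial, prob_skeletal, prob_edge)])
    return np.array([np.bincount(preds[:, i], minlength=num_classes).argmax()
                     for i in range(preds.shape[1])])
```

Soft voting retains each model's confidence, while hard voting discards it; modifications of both (e.g. weighted averaging) are among the nine meta-learning combiners the paper evaluates.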
Journal Title
IEEE Access
Rights Statement
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
Note
This publication has been entered in Griffith Research Online as an advance online version.
Subject
Engineering
Information and computing sciences
Citation
Shams, KA; Reaz, MR; Rafi, MRU; Islam, S; Rahman, MS; Rahman, R; Reza, MT; Parvez, MZ; Chakraborty, S; Pradhan, B; Alamri, A, MultiModal Ensemble Approach Leveraging Spatial, Skeletal, and Edge Features for Enhanced Bangla Sign Language Recognition, IEEE Access, 2024