Multi-domain Attention Fusion Network For Language Recognition
Author(s)
Xu, Y
Ke, D
Su, K
Griffith University Author(s)
Ju, M
Date
2023
Abstract
Attention-based convolutional neural network models are increasingly adopted for language recognition tasks. In this paper, we address language recognition with a self-attention mechanism that captures rich contextual dependencies. To do so, we propose a new multi-domain feature fusion network that adaptively integrates local features with their global dependencies. Specifically, we attach three attention modules to each stage of ResNet, which model semantic dependencies in the time, frequency, and channel domains, respectively. The time attention module aggregates the features at all time locations through a weighted sum of the features from the time feature map and the original features. Correspondingly, the frequency and channel attention modules aggregate the features at all frequency and channel locations in the same way. We then aggregate the outputs of the three attention modules in one of three ways, addition, averaging, or maximum, to further improve the feature representation. Experiments are conducted on the APSIPA 2017 Oriental Language Recognition (AP17-OLR) dataset and the AP20-OLR-noisy-task dataset, and our proposed method achieves state-of-the-art results on both.
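To illustrate the mechanism the abstract describes, here is a minimal sketch of a time-domain attention module and the three fusion modes, assuming a non-local-style self-attention over the time axis of a (batch, channels, frequency, time) feature map. The class and function names (TimeAttention, fuse) and the query/key projection size are hypothetical choices for the sketch, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeAttention(nn.Module):
    """Self-attention over the time axis of a (B, C, F, T) feature map."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions produce query/key/value projections; the //8
        # channel reduction is a common convention, not from the paper.
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # Learnable scale for the residual path, i.e. the "weighted sum
        # ... and the original features" in the abstract.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, f, t = x.shape
        # Fold the channel and frequency axes together so attention
        # weights are computed between time locations only.
        q = self.query(x).reshape(b, -1, t)              # (B, C/8 * F, T)
        k = self.key(x).reshape(b, -1, t)                # (B, C/8 * F, T)
        v = self.value(x).reshape(b, -1, t)              # (B, C * F, T)
        attn = F.softmax(q.transpose(1, 2) @ k, dim=-1)  # (B, T, T)
        out = (v @ attn.transpose(1, 2)).reshape(b, c, f, t)
        return self.gamma * out + x                      # residual fusion

def fuse(outputs: list[torch.Tensor], mode: str = "add") -> torch.Tensor:
    """Aggregate the three attention outputs by addition, averaging,
    or maximum, the three fusion modes listed in the abstract."""
    stacked = torch.stack(outputs, dim=0)
    if mode == "add":
        return stacked.sum(dim=0)
    if mode == "avg":
        return stacked.mean(dim=0)
    if mode == "max":
        return stacked.max(dim=0).values
    raise ValueError(f"unknown fusion mode: {mode}")
```

A frequency or channel module would follow the same pattern, with the reshape folding the other two axes instead; in the full network, one such triple would sit after each ResNet stage and its outputs would be combined by fuse.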
Journal Title
SN Computer Science
Volume
4
Subject
Neural networks
Data management and data science
Citation
Ju, M; Xu, Y; Ke, D; Su, K, Multi-domain Attention Fusion Network For Language Recognition, SN Computer Science, 2023, 4, pp. 39