Relation-Aware Heterogeneous Graph Network for Learning Intermodal Semantics in Textbook Question Answering
Author(s)
Zhang, S
Wu, Yunjie
Zhang, Xiaowang
Feng, Zhiyong
Wan, Liang
Zhuang, Zhiqiang
Abstract
The textbook question answering (TQA) task aims to infer answers to given questions from a multimodal context that includes text and diagrams. Existing studies have aggregated intramodal semantics extracted from a single modality but have not yet captured the intermodal semantics between different modalities. A major challenge in learning intermodal semantics is preserving intramodal semantics without loss while bridging the semantic gap caused by heterogeneity. In this article, we propose an intermodal relation-aware heterogeneous graph network (IMR-HGN) to extract intermodal semantics for TQA; it aggregates different modalities while learning features rather than representing each modality independently. First, we design a multidomain consistent representation (MDCR) that eliminates semantic gaps by capturing intermodal features while preserving lossless intramodal semantics across multiple domains. Furthermore, we present neighbor-based relation inpainting (NRI), which reduces semantic ambiguity by repairing fuzzy relations using the correlations among relations. Finally, we propose hierarchical multisemantics aggregation (HMSA), which ensures semantic completeness by aggregating node and relation features with a reconstruction network (RN). Experimental results show that IMR-HGN can extract the intermodal semantics of answers, achieving a 2.16% improvement on the validation set of the TQA dataset and a 3.04% improvement on the test set of the AI2D dataset.
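The abstract describes IMR-HGN only at a high level, so its exact update rules are not reproduced here. As a generic illustration of what relation-aware aggregation over a heterogeneous graph of text and diagram nodes can look like, the minimal sketch below implements one layer of the standard R-GCN update of Schlichtkrull et al. (2018), h_i' = ReLU(W_0 h_i + sum over r of sum over j in N_r(i) of (1/|N_r(i)|) W_r h_j), where each relation type r has its own weight matrix W_r. This is an assumption-laden stand-in, not the authors' MDCR, NRI, or HMSA components; the function names, relation labels ("text-text", "diagram-text", "text-diagram"), and weights are hypothetical.

```python
from collections import defaultdict


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(features, edges, W_self, W_rel):
    """One generic R-GCN-style layer (not IMR-HGN itself).

    features: dict mapping node id -> feature vector
    edges:    list of (src, relation, dst) triples; messages flow src -> dst
    W_self:   self-connection weight matrix W_0
    W_rel:    dict mapping relation name -> weight matrix W_r
              (relation names are hypothetical labels)
    """
    # Group each node's incoming neighbours by relation type.
    incoming = defaultdict(lambda: defaultdict(list))
    for src, rel, dst in edges:
        incoming[dst][rel].append(src)

    out = {}
    for node, h in features.items():
        acc = matvec(W_self, h)  # self-connection term W_0 h_i
        for rel, nbrs in incoming[node].items():
            norm = 1.0 / len(nbrs)  # mean over neighbours of this relation
            for j in nbrs:
                msg = matvec(W_rel[rel], features[j])
                acc = [a + norm * m for a, m in zip(acc, msg)]
        out[node] = [max(0.0, a) for a in acc]  # ReLU activation
    return out


# Toy graph: two text nodes and one diagram node (hypothetical data).
features = {"t1": [1.0, 0.0], "t2": [0.0, 1.0], "d1": [1.0, 1.0]}
edges = [
    ("t1", "text-text", "t2"),
    ("d1", "diagram-text", "t2"),
    ("t1", "text-diagram", "d1"),
]
identity = [[1.0, 0.0], [0.0, 1.0]]
W_rel = {
    "text-text": [[0.5, 0.0], [0.0, 0.5]],
    "diagram-text": identity,
    "text-diagram": [[2.0, 0.0], [0.0, 2.0]],
}
out = rgcn_layer(features, edges, identity, W_rel)
# out == {"t1": [1.0, 0.0], "t2": [1.5, 2.0], "d1": [3.0, 1.0]}
```

Keeping a separate weight matrix per relation is what makes the aggregation relation-aware: a text node receives differently transformed messages from another text node than from a diagram node, whereas a homogeneous GCN would treat both neighbours identically.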
Journal Title
IEEE Transactions on Neural Networks and Learning Systems
Volume
35
Issue
9
Citation
Zhang, S; Wu, Y; Zhang, X; Feng, Z; Wan, L; Zhuang, Z, Relation-Aware Heterogeneous Graph Network for Learning Intermodal Semantics in Textbook Question Answering, IEEE Transactions on Neural Networks and Learning Systems, 2024, 35 (9), pp. 11872-11883