Learning Hierarchical Semantic Correspondences for Cross-Modal Image-Text Retrieval

Author(s)
Zeng, S
Liu, C
Zhou, J
Chen, Y
Jiang, A
Li, H
Date
2022
Location

Newark, New Jersey, United States

Abstract

Cross-modal image-text retrieval is a fundamental task in information retrieval. The key to this task is addressing both the heterogeneity of, and the cross-modal semantic correlation between, data of different modalities. Fine-grained matching methods model local semantic correlations between image and text well but face two challenges. First, images may contain redundant information, while sentences often contain words that carry little semantic meaning. Such redundancy interferes with local matching between textual words and image regions. Second, retrieval should consider not only low-level semantic correspondence between image regions and textual words but also higher-level semantic correlation between different intra-modal relationships. We propose a multi-layer graph convolutional network with object-level, object-relational-level, and higher-level learning sub-networks. Our method learns hierarchical semantic correspondences through both local and global alignment. We further introduce a self-attention mechanism after the word embedding to down-weight insignificant words in the sentence, and a cross-attention mechanism to guide the learning of image features. Extensive experiments on the Flickr30K and MS-COCO datasets demonstrate the effectiveness and superiority of the proposed method.
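The two attention mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, toy 2-dimensional embeddings, and a cross-attention direction in which image regions query the words. The names `attention_weights`, `attend`, `words`, and `regions` are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_weights(queries, keys):
    # One row of weights per query; each row sums to 1 over the keys.
    scale = math.sqrt(len(keys[0]))
    return [softmax([dot(q, k) / scale for k in keys]) for q in queries]

def attend(queries, keys, values):
    # Each output is the weighted average of the values for one query.
    weights = attention_weights(queries, keys)
    dim = len(values[0])
    return [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in weights
    ]

# Toy inputs (assumed): three word embeddings and two image-region features.
words = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
regions = [[0.9, 0.1], [0.2, 0.8]]

# Self-attention: every word is re-expressed as a mixture of all words.
contextual_words = attend(words, words, words)

# Cross-attention (assumed direction): regions query the contextual words,
# so each region feature is guided by the text.
guided_regions = attend(regions, contextual_words, contextual_words)
```

In a trained model the queries, keys, and values would pass through learned projections, and the resulting features would feed the graph convolutional sub-networks described in the abstract; the sketch only shows the attention arithmetic itself.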

Conference Title

ICMR '22: Proceedings of the 2022 International Conference on Multimedia Retrieval

Subject

Information systems

Citation

Zeng, S; Liu, C; Zhou, J; Chen, Y; Jiang, A; Li, H, Learning Hierarchical Semantic Correspondences for Cross-Modal Image-Text Retrieval, ICMR '22: Proceedings of the 2022 International Conference on Multimedia Retrieval, 2022, pp. 239-248