One-shot learning for long-tail visual relation detection

Author(s)
Wang, W
Wang, M
Wang, S
Long, G
Yao, L
Qi, G
Chen, Y
Date
2020
Location

New York, United States

Abstract

The aim of visual relation detection is to provide a comprehensive understanding of an image by describing all the objects within the scene, and how they relate to each other, in <object-predicate-object> form; for example, <person-lean on-wall>. This ability is vital for image captioning, visual question answering, and many other applications. However, visual relationships have long-tailed distributions and, thus, the limited availability of training samples hampers the practicability of conventional detection approaches. With this in mind, we designed a novel model for visual relation detection that works in one-shot settings. The embeddings of objects and predicates are extracted through a network that includes a feature-level attention mechanism. Attention alleviates some of the problems with feature sparsity, and the resulting representations capture more discriminative latent features. The core of our model is a dual graph neural network that passes and aggregates the context information of predicates and objects in an episodic training scheme to improve recognition of one-shot predicates and then generate the triplets. To the best of our knowledge, we are the first to focus on the viability of one-shot learning for visual relation detection. Extensive experiments on two newly constructed datasets show that our model improves performance on the PredCls and SGCls tasks by margins ranging from 2.8% to 12.2% over state-of-the-art baselines.
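The abstract describes a feature-level attention mechanism over object and predicate embeddings that emphasises the more discriminative latent features. The record does not include implementation details, so the following is only a minimal PyTorch sketch of one plausible form of per-dimension attention; the module name, dimensions, and MLP scorer are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn

class FeatureLevelAttention(nn.Module):
    """Illustrative per-dimension (feature-level) attention over embeddings."""
    def __init__(self, feat_dim: int, hidden_dim: int = 128):
        super().__init__()
        # Small MLP that produces a score for every feature dimension.
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, feat_dim) object or predicate embeddings.
        # Softmax over the feature dimension yields attention weights that
        # re-weight the embedding, emphasising discriminative latent features.
        weights = torch.softmax(self.scorer(x), dim=-1)
        return x * weights

# Usage sketch: re-weight a batch of 8 hypothetical 512-d embeddings.
attn = FeatureLevelAttention(feat_dim=512)
embeddings = torch.randn(8, 512)
attended = attn(embeddings)  # shape: (8, 512)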

Conference Title

AAAI 2020 - 34th AAAI Conference on Artificial Intelligence

Volume

34

Issue

7

Subject

Artificial intelligence

Nanotechnology

Citation

Wang, W; Wang, M; Wang, S; Long, G; Yao, L; Qi, G; Chen, Y, One-shot learning for long-tail visual relation detection, AAAI 2020 - 34th AAAI Conference on Artificial Intelligence, 2020, 34 (07), pp. 12225-12232