Show simple item record

dc.contributor.author  Wang, W
dc.contributor.author  Wang, M
dc.contributor.author  Wang, S
dc.contributor.author  Long, G
dc.contributor.author  Yao, L
dc.contributor.author  Qi, G
dc.contributor.author  Chen, Y
dc.date.accessioned  2021-06-10T05:05:25Z
dc.date.available  2021-06-10T05:05:25Z
dc.date.issued  2020
dc.identifier.isbn  9781577358350
dc.identifier.issn  2159-5399
dc.identifier.doi  10.1609/aaai.v34i07.6904
dc.identifier.uri  http://hdl.handle.net/10072/405069
dc.description.abstract  The aim of visual relation detection is to provide a comprehensive understanding of an image by describing all the objects within the scene and how they relate to each other, in <object-predicate-object> form; for example, <person-lean on-wall>. This ability is vital for image captioning, visual question answering, and many other applications. However, visual relationships have long-tailed distributions, and the limited availability of training samples therefore hampers the practicability of conventional detection approaches. With this in mind, we designed a novel model for visual relation detection that works in one-shot settings. The embeddings of objects and predicates are extracted through a network that includes a feature-level attention mechanism. Attention alleviates some of the problems with feature sparsity, and the resulting representations capture more discriminative latent features. The core of our model is a dual graph neural network that passes and aggregates the context information of predicates and objects in an episodic training scheme to improve recognition of the one-shot predicates and then generate the triplets. To the best of our knowledge, we are the first to focus on the viability of one-shot learning for visual relation detection. Extensive experiments on two newly constructed datasets show that our model significantly improves performance on two tasks, PredCls and SGCls, by margins ranging from 2.8% to 12.2% over state-of-the-art baselines.
dc.description.peerreviewed  Yes
dc.language  English
dc.publisher  AAAI Press
dc.publisher.place  Palo Alto, Calif, United States
dc.relation.ispartofconferencename  AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
dc.relation.ispartofconferencetitle  AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
dc.relation.ispartofdatefrom  2020-02-07
dc.relation.ispartofdateto  2020-02-12
dc.relation.ispartoflocation  New York, United States
dc.relation.ispartofpagefrom  12225
dc.relation.ispartofpageto  12232
dc.relation.ispartofissue  7
dc.relation.ispartofvolume  34
dc.subject.fieldofresearch  Artificial intelligence
dc.subject.fieldofresearch  Nanotechnology
dc.subject.fieldofresearchcode  4602
dc.subject.fieldofresearchcode  4018
dc.title  One-shot learning for long-tail visual relation detection
dc.type  Conference output
dc.type.description  E1 - Conferences
dcterms.bibliographicCitation  Wang, W; Wang, M; Wang, S; Long, G; Yao, L; Qi, G; Chen, Y, One-shot learning for long-tail visual relation detection, AAAI 2020 - 34th AAAI Conference on Artificial Intelligence, 2020, 34 (07), pp. 12225-12232
dc.date.updated  2021-06-10T00:13:57Z
gro.hasfulltext  No Full Text
gro.griffith.author  Wang, Sen
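
The abstract above names two concrete components: a feature-level attention mechanism over object/predicate embeddings and a dual graph neural network that exchanges context between predicate and object nodes. The following is a minimal, hypothetical PyTorch sketch of those two ideas only; the class names, tensor shapes, and aggregation details are illustrative assumptions and do not reproduce the authors' implementation or their episodic training scheme.

import torch
import torch.nn as nn


class FeatureAttention(nn.Module):
    # Hypothetical feature-level attention: re-weights feature dimensions so
    # sparse but discriminative features are emphasised in the embedding.
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.gate(x)  # element-wise re-weighting of features


class DualGraphLayer(nn.Module):
    # Hypothetical dual message-passing step: predicate nodes aggregate context
    # from their subject/object endpoints, then object nodes aggregate context
    # back from the predicates incident to them.
    def __init__(self, dim: int):
        super().__init__()
        self.obj_to_pred = nn.Linear(2 * dim, dim)
        self.pred_to_obj = nn.Linear(dim, dim)

    def forward(self, obj_feats, pred_feats, pairs):
        # pairs[i] = (subject_index, object_index) for predicate i
        subj = obj_feats[pairs[:, 0]]
        obj = obj_feats[pairs[:, 1]]
        pred_ctx = torch.relu(self.obj_to_pred(torch.cat([subj, obj], dim=-1)))
        new_pred = pred_feats + pred_ctx
        msg = torch.relu(self.pred_to_obj(new_pred))
        new_obj = obj_feats.clone()
        new_obj.index_add_(0, pairs[:, 0], msg)  # scatter messages to subjects
        new_obj.index_add_(0, pairs[:, 1], msg)  # scatter messages to objects
        return new_obj, new_pred


if __name__ == "__main__":
    dim, n_obj, n_pred = 64, 5, 3
    attn, layer = FeatureAttention(dim), DualGraphLayer(dim)
    objs = attn(torch.randn(n_obj, dim))
    preds = attn(torch.randn(n_pred, dim))
    pairs = torch.tensor([[0, 1], [1, 2], [3, 4]])
    objs, preds = layer(objs, preds, pairs)
    print(objs.shape, preds.shape)  # torch.Size([5, 64]) torch.Size([3, 64])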


Files in this item


There are no files associated with this item.

This item appears in the following Collection(s)

  • Conference outputs
    Contains papers delivered by Griffith authors at national and international conferences.
