dc.contributor.author | Wang, W | |
dc.contributor.author | Wang, M | |
dc.contributor.author | Wang, S | |
dc.contributor.author | Long, G | |
dc.contributor.author | Yao, L | |
dc.contributor.author | Qi, G | |
dc.contributor.author | Chen, Y | |
dc.date.accessioned | 2021-06-10T05:05:25Z | |
dc.date.available | 2021-06-10T05:05:25Z | |
dc.date.issued | 2020 | |
dc.identifier.isbn | 9781577358350 | |
dc.identifier.issn | 2159-5399 | |
dc.identifier.doi | 10.1609/aaai.v34i07.6904 | |
dc.identifier.uri | http://hdl.handle.net/10072/405069 | |
dc.description.abstract | The aim of visual relation detection is to provide a comprehensive understanding of an image by describing all the objects within the scene, and how they relate to each other, in <object-predicate-object> form; for example, <person-lean on-wall>. This ability is vital for image captioning, visual question answering, and many other applications. However, visual relationships have long-tailed distributions, so the limited availability of training samples hampers the practicability of conventional detection approaches. With this in mind, we designed a novel model for visual relation detection that works in one-shot settings. The embeddings of objects and predicates are extracted through a network that includes a feature-level attention mechanism. Attention alleviates some of the problems with feature sparsity, and the resulting representations capture more discriminative latent features. The core of our model is a dual graph neural network that passes and aggregates the context information of predicates and objects in an episodic training scheme to improve recognition of one-shot predicates and then generate the triplets. To the best of our knowledge, we are the first to investigate the viability of one-shot learning for visual relation detection. Extensive experiments on two newly constructed datasets show that our model significantly improves performance on the PredCls and SGCls tasks, by 2.8% to 12.2% over state-of-the-art baselines. | |
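As an illustration of the feature-level attention the abstract mentions, the following is a minimal PyTorch sketch that learns per-dimension weights to re-weight an object or predicate embedding. It is a sketch of the general technique only, not the authors' published implementation; the module name FeatureAttention, the dimensions, and the gating design are all assumptions made for illustration.

import torch
import torch.nn as nn

class FeatureAttention(nn.Module):
    """Hypothetical feature-level attention: learns a weight in (0, 1) for each
    dimension of an embedding, emphasizing the more discriminative features.
    Illustrative sketch only; not the paper's exact module."""
    def __init__(self, feat_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),
            nn.Sigmoid(),  # per-dimension attention weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, feat_dim) object/predicate embeddings;
        # output is the same embedding, re-weighted element-wise.
        return x * self.gate(x)

# Usage: attend over a batch of 5 predicate embeddings of dimension 128.
attn = FeatureAttention(feat_dim=128)
emb = torch.randn(5, 128)
out = attn(emb)  # shape (5, 128)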
dc.description.peerreviewed | Yes | |
dc.language | English | |
dc.publisher | AAAI Press | |
dc.publisher.place | Palo Alto, CA, United States | |
dc.relation.ispartofconferencename | AAAI 2020 - 34th AAAI Conference on Artificial Intelligence | |
dc.relation.ispartofconferencetitle | AAAI 2020 - 34th AAAI Conference on Artificial Intelligence | |
dc.relation.ispartofdatefrom | 2020-02-07 | |
dc.relation.ispartofdateto | 2020-02-12 | |
dc.relation.ispartoflocation | New York, United States | |
dc.relation.ispartofpagefrom | 12225 | |
dc.relation.ispartofpageto | 12232 | |
dc.relation.ispartofissue | 7 | |
dc.relation.ispartofvolume | 34 | |
dc.subject.fieldofresearch | Artificial intelligence | |
dc.subject.fieldofresearch | Nanotechnology | |
dc.subject.fieldofresearchcode | 4602 | |
dc.subject.fieldofresearchcode | 4018 | |
dc.title | One-shot learning for long-tail visual relation detection | |
dc.type | Conference output | |
dc.type.description | E1 - Conferences | |
dcterms.bibliographicCitation | Wang, W; Wang, M; Wang, S; Long, G; Yao, L; Qi, G; Chen, Y, One-shot learning for long-tail visual relation detection, AAAI 2020 - 34th AAAI Conference on Artificial Intelligence, 2020, 34 (07), pp. 12225-12232 | |
dc.date.updated | 2021-06-10T00:13:57Z | |
gro.hasfulltext | No Full Text | |
gro.griffith.author | Wang, Sen | |