Learning multi-view visual correspondences with self-supervision
Author(s)
Zhou, Lei
Bai, Xiao
Wang, Chen
Zhou, Jun
Zhang, Liang
Zheng, Jin
Abstract
Stereo-based 3D reconstruction requires matching features across images captured from slightly different viewing angles to recover the 3D coordinates of image pixels. Beyond the workload of collecting data, annotating matched pixels is also labor-intensive. As recent research on self-supervised representation learning has made great progress, learning multi-view visual correspondences from large-scale raw videos serves as an alternative. However, existing methods that rely on contrastive learning tend to neglect false negative samples when matching between adjacent frames in a video, leading to sub-optimal learning of visual features. In this paper, we propose a contrastive learning framework that constructs self-supervision from semi-global visual correspondence to alleviate learning degradation when false negatives are involved in training. Our learning framework consists of pixel-level contrastive learning via patch reconstruction and patch-level contrastive learning across videos. We also introduce saliency guidance to extract salient regions from video frames to further reduce potential false negatives. By optimizing the model with the proposed semi-global contrastive learning method, the learned representations are forced to be discriminative and robust. Experiments demonstrate that our proposed method outperforms previous self-supervised methods on video object segmentation tasks. Moreover, when compared to fully-supervised algorithms designed for specific tasks, our proposed method also achieves competitive results.
Journal Title
Displays
Volume
72
Subject
Science & Technology
Physical Sciences
Computer Science, Hardware & Architecture
Engineering, Electrical & Electronic
Citation
Zhang, P; Zhou, L; Bai, X; Wang, C; Zhou, J; Zhang, L; Zheng, J, Learning multi-view visual correspondences with self-supervision, Displays, 2022, 72, pp. 102160