SupCon-ViT: Supervised contrastive learning for ultra-fine-grained visual categorization

File version
Author(s)
Lu, Xiaowei
Yu, Xiaohan
Wang, Kanqi
Wang, Ying
Wang, Peiyu
Liu, Gang
Zhao, Yang
Xiang, Yunhui
Gao, Yongsheng
Wu, Xiaoyu
Griffith University Author(s)
Primary Supervisor
Other Supervisors
Editor(s)
Date
2023
Size
File type(s)
Location

Port Macquarie, Australia

License
Abstract

With the increasing availability of datasets exhibiting fine granularity and subtle differences between categories, fine-grained visual categorization tasks have gained significant attention across various domains. However, the focus often lies solely on overall dataset performance metrics such as top-1 accuracy, without a comprehensive understanding of the underlying factors. This paper addresses this gap by presenting a detailed analysis of the CUB-200-2011 dataset through extensive experiments. We identify and investigate specific ultra-fine-grained subsets that significantly impact the overall accuracy of the dataset. To enhance the performance of ultra-fine-grained visual classification, we propose SupCon-ViT, an ultra-fine-grained visual categorization network based on supervised contrastive learning. The key component of our approach is a supervised contrastive learning module, which effectively guides the network to learn discriminative local features within samples. This is accomplished by continuously pulling closer the normalized embeddings from the same class and pushing away embeddings from different classes. As a result, our approach learns discriminative local representations, leading to improved classification performance. Experimental results demonstrate the effectiveness of our proposed method on four ultra-fine-grained subsets of the CUB dataset. Notably, our approach achieves significant performance improvements without requiring additional expert information during training. This work contributes to the broader understanding of fine-grained visual categorization and offers a practical solution to enhance the accuracy of ultra-fine-grained visual classification tasks. The code is available at https://github.com/nucinda01ove/SupCon-ViT-pytorch.
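The supervised contrastive objective described in the abstract — pulling normalized embeddings of the same class together while pushing other classes apart — can be sketched as follows. This is a minimal NumPy illustration of the standard supervised contrastive (SupCon) loss formulation, not the paper's actual implementation; the function name, temperature value, and batch layout are illustrative assumptions.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of embeddings.

    For each anchor, samples sharing its label act as positives;
    every other sample in the batch appears in the denominator,
    so same-class pairs are pulled together and different-class
    pairs are pushed apart. (Illustrative sketch, not the paper's code.)
    """
    # L2-normalize so similarities are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature  # pairwise scaled similarities

    n = len(labels)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchor with no same-class partner contributes nothing
        others = [a for a in range(n) if a != i]
        denom = np.sum(np.exp(sim[i, others]))
        # average negative log-likelihood over this anchor's positives
        total += -np.mean([np.log(np.exp(sim[i, p]) / denom) for p in positives])
        anchors += 1
    return total / anchors
```

As a sanity check, a labeling in which same-class embeddings are already close should yield a lower loss than one in which they are far apart, which is exactly the gradient signal that drives the "pull closer / push away" behavior.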

Journal Title
Conference Title

2023 International Conference on Digital Image Computing: Techniques and Applications (DICTA)

Book Title
Edition
Volume
Issue
Thesis Type
Degree Program
School
Publisher link
Patent number
Funder(s)
Grant identifier(s)
Rights Statement
Item Access Status
Note
Access the data
Related item(s)
Subject

Nanotechnology

Computational imaging

Computer vision and multimedia computation

Persistent link to this record
Citation

Lu, X; Yu, X; Wang, K; Wang, Y; Wang, P; Liu, G; Zhao, Y; Xiang, Y; Gao, Y; Wu, X, SupCon-ViT: Supervised contrastive learning for ultra-fine-grained visual categorization, 2023 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2023, pp. 281-288