Multiscale Representations Learning Transformer Framework for Point Cloud Classification
Author(s)
Sun, Y; Zia, A; Zhou, J
Griffith University Author(s)
Zia, A
Zhou, J
Location
Kuala Lumpur, Malaysia
Abstract
Extracting and aggregating feature representations at multiple scales has become key to point cloud classification. The Vision Transformer (ViT) is a representative solution along this line, but it lacks the capability to model detailed multi-scale features and their interactions. In addition, learning efficient and effective representations from point clouds is challenging due to their irregular, unordered, and sparse nature. To tackle these problems, we propose a novel multi-scale representation learning transformer framework that employs various geometric features beyond common Cartesian coordinates. Our approach enriches the description of point clouds with local geometric relationships and groups them at multiple scales. This multi-scale information is aggregated, and new patches are then extracted to minimize feature overlap. A bottleneck projection head is then adopted to enhance the information, and all patches are fed to multi-head attention to capture deep dependencies among representations across patches. Evaluation on public benchmark datasets shows the competitive performance of our framework on point cloud classification.
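To make the pipeline outlined in the abstract concrete, the following is a minimal PyTorch sketch of its main stages: multi-scale neighbourhood grouping with simple geometric features, a bottleneck projection head, and multi-head attention over patch tokens. The module names, layer sizes, choice of geometric descriptors, and patch split are illustrative assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn


def group_multiscale(points: torch.Tensor, ks=(8, 16, 32)) -> torch.Tensor:
    # points: (B, N, 3) Cartesian coordinates.
    # Returns (B, N, 3 + 2 * len(ks)) features: the coordinates plus, per
    # scale k, the mean and std of distances to the k nearest neighbours --
    # a simple stand-in for richer geometric descriptors (assumption).
    dists = torch.cdist(points, points)            # (B, N, N) pairwise distances
    feats = [points]
    for k in ks:
        knn, _ = dists.topk(k + 1, largest=False)  # k+1 smallest, self included
        knn = knn[..., 1:]                         # drop self (distance 0)
        feats.append(knn.mean(-1, keepdim=True))   # local density proxy
        feats.append(knn.std(-1, keepdim=True))    # local spread proxy
    return torch.cat(feats, dim=-1)


class BottleneckHead(nn.Module):
    # Narrow-then-wide projection of per-patch features.
    def __init__(self, dim_in, dim_out, bottleneck=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, bottleneck), nn.GELU(),
            nn.Linear(bottleneck, dim_out),
        )

    def forward(self, x):
        return self.net(x)


class MultiScalePointTransformer(nn.Module):
    def __init__(self, num_classes=40, num_patches=64, dim=128,
                 heads=4, ks=(8, 16, 32)):
        super().__init__()
        self.num_patches = num_patches
        feat_dim = 3 + 2 * len(ks)
        self.proj = BottleneckHead(feat_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, points):
        x = group_multiscale(points)               # (B, N, F) multi-scale features
        B, N, F = x.shape
        # Fold points into patch tokens; a contiguous split stands in for
        # the paper's overlap-minimising patch extraction (assumption).
        n = (N // self.num_patches) * self.num_patches
        x = x[:, :n].reshape(B, self.num_patches, -1, F).mean(2)  # (B, P, F)
        x = self.proj(x)                           # bottleneck projection head
        x = self.encoder(x)                        # multi-head attention over patches
        return self.cls(x.mean(1))                 # global pooling + classifier


if __name__ == "__main__":
    model = MultiScalePointTransformer()
    logits = model(torch.randn(2, 1024, 3))        # two clouds of 1,024 points
    print(logits.shape)                            # torch.Size([2, 40])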
Conference Title
2023 IEEE International Conference on Image Processing (ICIP)
Subject
Data structures and algorithms
Image processing
Citation
Sun, Y; Zia, A; Zhou, J, Multiscale Representations Learning Transformer Framework for Point Cloud Classification, 2023 IEEE International Conference on Image Processing (ICIP), 2023, pp. 3354-3358