CARBO: Clustering and rotation based oversampling for class imbalance learning

Author(s)
Paul, MK
Pal, B
Sattar, AHMS
Siddique, ASMMR
Hasan, MAM
Date
2024
Abstract

Class imbalance, where one class significantly outnumbers the others in a data set, is a crucial problem in machine learning. Standard classification algorithms struggle on such data sets and become biased towards the majority class. Several methods have been developed to deal with class imbalance, such as oversampling, undersampling, and cost-sensitive learning. Among these techniques, oversampling does not suffer from information loss or from the challenge of selecting critical costs. However, generating appropriate synthetic samples can be challenging and is vulnerable to privacy leakage. This research proposes an oversampling technique, called CARBO, that uses threshold-based geometric rotation and majority-class-influenced clustering. Unlike existing resampling approaches to the class imbalance problem, our contribution is to consider data privacy and optimal sample generation together for effective oversampling. The performance of CARBO is evaluated on 44 benchmark imbalanced data sets. The empirical analysis shows that CARBO enables boosting-based C4.5 ensemble classifiers to outperform six state-of-the-art approaches on 73% of the data sets. In addition, the theoretical compatibility analysis of CARBO demonstrates its robustness.

Journal Title

Knowledge-Based Systems

Volume

300

Citation

Paul, MK; Pal, B; Sattar, AHMS; Siddique, ASMMR; Hasan, MAM, CARBO: Clustering and rotation based oversampling for class imbalance learning, Knowledge-Based Systems, 2024, 300, pp. 112196
