Cluster-oriented instance selection for classification problems

File version

Accepted Manuscript (AM)

Author(s)
Saha, Soumitra
Sarker, Partho Sarathi
Al Saud, Alam
Shatabda, Swakkhar
Newton, MA Hakim
Date
2022
Abstract

More training instances can lead to better classification accuracy. However, accuracy can also degrade if the additional instances bring more noise and outliers. Additional training instances also arguably require additional computational resources in subsequent data mining operations. Instance selection algorithms identify subsets of training instances that ideally increase accuracy or at least do not decrease it significantly. Many instance selection algorithms exist, but no single algorithm, in general, dominates the others. Moreover, existing instance selection algorithms do not allow direct control of the instance selection rate. In this paper, we present a simple and generic cluster-oriented instance selection algorithm for classification problems. Our proposed algorithm runs an unsupervised K-Means clustering algorithm on the training instances and, for a given selection rate, selects instances from the centers and the borders of the clusters. On 24 benchmark classification problems, when very similar percentages of instances are selected by various instance selection algorithms, K-Nearest Neighbours classifiers achieve more than 2%–3% better accuracy when using instances selected by our proposed method than when using those selected by other state-of-the-art generic instance selection algorithms.
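The abstract describes the idea only at a high level, so the following is a minimal illustrative sketch and not the authors' implementation. It assumes a plain Lloyd-style K-Means, Euclidean distance, and a per-cluster budget split between the instances nearest to the centroid ("centers") and those farthest from it ("borders"). The function names `kmeans` and `select_instances` and the parameter `center_share` are hypothetical and do not come from the paper.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (illustrative); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    def assign():
        # Label every point with the index of its nearest centroid.
        return [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                for p in points]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, assign()


def select_instances(points, k, rate, center_share=0.5, seed=0):
    """Return sorted indices of roughly rate * len(points) instances.

    Assumption: each cluster contributes round(rate * cluster_size)
    instances, split between its centre and its border by center_share.
    """
    centroids, labels = kmeans(points, k, seed=seed)
    selected = []
    for c in range(k):
        # Cluster members ordered from nearest to farthest from the centroid.
        members = sorted(
            (i for i, lab in enumerate(labels) if lab == c),
            key=lambda i: math.dist(points[i], centroids[c]),
        )
        budget = round(rate * len(members))
        n_center = round(center_share * budget)
        n_border = budget - n_center
        selected += members[:n_center]       # instances near the centre
        if n_border:
            selected += members[-n_border:]  # instances on the border
    return sorted(selected)


# Toy data: three Gaussian blobs of 40 two-dimensional points each.
rng = random.Random(1)
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(40)]
kept = select_instances(data, k=3, rate=0.25)
print(len(kept), "of", len(data), "instances kept")
```

Because the budget is rounded per cluster, the number of selected instances matches the requested rate only approximately. The returned indices can be used to subset both the feature vectors and their class labels before training a classifier such as K-Nearest Neighbours.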

Journal Title

Information Sciences

Volume

602

Rights Statement

© 2022, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided that the work is properly cited.

Subject

Information and computing sciences

Science & Technology

Technology

Computer Science, Information Systems

Computer Science

Instance selection

Citation

Saha, S; Sarker, PS; Al Saud, A; Shatabda, S; Newton, MAH, Cluster-oriented instance selection for classification problems, Information Sciences, 2022, 602, pp. 143-158
