Cluster-oriented instance selection for classification problems
File version
Accepted Manuscript (AM)
Author(s)
Sarker, Partho Sarathi
Al Saud, Alam
Shatabda, Swakkhar
Newton, MA Hakim
Griffith University Author(s)
Primary Supervisor
Other Supervisors
Editor(s)
Date
Size
File type(s)
Location
Abstract
More training instances could lead to better classification accuracy. However, accuracy could also degrade if more training instances mean further noises and outliers. Additional training instances arguably need additional computational resources in future data mining operations. Instance selection algorithms identify subsets of training instances that could desirably increase accuracy or at least do not decrease accuracy significantly. There exist many instance selection algorithms, but no single algorithm, in general, dominates the others. Moreover, existing instance selection algorithms do not allow direct controlling of the instance selection rate. In this paper, we present a simple and generic cluster-oriented instance selection algorithm for classification problems. Our proposed algorithm runs an unsupervised K Means Clustering algorithm on the training instances and with a given selection rate, selects instances from the centers and the borders of the clusters. On 24 benchmark classification problems, when very similar percentages of instances are selected by various instance selection algorithms, K Nearest Neighbours classifiers achieve more than 2%–3% better accuracy when using instances selected by our proposed method than when using those selected by other state-of-the-art generic instance selection algorithms.
Journal Title
Information Sciences
Conference Title
Book Title
Edition
Volume
602
Issue
Thesis Type
Degree Program
School
Publisher link
Patent number
Funder(s)
Grant identifier(s)
Rights Statement
Rights Statement
© 2022, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Licence (http://creativecommons.org/licenses/by-nc-nd/4.0/) which permits unrestricted, non-commercial use, distribution and reproduction in any medium, providing that the work is properly cited.
Item Access Status
Note
Access the data
Related item(s)
Subject
Information and computing sciences
Science & Technology
Technology
Computer Science, Information Systems
Computer Science
Instance selection
Persistent link to this record
Citation
Saha, S; Sarker, PS; Al Saud, A; Shatabda, S; Newton, MAH, Cluster-oriented instance selection for classification problems, Information Sciences, 2022, 602, pp. 143-158