Why Are There So Many Clustering Algorithms, and How Valid Are Their Results?

Author(s)
Estivill-Castro, V
Editor(s)

H. Atmanspacher and S. Maasen

Date
2015
Abstract

Validity is a fundamental aspect of any machine learning approach. All three families of current validity approaches (external, internal, and relative) have serious drawbacks and are computationally expensive. This chapter discusses why there are so many proposals for clustering algorithms and why they are detached from approaches to validity. It presents a new approach that differs radically from the three families of validity approaches. The approach consists of translating the clustering validity problem into an assessment of the ease of learning in the resulting supervised learning instances. The chapter shows that this idea satisfies formal principles of cluster quality measures, and thus the intuition inspiring the approach has a solid theoretical foundation. In fact, it relates to the notion of reproducibility. Finally, the chapter demonstrates that the principle applies to both crisp clustering algorithms and fuzzy clustering methods.
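
The following sketch illustrates the general idea of scoring a clustering by how easily its labels can be learned. It is not the chapter's own procedure: the nearest-centroid classifier, the k-fold scheme, the synthetic data, and every name in the code (such as `learnability_score`) are assumptions made for illustration. The cluster labels are treated as class labels, and the score is the cross-validated accuracy of a classifier trained on them.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def learnability_score(points, labels, folds=5, seed=0):
    """Cross-validated accuracy of a nearest-centroid classifier that is
    trained on cluster labels treated as class labels (hypothetical helper)."""
    idx = list(range(len(points)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        # Per-class coordinate sums and counts from the training part only.
        sums, counts = {}, {}
        for i in idx:
            if i in held_out:
                continue
            acc = sums.setdefault(labels[i], [0.0] * len(points[i]))
            for d, x in enumerate(points[i]):
                acc[d] += x
            counts[labels[i]] = counts.get(labels[i], 0) + 1
        centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
        # Predict each held-out point as the class of its nearest centroid.
        for i in test:
            pred = min(centroids, key=lambda c: sq_dist(points[i], centroids[c]))
            correct += pred == labels[i]
    return correct / len(points)


# Synthetic data (an assumption): two well-separated 2-D blobs of 40 points.
rng = random.Random(42)
points = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(40)]
points += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(40)]

# A partition that follows the blobs, and an arbitrary random partition.
blob_labels = [0] * 40 + [1] * 40
arbitrary_labels = [rng.choice([0, 1]) for _ in points]

good_score = learnability_score(points, blob_labels)
arbitrary_score = learnability_score(points, arbitrary_labels)
print(good_score, arbitrary_score)
```

Under these assumptions, the partition that follows the structure of the data yields a supervised problem that is easy to learn, while the arbitrary partition yields one that a classifier can barely learn at all. A different classifier or resampling scheme could be substituted without changing the overall idea.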

Book Title

Reproducibility: Principles, Problems, Practices, and Prospects

Subject

Pattern recognition
