Why Are There So Many Clustering Algorithms, and How Valid Are Their Results?
Validity is a fundamental aspect of any machine learning approach. All three families of current clustering validity approaches (external, internal, and relative) have serious drawbacks and are computationally expensive. This chapter discusses why there are so many proposals for clustering algorithms and why those proposals have become detached from approaches to validity. It presents a new approach that differs radically from the three existing families. The approach consists of translating the clustering validity problem into an assessment of the ease of learning in the resulting supervised learning instances. The chapter shows that this idea satisfies formal principles for cluster quality measures, so the intuition inspiring the approach has a solid theoretical foundation. In fact, it relates to the notion of reproducibility. Finally, the chapter demonstrates that the principle applies to both crisp and fuzzy clustering methods.
Reproducibility: Principles, Problems, Practices, and Prospects
Pattern Recognition and Data Mining
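The central idea of the abstract, judging a clustering by how easy it is to learn the resulting labels as a supervised problem, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the toy grid data, the small k-means routine with farthest-point initialization, the 1-nearest-neighbour learner, and leave-one-out accuracy as the stand-in for "ease of learning".

```python
import random
from math import dist


def kmeans(points, k, iters=10):
    """Lloyd's k-means with deterministic farthest-point initialization.

    Returns one cluster label per point. Illustrative only.
    """
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def loo_1nn_accuracy(points, labels):
    """Leave-one-out accuracy of a 1-NN classifier that treats the cluster
    labels as class labels. Higher means the partition is easier to learn."""
    hits = 0
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        nearest = min(others, key=lambda j: dist(p, points[j]))
        hits += labels[nearest] == labels[i]
    return hits / len(points)


# Toy data (hypothetical): two 5x5 grids of points, far apart from each other.
grid = [(0.5 * i, 0.5 * j) for i in range(5) for j in range(5)]
points = grid + [(x + 10.0, y + 10.0) for x, y in grid]

cluster_labels = kmeans(points, k=2)

# Baseline: same label counts, but assigned to points at random.
shuffled_labels = cluster_labels[:]
random.Random(0).shuffle(shuffled_labels)

good = loo_1nn_accuracy(points, cluster_labels)
bad = loo_1nn_accuracy(points, shuffled_labels)
print("k-means partition:", good)
print("shuffled labels:  ", bad)
```

On this toy data the partition that follows the geometry is reproduced by the learner far more reliably than the shuffled labelling, which is the kind of contrast the proposed validity assessment relies on. A different learner or accuracy estimate could be substituted without changing the overall scheme.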