Fast and Robust General Purpose Clustering Algorithms
Author(s)
Estivill-Castro, V
Yang, J
Griffith University
Year published
2000
Abstract
General purpose and highly applicable clustering methods are required for knowledge discovery. k-Means has been adopted as the prototype of iterative model-based clustering because of its speed, simplicity and capability to work within the format of very large databases. However, k-Means has several disadvantages derived from its statistical simplicity. We propose algorithms that remain very efficient, generally applicable and multidimensional, but are more robust to noise and outliers. We achieve this by using medians rather than means as estimators of the centers of clusters. Comparison with k-Means, EM and Gibbs sampling demonstrates the advantages of our algorithms.
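The key idea in the abstract, replacing the mean with the median as the cluster-centre estimator, can be sketched as a k-medians variant of the standard Lloyd-style iteration. The sketch below is illustrative only, not the authors' exact algorithm: it assumes L1 (Manhattan) assignment distances and a component-wise median update, and the function name `k_medians` and its parameters are hypothetical.

```python
# Illustrative k-medians sketch (assumed variant, not the paper's algorithm):
# Lloyd-style iteration with the component-wise median as centre estimator.
from statistics import median

def k_medians(points, centers, iters=20):
    """points: list of coordinate tuples; centers: initial centre tuples."""
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre (L1 distance).
        clusters = [[] for _ in centers]
        for p in points:
            dists = [sum(abs(a - b) for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: component-wise median; a median is far less sensitive
        # to outliers than a mean, which is the robustness the abstract cites.
        centers = [
            tuple(median(coord) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
    return centers

pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),   # tight cluster near the origin
       (5.0, 5.0), (5.1, 4.9),               # cluster near (5, 5)
       (100.0, 100.0)]                       # gross outlier
print(k_medians(pts, [(0.0, 0.0), (5.0, 5.0)]))
# → [(0.1, 0.1), (5.1, 5.0)]
```

Note how the outlier at (100, 100) is absorbed by the second cluster yet barely moves its centre (5.1, 5.0); a mean-based update would drag that centre to roughly (36.7, 36.6), which illustrates why median estimators are more robust to noise.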
Conference Title
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume
1886