Clustering Web Visitors by Fast, Robust and Convergent Algorithms
Author(s)
Yang, J
Editor(s)
O H Ibarra
Abstract
Identifying categories of visitors to a Web site is very useful for improving Web design and Web applications. However, the large volume of data involved in mining access logs and visitation paths, together with the difficulty of fully identifying each visitor, demands efficient clustering algorithms that are also resistant to noise and outliers. In addition, visitation paths are discrete, and measuring the dissimilarity between visitation paths involves sophisticated evaluation and results in attribute vectors of large dimension. We provide randomized, iterative clustering algorithms for generic dissimilarity between paths. Our algorithms are robust because they use medians rather than means as estimators of location, and the resulting representative of a cluster is an actual path in the data set. We demonstrate mathematically that our algorithms converge and have subquadratic complexity. We also show experimentally that they are resistant to noise by recovering clusters from synthetic data generated by a mixture of distributions of paths in a graph. The non-crisp method we propose generalizes approaches that allow a data item to have a degree of membership in a cluster.
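As a rough illustration of the median-based idea in the abstract, the sketch below clusters paths so that every cluster representative is itself a path from the data set. It is a plain k-medoids alternation, not the randomized subquadratic algorithm of the paper, and its update step is quadratic in the cluster size. The names `path_dissimilarity`, `assign` and `k_medoids` are hypothetical, and the Jaccard distance over visited pages is only a stand-in for whatever dissimilarity between paths is actually used.

```python
import random


def path_dissimilarity(a, b):
    """Stand-in dissimilarity (an assumption): Jaccard distance between page sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def assign(paths, medoids, dissim):
    """Map each medoid index to the indices of the paths closest to it."""
    clusters = {m: [] for m in medoids}
    for i, p in enumerate(paths):
        if i in clusters:
            # A medoid always stays in its own cluster, so no cluster is empty.
            clusters[i].append(i)
        else:
            nearest = min(medoids, key=lambda m: dissim(p, paths[m]))
            clusters[nearest].append(i)
    return clusters


def k_medoids(paths, k, dissim=path_dissimilarity, max_iter=100, seed=0):
    """Alternate assignment and medoid update until the medoids stop changing."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(paths)), k)  # k distinct starting indices
    for _ in range(max_iter):
        clusters = assign(paths, medoids, dissim)
        # New representative: the member with the smallest total dissimilarity
        # to the other members, so it is always an actual path in the data.
        new_medoids = [
            min(members, key=lambda c: sum(dissim(paths[c], paths[j]) for j in members))
            for members in clusters.values()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(paths, medoids, dissim)
```

Calling `k_medoids(paths, k)` on a list of page sequences returns the indices of the representative paths and a mapping from each representative to its members. Because a representative is chosen from the data rather than averaged, a few outlying paths cannot drag it to a location that corresponds to no real visit.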
Journal Title
International Journal of Foundations of Computer Science
Volume
13
Issue
4
Subject
Information and computing sciences