Clustering Web Visitors by Fast, Robust and Convergent Algorithms

Author(s)
Estivill-Castro, V
Yang, J
Editor(s)

O H Ibarra

Date
2002
Abstract

Identifying categories of visitors to a Web site is useful for improving Web design and Web applications. However, the large volume of data involved in mining access logs and visitation paths, together with the uncertainty in fully identifying a visitor, demands efficient clustering algorithms that are also resistant to noise and outliers. Moreover, visitation paths are discrete, and evaluating the dissimilarity between visitation paths is sophisticated and results in high-dimensional attribute vectors. We provide randomized, iterative clustering algorithms for generic dissimilarity between paths. Our algorithms are robust because they use medians rather than means as estimators of location, and the resulting representative of a cluster is an actual path in the data set. We demonstrate mathematically that our algorithms converge and have subquadratic complexity. We also show experimentally that they are resistant to noise by recovering clusters from synthetic data generated by a mixture of distributions of paths in a graph. Our proposed non-crisp method generalizes approaches that allow a data item to have a degree of membership in a cluster.
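
The idea of representing each cluster by an actual path from the data set, chosen as a median-like element under a generic dissimilarity, can be illustrated with a minimal sketch. This is not the paper's algorithm: it is a plain alternating medoid scheme, randomized only in its starting medoids, with a cost that is quadratic within each cluster rather than subquadratic. The names `jaccard_dissimilarity` and `cluster_by_medoids`, the page-set dissimilarity, and the sample paths are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Path = Tuple[str, ...]  # a visitation path as a sequence of page names


def jaccard_dissimilarity(a: Path, b: Path) -> float:
    """Hypothetical dissimilarity: 1 minus the Jaccard overlap of pages visited."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def cluster_by_medoids(
    paths: Sequence[Path],
    k: int,
    dissim: Callable[[Path, Path], float],
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Alternate between assigning paths to the nearest representative and
    re-choosing each representative as the member path with the smallest
    total dissimilarity to the rest of its cluster (a medoid)."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(paths)), k)  # random starting representatives

    def assign(current: List[int]) -> List[int]:
        # Each path goes to the cluster whose representative is least dissimilar.
        return [
            min(range(k), key=lambda c: dissim(p, paths[current[c]]))
            for p in paths
        ]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the old representative if the cluster emptied out.
                new_medoids.append(medoids[c])
                continue
            best = min(
                members,
                key=lambda i: sum(dissim(paths[i], paths[j]) for j in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # representatives are stable
        medoids = new_medoids

    return medoids, assign(medoids)


# Illustrative data only: two loose groups of visitation paths.
paths = [
    ("home", "products", "cart"),
    ("home", "products", "cart", "checkout"),
    ("home", "products"),
    ("blog", "post1", "post2"),
    ("blog", "post1"),
    ("blog", "post2", "about"),
]
medoids, labels = cluster_by_medoids(paths, k=2, dissim=jaccard_dissimilarity)
representatives = [paths[m] for m in medoids]  # each is a path from the data set
```

Because every representative is an index into `paths`, the output of the sketch is always an observed path rather than an averaged vector, which is the property the abstract attributes to median-based estimators.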

Journal Title

International Journal of Foundations of Computer Science

Volume

13

Issue

4

Subject

Information and computing sciences
