Word Mover’s Distance for Agglomerative Short Text Clustering

Author(s)
Franciscus, N
Ren, X
Wang, J
Stantic, B
Editor(s)

Nguyen, NT

Gaol, FL

Hong, TP

Trawinski, B

Date
2019
Location

Yogyakarta, Indonesia

Abstract

In the era of information overload, text clustering plays an important part in the text analysis pipeline. Partitioning high-quality texts into previously unseen categories greatly benefits applications in information retrieval, databases, and business intelligence. Short texts from social media, such as tweets, remain difficult to interpret, however, because of the broad range of contexts they draw on. Traditional text similarity approaches rely only on lexical matching and ignore the semantic meaning of words. Recent advances in distributional semantics have opened an alternative approach: using high-quality word embeddings to aid the interpretation of text semantics. In this paper, we investigate the word mover’s distance metric for automatically clustering short texts using word semantic information. We use an agglomerative strategy as the clustering method to efficiently group texts based on their similarity. The experiment indicates that the word mover’s distance outperforms other standard metrics in the short text clustering task.
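The abstract describes the pipeline only at a high level, so the following is a minimal, self-contained sketch under stated assumptions, not the authors' implementation. It computes the word mover's distance in its standard form (the minimum cost of moving one text's normalized bag-of-words weights onto the other's, with Euclidean distance between word vectors as the travel cost), solved exactly here as a small integer min-cost flow. It then groups texts bottom-up on the pairwise distance matrix. The 2-D vectors in `EMB`, the function names `wmd` and `agglomerative`, the example token lists, and the choice of average linkage are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy 2-D word vectors; a real system would load pretrained
# embeddings. These values are invented for illustration only.
EMB = {
    "obama": (1.0, 1.0), "president": (1.1, 0.9),
    "speaks": (2.0, 0.0), "greets": (2.1, 0.2),
    "media": (0.0, 2.0), "press": (0.1, 2.1),
    "band": (-3.0, -3.0), "concert": (-3.2, -2.8),
    "guitar": (-2.8, -3.1), "music": (-3.1, -3.2),
}


def wmd(doc_a, doc_b):
    """Exact WMD between two token lists (every token must appear in EMB).

    Normalized bag-of-words weights (count / length) are scaled by n * m to
    integers, and the transport problem is solved as a min-cost flow."""
    ca, cb = Counter(doc_a), Counter(doc_b)
    n, m = len(doc_a), len(doc_b)
    wa, wb = list(ca), list(cb)
    # Nodes: 0 = source, then words of A, then words of B, last = sink.
    src, sink = 0, len(wa) + len(wb) + 1
    graph = [[] for _ in range(sink + 1)]  # edge = [to, capacity, cost, reverse index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i, w in enumerate(wa, start=1):
        add_edge(src, i, ca[w] * m, 0.0)
        for j, x in enumerate(wb, start=len(wa) + 1):
            # Travel cost = Euclidean distance between the two word vectors.
            add_edge(i, j, n * m, math.dist(EMB[w], EMB[x]))
    for j, x in enumerate(wb, start=len(wa) + 1):
        add_edge(j, sink, cb[x] * n, 0.0)

    total_cost, remaining = 0.0, n * m
    while remaining > 0:
        # Bellman-Ford on the residual graph (reverse edges carry negative cost).
        dist = [math.inf] * (sink + 1)
        prev = [None] * (sink + 1)  # (node, edge index) used to reach each node
        dist[src] = 0.0
        for _ in range(sink):
            updated = False
            for u in range(sink + 1):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev[v] = dist[u] + cost, (u, k)
                        updated = True
            if not updated:
                break
        # Push the bottleneck amount of flow along the cheapest path.
        push, v = remaining, sink
        while v != src:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = sink
        while v != src:
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        total_cost += push * dist[sink]
        remaining -= push
    return total_cost / (n * m)


def agglomerative(docs, k):
    """Bottom-up clustering on pairwise WMD; average linkage is an assumption."""
    d = [[wmd(a, b) for b in docs] for a in docs]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [d[i][j] for i in clusters[x] for j in clusters[y]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, x, y)
        _, x, y = best
        clusters[x] += clusters.pop(y)  # y > x, so index x is unaffected
    return clusters


tweets = [
    ["obama", "speaks", "media"],
    ["president", "greets", "press"],
    ["band", "concert", "guitar"],
    ["music", "concert"],
]
print(agglomerative(tweets, 2))  # → [[0, 1], [2, 3]]
```

The first two texts share no words, yet their WMD is small (about 0.17) because each word has a close neighbor in embedding space. A purely lexical measure would treat them as unrelated. The clustering step only reads the precomputed matrix `d`, so another metric could be substituted there for comparison.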

Conference Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume

11431

Subject

Artificial intelligence

Information and computing sciences

Science & Technology

Computer Science, Artificial Intelligence

Computer Science, Information Systems

Computer Science, Theory & Methods

Citation

Franciscus, N; Ren, X; Wang, J; Stantic, B, Word Mover’s Distance for Agglomerative Short Text Clustering, Intelligent Information and Database Systems, 2019, 11431, pp. 128-139