Retaining Data from Streams of Social Platforms with Minimal Regret
File version
Version of Record (VoR)
Author(s)
Weidlich, Matthias
Duong, Chi Thang
Yin, Hongzhi
Nguyen, Henry
Editor(s)
Carles Sierra
Location
Melbourne, Vic, Australia
Abstract
Today's social platforms, such as Twitter and Facebook, continuously generate massive volumes of data. The resulting data streams exceed any reasonable limit for permanent storage, especially since the data is often redundant, overlapping, sparse, and generally of low value. This calls for means to retain only a small fraction of the data in an online manner. In this paper, we propose techniques to effectively decide which data to retain, such that the induced loss of information, that is, the regret of neglecting certain data, is minimized. These techniques not only enable efficient processing of massive streaming data, but are also adaptive and address the dynamic nature of social media. Experiments on large-scale real-world datasets illustrate the feasibility of our approach in terms of both runtime and information quality.
Conference Title
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)
Rights Statement
© 2017 International Joint Conference on Artificial Intelligence. The attached file is reproduced here in accordance with the copyright policy of the publisher. Please refer to the journal's website for access to the definitive, published version.
Subject
Database systems