Weighted Gibbs sampling for mixture modelling of massive datasets via coresets

Author(s)
McGrory, CA
Ahfock, DC
Horsley, JA
Alston, CL
Date
2014
Abstract

Massive datasets are increasingly encountered in modern research applications, and this presents tremendous new challenges for statisticians. In settings where the aim is to classify or cluster data via finite mixture modelling, such as in satellite image analysis, the large number of data points to be analysed can make fitting such models either infeasible or simply too time‐consuming to be of practical use. It has been shown that using a representative weighted subsample of the complete dataset to estimate mixture model parameters can lead to much more time‐efficient and yet still reasonable inference. These representative subsamples are called coresets. Naturally, these coresets have to be constructed carefully, as the naive approach of performing simple uniform sampling from the dataset could lead to smaller clusters of points within the dataset being severely undersampled, and this would in turn result in very unreliable inference. It has previously been shown that an adaptive sampling approach can be used to obtain a representative coreset of data points together with a corresponding set of coreset weights. In this article, we explore how this idea can be incorporated into a Gibbs sampling algorithm for mixture modelling of image data via coresets within a Bayesian framework. We call the resulting algorithm a Weighted Gibbs Sampler. We illustrate the proposed approach through an application to remote sensing of land use from satellite imagery.
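
To make the idea of a weighted Gibbs update concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a one-dimensional Gaussian mixture with known common variance, a N(0, prior_var) prior on each component mean, and a symmetric Dirichlet prior on the mixing proportions. It also assumes that every coreset point carries a single allocation and that its weight simply scales its contribution to the sufficient statistics. The function name `weighted_gibbs` and all of its parameters are hypothetical, and the coreset points and weights are taken as given rather than constructed by adaptive sampling.

```python
import numpy as np


def weighted_gibbs(x, w, k, n_iter=200, sigma2=1.0, prior_var=100.0,
                   alpha=1.0, seed=0):
    """Sketch of a weighted Gibbs sampler for a 1D Gaussian mixture.

    x : coreset points, w : coreset weights (assumed given), k : components.
    Returns the final draw of the component means and mixing proportions.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)

    # Initialise the means spread evenly across the range of the data.
    mu = np.quantile(x, np.linspace(0.0, 1.0, k))
    pi = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # 1. Sample an allocation for each coreset point given (mu, pi).
        logp = np.log(pi) - 0.5 * (x[:, None] - mu[None, :]) ** 2 / sigma2
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        u = rng.random(len(x))[:, None]
        z = np.minimum((p.cumsum(axis=1) < u).sum(axis=1), k - 1)

        # 2. Weighted sufficient statistics: weights replace unit counts.
        n_w = np.bincount(z, weights=w, minlength=k)      # weighted counts
        s_w = np.bincount(z, weights=w * x, minlength=k)  # weighted sums

        # 3. Conjugate normal update for each mean (prior mean is 0).
        post_var = 1.0 / (1.0 / prior_var + n_w / sigma2)
        post_mean = post_var * s_w / sigma2
        mu = rng.normal(post_mean, np.sqrt(post_var))

        # 4. Dirichlet update for the mixing proportions.
        pi = rng.dirichlet(alpha + n_w)

    return mu, pi
```

Calling `weighted_gibbs(x, w, k=2)` on a coreset in which points from a small cluster carry smaller weights than points from a large cluster returns mixing proportions that reflect the weighted, rather than the raw, cluster sizes. A practical sampler would also store every draw, discard a burn-in period, and handle label switching, all of which this sketch omits.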

Journal Title

Stat

Volume

3

Issue

1

Subject

Statistics not elsewhere classified
