Weighted Gibbs sampling for mixture modelling of massive datasets via coresets
Massive datasets are increasingly encountered in modern research applications, and they present tremendous new challenges for statisticians. In settings where the aim is to classify or cluster data via finite mixture modelling, such as in satellite image analysis, the large number of data points to be analysed can make fitting such models either infeasible or simply too time-consuming to be of practical use. It has been shown that using a representative weighted subsample of the complete dataset to estimate mixture model parameters can lead to much more time-efficient and yet still reasonable inference. These representative subsamples are called coresets. Coresets have to be constructed carefully: the naive approach of simple uniform sampling from the dataset could leave smaller clusters of points severely undersampled, which would in turn result in very unreliable inference. It has previously been shown that an adaptive sampling approach can be used to obtain a representative coreset of data points together with a corresponding set of coreset weights. In this article, we explore how this idea can be incorporated into a Gibbs sampling algorithm for mixture modelling of image data via coresets within a Bayesian framework. We call the resulting algorithm a Weighted Gibbs Sampler. We illustrate the proposed approach through an application to remote sensing of land use from satellite imagery.
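The idea of a weighted coreset can be illustrated with a minimal Python sketch. It is not the adaptive sampling scheme referred to above; it swaps in a simpler, single-pass importance sampling rule that mixes a uniform term with a term proportional to squared distance from the data mean. The function name `build_coreset`, the coreset size `m`, and the synthetic two-cluster data are assumptions made purely for illustration.

```python
import numpy as np

def build_coreset(X, m, rng):
    """Sample m weighted points from X (n x d) by importance sampling.

    Illustrative stand-in only, not the article's adaptive scheme.
    """
    n = X.shape[0]
    # Squared distance of each point to the overall data mean.
    sq_dist = np.sum((X - X.mean(axis=0)) ** 2, axis=1)
    # Mix a uniform term with a distance-proportional term so that
    # points far from the bulk of the data are sampled more often.
    q = 0.5 / n + 0.5 * sq_dist / sq_dist.sum()
    idx = rng.choice(n, size=m, replace=True, p=q)
    # Inverse-probability weights keep weighted sums unbiased.
    weights = 1.0 / (m * q[idx])
    return X[idx], weights

rng = np.random.default_rng(0)
# Synthetic data (assumed): one large cluster and one small, distant cluster.
big = rng.normal(0.0, 1.0, size=(9800, 2))
small = rng.normal(8.0, 1.0, size=(200, 2))
X = np.vstack([big, small])

C, w = build_coreset(X, m=500, rng=rng)
full_mean = X.mean(axis=0)
coreset_mean = np.average(C, axis=0, weights=w)
```

Because each weight is the inverse of the point's expected number of draws, the weights sum to roughly the size of the full dataset, and weighted summaries of the coreset approximate the corresponding full-data summaries. Points in the small, distant cluster receive higher sampling probability and correspondingly smaller weights.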
Statistics not elsewhere classified