Changes in the model of within-cluster distribution of attributes and their effects on cluster analysis of vegetation data
In previous studies, a minimum message length fuzzy clustering method was applied to vegetation data and shown to give sensible estimates of the number of clusters as well as consistent estimates of cluster parameters. The minimum message length method provides a principled basis for choosing between models and between classes of models. A message comprises two components: one coding the model and its associated (meta)parameter values, the other coding the data given the model. The program models the distribution of attributes within clusters as uncorrelated Gaussian distributions. This assumption may not be acceptable, and in this paper a more general model, the t-distribution, is examined. The t-distribution provides a class of thick-tailed models while including the Gaussian as a subclass. It should be appropriate in hierarchical clustering where, even if the final clusters had internal Gaussian distributions, the upper levels would not. In addition, it may provide a better model of the within-cluster distribution of attributes even in the final clusters. Although forcing the use of t-distributions was not profitable, allowing a choice between Gaussian and t-distributions for each attribute in each class improved the results. This was despite only one attribute actually selecting the t-distribution over the Gaussian.
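The per-attribute choice between a Gaussian and a t-distribution can be illustrated with a minimal sketch. It is not the program used in the study. It assumes a single attribute within a single class, fits the t-distribution by a standard EM iteration over a small grid of degrees of freedom, and approximates the model part of the message by a BIC-style term of (k/2) log n nats for k parameters, which is a stand-in for a full minimum message length coding. All function names, the grid of degrees of freedom, and the two synthetic samples are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def gaussian_nll(xs):
    """Negative log-likelihood (nats) at the Gaussian maximum-likelihood fit."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return 0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def t_nll(xs, nu, iters=100):
    """Negative log-likelihood (nats) of a location-scale t fit with fixed nu.

    Location and scale are estimated by the usual EM reweighting scheme.
    """
    n = len(xs)
    mu = sorted(xs)[n // 2]  # start from the median
    var = sum((x - mu) ** 2 for x in xs) / n
    for _ in range(iters):
        w = [(nu + 1.0) / (nu + (x - mu) ** 2 / var) for x in xs]
        mu = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        var = sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)) / n
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi * var))
    return -sum(const - 0.5 * (nu + 1) * math.log1p((x - mu) ** 2 / (nu * var))
                for x in xs)


def message_length(nll, n_params, n):
    """Two-part length: assumed (k/2) log n model cost plus the data cost."""
    return 0.5 * n_params * math.log(n) + nll


def choose_model(xs, nus=(1, 2, 4, 8, 16, 32)):
    """Return (name, nu, length) for the shorter of the Gaussian and t messages."""
    n = len(xs)
    best = ("gaussian", None, message_length(gaussian_nll(xs), 2, n))
    for nu in nus:
        # The t model is charged for three parameters: location, scale, nu.
        length = message_length(t_nll(xs, nu), 3, n)
        if length < best[2]:
            best = ("t", nu, length)
    return best


# Deterministic illustrative samples built from quantiles (not real data).
n = 200
gauss_like = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
thick_tailed = [math.tan(math.pi * ((i + 0.5) / n - 0.5)) for i in range(n)]

print(choose_model(gauss_like))
print(choose_model(thick_tailed))
```

In this sketch the t-distribution is selected only when its shorter data part outweighs the cost of its extra parameter, which is the kind of trade-off that lets most attributes retain the Gaussian while a thick-tailed attribute switches.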