A model for correlation within clusters and its use in pollen analysis

No Thumbnail Available
File version
Author(s)
Dale, M
Allison, L
Dale, P
Griffith University Author(s)
Primary Supervisor
Other Supervisors
Editor(s)

J. Podani

Date
2010
Size
File type(s)
Location
License
Abstract

Many methods of cluster analysis do not explicitly account for correlation between attributes. In this paper we explicitly model any correlation using a single factor within each cluster: i.e., the correlation of atributes within each cluster is adequately described by a single component axis. However, the use of a factor is not required in every cluster. Using a Minimum Message Length criterion, we can determine the number of clusters and also whether the model of any cluster is improved by introducing a factor. The technique allows us to seek clusters which reflect directional changes rather than imposing a zonation constrained by spatial (and implicitly temporal) position. Minimal message length is a means of utilising Okham's Razor in inductive analysis. The 'best' model is that which allows most compression of the data, which results in a minimal message length for the description. Fit to the data is not a sufficient criterion for choosing models because more complicated models will almost always fit better. Minimum message length combines fit to the data with an encoding of the model and provides a Bayesian probability criterion as a means of choosing between models (and classes of model). Applying the analysis to a pollen diagram from Southern Chile, we find that the introduction of factors does not improve the overall quality of the mixture model. The solution without axes in any cluster provides the most parsimonious solution. Examining the cluster with the best case for a factor to be incorporated in its description shows that the attributes highly loaded on the axis represent a contrast of herbaceous vegetation and dominant forests types. This contrast is also found when fitting the entire population, and in this case the factor solution is the preferred model. Overall, the cluster solution without factors is much preferred. Thus, in this case classification is preferred to ordination although more data are desirable to confirm such a conclusion.

Journal Title

Community Ecology

Conference Title
Book Title
Edition
Volume

11

Issue

1

Thesis Type
Degree Program
School
Publisher link
Patent number
Funder(s)
Grant identifier(s)
Rights Statement
Rights Statement
Item Access Status
Note
Access the data
Related item(s)
Subject

Ecology

Community ecology (excl. invasive species ecology)

Persistent link to this record
Citation
Collections