Experimenting with Modelling via a Virtual Laboratory: Evaluating pseudo-absence strategies to refine a species distribution model
Virtual laboratories (VLs) are fast becoming realities in many fields of enquiry. For instance, the Biodiversity and Climate Change Virtual Laboratory (BCCVL) provides users with a high-performance computational platform to enable more efficient investigation of biological systems. This kind of VL is more than a mere portal to dispersed data sources and a diverse range of modelling options; it also reduces the computational overhead and tedium of implementing models. In this way, a VL allows users to explore each model and so apply the scientific method more fully in model development. Here we explore how the BCCVL can be used to support an iterative process of investigating and refining models through experimentation. The BCCVL supports many kinds of modelling for biodiversity, measured by species presence, traits or aggregate measures such as species richness. Here we narrow our focus to species distribution modelling (SDM) and, in particular, the source of absence data. Absences in SDM provide a useful case study for exploring models in VLs, as there are many potential settings that are known to substantially affect SDM results. When absence of the species has not been explicitly recorded, several strategies are available to impute ‘pseudo-absences’. New users may inadvertently specify pseudo-absences in a way that leads to issues such as ‘naughty noughts’ or pseudo-replication. It is possible to identify those issues during SDM, and this process can be accelerated through a VL. Additionally, after initial exploration in a VL, it is easy to export data into a statistical package, such as R, for further analysis and to continue refining SDMs. Here we show how the SDM for the Golden bowerbird is sensitive to the strategy for generating pseudo-absences, as defined by settings that can be altered within the BCCVL. A sequence of well-defined experiments gradually helps refine the options defining this strategy.
We begin with the study region, which implicitly delimits search effort and may be defined by the continent, a bioregion or a convex hull delimited by the farthest occurrences. At the same time, the BCCVL makes it easy to compare SDM algorithms. We consider regression (GLM), tree (CTA) and machine learning (MaxEnt) algorithms. Next we undertake separate experiments to further refine selection of pseudo-absences. The sampling strategy may be: completely random; constrained by a disc centred at occurrences; or defined by a Surface Range Envelope, comprising locations that fall outside the usual range of predictors evaluated at occurrences. Relative to the number of occurrences, the intensity of pseudo-absences may be set to be equal or to any other ratio. We export model results for out-of-VL analysis, and apply recursive partitioning trees in R to investigate naughty noughts. The Golden bowerbird is similar to many specialist species in Australia: generating pseudo-absences across the continent gave a large contrast between occurrence and absence, as evidenced by the distribution of predicted probabilities of presence. Constraining pseudo-absences to a bioregion, we were able to choose an SDM algorithm that permitted examination of gradients from absence to presence, whilst retaining high accuracy. Further experimentation assessed sensitivity to the sampling strategy of pseudo-absences, with a good option being a 10:1 sampling ratio at least 10 km from occurrences. Exporting these pseudo-absences to R, tree modelling identified uninhabited climates (with high mean temperature of the warmest quarter). When these were omitted, the estimates of climate effects on this species’ presence were greatly sharpened. This demonstrates how a VL may be used to refine modelling, evaluating sensitivity to settings via performance measures relevant at each stage.
In this case, the chosen pseudo-absence strategy to support SDM for the Golden bowerbird might have been discarded under a ‘one-off’ modelling approach that focussed on a single indicator.
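The two simplest sampling strategies described in the abstract, completely random and distance-constrained at a chosen ratio to occurrences, can be sketched as follows. This is a minimal illustration and not the BCCVL implementation: the function name `sample_pseudo_absences`, the rectangular study region, the planar coordinates in kilometres and the occurrence points are all hypothetical, and the disc constraint is interpreted here as a minimum distance from every occurrence.

```python
import math
import random


def sample_pseudo_absences(occurrences, bounds, ratio=1.0, min_dist=0.0,
                           seed=0, max_tries=100_000):
    """Draw round(ratio * n_occurrences) uniform random points inside a
    rectangular region, rejecting any point closer than min_dist to an
    occurrence. Hypothetical helper; coordinates are assumed planar (km)."""
    rng = random.Random(seed)
    xmin, xmax, ymin, ymax = bounds
    target = round(ratio * len(occurrences))
    points = []
    tries = 0
    while len(points) < target:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not place enough pseudo-absences")
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        # Keep the candidate only if it lies outside every exclusion disc.
        if all(math.hypot(x - ox, y - oy) >= min_dist
               for ox, oy in occurrences):
            points.append((x, y))
    return points


# Made-up occurrence records in a 100 km x 100 km region (illustrative only).
occurrences = [(50.0, 50.0), (55.0, 52.0), (48.0, 60.0)]
region = (0.0, 100.0, 0.0, 100.0)

# Completely random sampling at a 1:1 ratio.
random_pa = sample_pseudo_absences(occurrences, region, ratio=1.0)

# 10:1 ratio, with every pseudo-absence at least 10 km from any occurrence.
disc_pa = sample_pseudo_absences(occurrences, region, ratio=10.0, min_dist=10.0)
```

Changing `bounds` plays the role of changing the study region (continent versus bioregion), while `ratio` and `min_dist` correspond to the intensity and disc settings; a Surface Range Envelope strategy would instead filter candidates on predictor values, which this sketch does not attempt.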
22nd International Congress on Modelling and Simulation: Managing cumulative risks through model-based processes (MODSIM2017)
© 2017 Modelling & Simulation Society of Australia & New Zealand. The attached file is reproduced here in accordance with the copyright policy of the publisher. For information about this conference please refer to the conference’s website or contact the author(s).