Sensitivity Analysis to Configuration Option Settings in a Selection of Species Distribution Modelling Algorithms
In pursuit of more robust provenance in the field of species distribution modelling, an extensive literature search was undertaken to find the typical default values, and the range of values, for configuration settings of a number of the most commonly used statistical algorithms available for constructing species distribution models (SDMs), as implemented in R packages (such as dismo and biomod2) or other species distribution modelling programs like Maxent. We found that documentation of SDM algorithm configuration option settings in the SDM literature is very uncommon, and that justifications for these settings were minimal when present. Such settings were often the R default values, or were the result of trial and error. This is potentially concerning for a number of reasons: it detracts from the robustness of the provenance of such SDM studies, and a lack of documentation of configuration option settings in a paper prevents the replication of an experiment, which contravenes one of the main tenets of the scientific method. Inappropriate or uninformed configuration option settings are particularly concerning if they represent a poorly understood ecological variable or process, and if the algorithm is sensitive to such settings; this could result in erroneous and/or unrealistic SDMs. We test the sensitivity of two commonly used SDM algorithms to variation in configuration option settings: Random Forests and Boosted Regression Trees. A process of expert elicitation was used to derive a range of appropriate values with which to test the sensitivity of our algorithms. We chose to use species occurrence records for the Koala (Phascolarctos cinereus) for our sensitivity tests, since the species has a well-known distribution. Results were assessed by comparing the geospatial distribution from each sensitivity test SDM (i.e. altered settings) with that of the control SDM (i.e. default settings), using a geographical information system (QGIS).
In addition, two performance measures were used to compare the altered-setting SDMs with the control. The aim of our study was to draw conclusions as to how reliable reported SDM results may be in light of the sensitivity of their algorithms to certain settings, given the often arbitrary nature of such settings, and the lack of awareness of, and/or attention to, this issue in most of the published SDM literature. Our results indicate that both algorithms tested showed sensitivity to alternate values for some of their settings. This study has therefore shown that the choice of configuration option settings in Random Forests and Boosted Regression Trees has an impact on the results, and that assigning suitable values for these settings is a relevant consideration; as such, these settings should always be published along with the model.
22nd International Congress on Modelling and Simulation: Managing cumulative risks through model-based processes (MODSIM2017)
© 2017 Modelling & Simulation Society of Australia & New Zealand. The attached file is reproduced here in accordance with the copyright policy of the publisher. For information about this conference please refer to the conference’s website or contact the author(s).
Environmental Sciences not elsewhere classified