Graphical diagnostics for classification trees using asymmetric penalties on misclassification
Classification trees are powerful modelling tools that are widely applied across many disciplines. Despite their popularity, few diagnostic methods are available for evaluating their performance. Many extensions have focused on improving predictive performance by combining many models via model averaging, such as boosting and bagging. Boosting is widely used to improve the performance of many algorithms: it learns a sequence of weak classifiers and combines them into a final strong classifier. Another popular method is bootstrap aggregating, or bagging, another special case of model averaging. In bagging, several training data sets are randomly sampled from the data with replacement, the same classifier is fitted to each of those sets, and predictions for new data are obtained by averaging the predictions of the individual models. Model averaging approaches like these perform well for prediction but poorly for interpretation, as they do not provide a single model that can be used to explain the relationships among variables. Here, we consider graphical diagnostics to support selection of a single model for explanatory purposes. In addition, measures of predictive performance for a single model typically presume that all kinds of misclassification are equally costly. In our example, misclassifying pest presence could be devastating when the aim is early detection; in contrast, misclassifying pest absence would be problematic when aiming to claim that an area is free of a pest. Of particular interest in the motivating case study is how to improve the model by applying different penalties for misclassification of each class. This work proposes a new set of diagnostics, specifically for evaluating classification trees, their predictive performance, and their sensitivity to misclassification penalties.
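The bagging procedure described above can be sketched in a few lines. This is a minimal illustration on synthetic data (not the authors' analysis), using scikit-learn decision trees as the repeated base classifier: bootstrap resamples are drawn with replacement, one tree is fitted per resample, and class predictions are combined by majority vote.

```python
# Sketch of bagging: resample with replacement, fit the same learner each
# time, and average (majority-vote) the individual predictions.
# Synthetic data only; names like n_models are illustrative choices.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

n_models = 25
preds = []
for _ in range(n_models):
    # Bootstrap sample: draw n indices with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
    preds.append(tree.predict(X))

# Majority vote across the individual trees gives the bagged prediction.
bagged = (np.mean(preds, axis=0) >= 0.5).astype(int)
accuracy = (bagged == y).mean()
print(accuracy)
```

As the abstract notes, the price of this averaging is interpretability: no single fitted tree remains to explain the relationships among variables.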
Such diagnostics can only be applied when the algorithm for fitting a classification tree adopts criteria that allow asymmetric misclassification penalties. One example is recursive partitioning with penalties (a “loss” matrix), implemented in the rpart package in R. The use of penalties in constructing classification tree models appears to be little used in practice; we suspect this is because no diagnostics are readily available for examining sensitivity to the values assigned to those penalties. In contrast, graphical diagnostics for sensitivity analysis are common practice in many types of analysis, such as cluster and factor analysis. Here we develop and present a new graphical approach for diagnosing a single classification tree fitted by recursive partitioning, where both the goodness-of-fit criterion and the threshold for classification are weighted by penalties for misclassifying each class. Our method exploits detailed information provided with the results of fitting a tree: node height and change in height, which represent the amount of information added to the model and the improvement in fit gained by each split, respectively. We also define new measures of fit that are of particular interest when penalising classes: how well the classes are separated, and the number and size of ‘pure’ nodes that perfectly predict each class. In this paper we demonstrate how to use these new graphical diagnostics in a plant biosecurity case study describing a potential distribution model for a pest, the Russian Wheat Aphid (RWA). We show that these graphical diagnostics for trees reveal insights that would not otherwise be evident. Using a high penalty on false negative misclassification, it was possible to identify factors (such as precipitation in July and temperature seasonality) that corresponded to large groups of reported absences (pure nodes in the tree).
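To illustrate the effect of asymmetric penalties, here is a small sketch in the spirit of the rpart loss matrix described above. scikit-learn's DecisionTreeClassifier does not accept a full loss matrix, but for two classes its class_weight argument is a related mechanism: upweighting the presence class (labelled 1 here, an assumption) makes false negatives costlier when splits are chosen. Synthetic data, not the RWA study.

```python
# Compare a tree with symmetric penalties against one that heavily
# penalises missing a "presence" (class 1), via class_weight.
# All data and parameter choices here are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, weights=[0.8, 0.2],
                           flip_y=0.1, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# Equal penalties versus a 10:1 penalty on missing a presence.
plain = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_tr, y_tr)
penal = DecisionTreeClassifier(max_depth=3, class_weight={0: 1, 1: 10},
                               random_state=1).fit(X_tr, y_tr)

# Entry [1, 0] of the confusion matrix counts false negatives:
# true presences predicted as absences.
fn_plain = confusion_matrix(y_te, plain.predict(X_te))[1, 0]
fn_penal = confusion_matrix(y_te, penal.predict(X_te))[1, 0]
print(fn_plain, fn_penal)
```

In rpart itself the equivalent step passes the loss matrix directly, e.g. `parms = list(loss = matrix(c(0, 10, 1, 0), nrow = 2))`; the diagnostics proposed in the paper examine how the fitted tree changes as such penalties vary.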
Using these diagnostics, we were able to evaluate sensitivity to the magnitude of the penalty: with small penalties, only a few pure absence nodes were identified, while with larger penalties the fit deteriorated.
22nd International Congress on Modelling and Simulation: Managing cumulative risks through model-based processes (MODSIM2017)
© 2017 Modelling & Simulation Society of Australia & New Zealand. The attached file is reproduced here in accordance with the copyright policy of the publisher. For information about this conference please refer to the conference’s website or contact the author(s).