Spatio-temporal modelling of dengue fever cases in Saudi Arabia using socio-economic, climatic and environmental factors
Author(s)
Shukla, Nagesh
Pradhan, Biswajeet
Abstract
Dengue fever (DF) is a common vector-borne disease with catastrophic health implications. DF prediction modelling is a challenging task, although technologies such as Geographical Information Systems (GIS) and spatial statistics have improved our understanding of dengue dynamics. In this paper, we create a robust data analysis model to (i) provide a better understanding of confirmed dengue fever cases despite missing data, (ii) obtain better insights into risk factors associated with confirmed cases, and (iii) by means of machine learning, create clusters of patients with comparable characteristics. The last was accomplished with a self-organizing feature map (SOFM) and density-based spatial clustering of applications with noise (DBSCAN). The approaches used to classify confirmed cases were: Decision Tree, k-nearest neighbours, Random Forest, AdaBoost, Support Vector Classification (SVC), CatBoost, and Naive Bayes. The CatBoost classifier achieved the best accuracy for the analysis of confirmed cases. Spatial analysis was conducted using the ordinary least squares (OLS) and geographically weighted regression (GWR) models to identify high-risk areas. The SOFM groups patients with similar features, and DBSCAN then detects and retrieves six clusters from these data. Clustering the confirmed cases increases CatBoost’s prediction accuracy and reveals complex factors that influence it. Because confirmed cases in each cluster have different features, CatBoost is applied to each cluster individually to improve the prediction accuracy. Variable values in each cluster are analysed to clarify the confirmed cases of a specific subset of DF incidents. Overall, OLS outperforms GWR when identifying hotspot areas. The proposed novel, data-driven and machine-learning-based strategy facilitates the understanding and identification of patterns associated with confirmed DF cases.
The study's findings can be used to cluster historical patient data into groups or subgroups sharing similar variables. Using identifiable patient clusters rather than raw historical data improves the accuracy of the CatBoost model.
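The cluster-then-classify idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it is a plain-Python DBSCAN, it omits the SOFM stage, and the toy points and the `eps` and `min_pts` values are hypothetical. The function names `region_query`, `dbscan` and `group_by_cluster` are introduced here for illustration only.

```python
from math import dist

NOISE = -1  # label given to points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is also a core point
                queue.extend(j_neighbours)
    return labels


def group_by_cluster(rows, labels):
    """Split rows into per-cluster groups, dropping noise points."""
    groups = {}
    for row, label in zip(rows, labels):
        if label != NOISE:
            groups.setdefault(label, []).append(row)
    return groups


# Hypothetical toy data: two dense groups and one isolated point.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
              (10.0, 0.0)]
toy_labels = dbscan(toy_points, eps=0.5, min_pts=3)
toy_groups = group_by_cluster(toy_points, toy_labels)
```

In a pipeline of the kind the abstract describes, a separate classifier (CatBoost in the paper) would then be fitted on each entry of `toy_groups` rather than on the pooled records.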
Journal Title
Geocarto International
Subject
Transportation, logistics and supply chains
Geospatial information systems and geospatial data modelling
Science & Technology
Life Sciences & Biomedicine
Physical Sciences
Technology
Environmental Sciences
Citation
Siddiq, A; Shukla, N; Pradhan, B, Spatio-temporal modelling of dengue fever cases in Saudi Arabia using socio-economic, climatic and environmental factors, Geocarto International, 2022