Beyond Comparing Machine Learning and Logistic Regression in Clinical Prediction Modelling: Shifting from Model Debate to Data Quality
File version
Version of Record (VoR)
Author(s)
Hu, Yanan
Zhang, Xin
Slavin, Valerie
Belsti, Yitayeh
Tiruneh, Sofonyas Abebaw
Callander, Emily
Enticott, Joanne
Abstract
The rapid uptake of supervised machine learning (ML) in clinical prediction modelling, particularly for binary outcomes based on tabular data, has sparked debate about its comparative advantage over traditional statistical logistic regression. Although ML has demonstrated superiority in unstructured data domains, its performance gains in structured, tabular clinical datasets remain inconsistent and context dependent. This viewpoint synthesizes recent comparative studies and simulation findings to argue that there is no universal best modelling approach. Model performance depends heavily on dataset characteristics (eg, linearity, sample size, number of candidate predictors, minority class proportion) and data quality (eg, completeness, accuracy). Consequently, we argue that efforts to improve data quality, not model complexity, are more likely to enhance the reliability and real-world utility of clinical prediction models.
Journal Title
Journal of Medical Internet Research
Volume
27
Rights Statement
©Yanan Hu, Xin Zhang, Valerie Slavin, Yitayeh Belsti, Sofonyas Abebaw Tiruneh, Emily Callander, Joanne Enticott. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 05.Nov.2025. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
Subject
Health services and systems
Citation
Hu, Y; Zhang, X; Slavin, V; Belsti, Y; Tiruneh, SA; Callander, E; Enticott, J, Beyond Comparing Machine Learning and Logistic Regression in Clinical Prediction Modelling: Shifting from Model Debate to Data Quality, Journal of Medical Internet Research, 2025, 27, pp. e77721