Beyond Comparing Machine Learning and Logistic Regression in Clinical Prediction Modelling: Shifting from Model Debate to Data Quality

File version

Version of Record (VoR)

Author(s)
Hu, Yanan
Zhang, Xin
Slavin, Valerie
Belsti, Yitayeh
Tiruneh, Sofonyas Abebaw
Callander, Emily
Enticott, Joanne
Date
2025
Abstract

The rapid uptake of supervised machine learning (ML) in clinical prediction modelling, particularly for binary outcomes based on tabular data, has sparked debate about its comparative advantage over traditional statistical logistic regression. Although ML has demonstrated superiority in unstructured data domains, its performance gains in structured, tabular clinical datasets remain inconsistent and context dependent. This viewpoint synthesizes recent comparative studies and simulation findings to argue that there is no universal best modelling approach. Model performance depends heavily on dataset characteristics (eg, linearity, sample size, number of candidate predictors, minority class proportion) and data quality (eg, completeness, accuracy). Consequently, we argue that efforts to improve data quality, not model complexity, are more likely to enhance the reliability and real-world utility of clinical prediction models.
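As an illustration only (this is not the authors' method), the dataset characteristics listed in the abstract can be summarised before any choice between ML and logistic regression is made. The minimal Python sketch below rests on stated assumptions. Predictors arrive as rows of numbers with None marking a missing value. The outcome is binary and coded 0/1. The names profile_dataset and DatasetProfile are hypothetical. The sketch reports sample size, number of candidate predictors, minority class proportion, completeness, and events per candidate predictor (the minority-class count divided by the number of predictors). That last ratio is commonly used in the clinical prediction modelling literature but is not defined in this abstract.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DatasetProfile:
    """Hypothetical summary of the dataset characteristics named in the abstract."""
    n_rows: int                  # sample size
    n_predictors: int            # number of candidate predictors
    minority_proportion: float   # share of rows in the smaller outcome class
    events_per_predictor: float  # minority-class count / candidate predictors
    completeness: float          # share of predictor cells that are not missing


def profile_dataset(
    rows: Sequence[Sequence[Optional[float]]], outcome: Sequence[int]
) -> DatasetProfile:
    """Summarise a tabular dataset with a binary outcome coded 0/1.

    Assumption (illustrative only): missing predictor values are stored as None.
    """
    if not rows or len(rows) != len(outcome):
        raise ValueError("rows and outcome must be non-empty and equal in length")
    n_predictors = len(rows[0])
    if n_predictors == 0 or any(len(row) != n_predictors for row in rows):
        raise ValueError("every row must have the same, non-zero number of predictors")
    if any(y not in (0, 1) for y in outcome):
        raise ValueError("outcome must be coded 0/1")

    n_rows = len(rows)
    n_events = sum(outcome)
    # The minority class is whichever outcome level is rarer.
    n_minority = min(n_events, n_rows - n_events)
    # Completeness: count predictor cells that are actually observed.
    observed = sum(1 for row in rows for value in row if value is not None)

    return DatasetProfile(
        n_rows=n_rows,
        n_predictors=n_predictors,
        minority_proportion=n_minority / n_rows,
        events_per_predictor=n_minority / n_predictors,
        completeness=observed / (n_rows * n_predictors),
    )


# Toy example: 4 patients, 3 candidate predictors, 1 event, 3 missing values.
example = profile_dataset(
    rows=[[1.0, None, 3.0], [2.0, 5.0, None], [0.5, 1.5, 2.5], [None, 4.0, 1.0]],
    outcome=[0, 1, 0, 0],
)
print(example)
```

Completeness is only one of the data quality dimensions the abstract mentions. Accuracy generally cannot be checked from the dataset alone and needs comparison against a trusted reference source, so it is deliberately left out of this sketch.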

Journal Title

Journal of Medical Internet Research

Volume

27

Rights Statement

©Yanan Hu, Xin Zhang, Valerie Slavin, Yitayeh Belsti, Sofonyas Abebaw Tiruneh, Emily Callander, Joanne Enticott. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 05.Nov.2025. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.

Subject

Health services and systems

Citation

Hu, Y; Zhang, X; Slavin, V; Belsti, Y; Tiruneh, SA; Callander, E; Enticott, J, Beyond Comparing Machine Learning and Logistic Regression in Clinical Prediction Modelling: Shifting from Model Debate to Data Quality, Journal of Medical Internet Research, 2025, 27, pp. e77721
