Explainable machine learning model for predicting cesarean section following induction of labor: Development and external validation using real-world data

File version

Version of Record (VoR)

Author(s)
Hu, Yanan
Zhang, Xin
Slavin, Valerie
Enticott, Joanne
Callander, Emily
Editor(s)

Leal-Neto, Onicio Batista

Date
2025
Abstract

Induction of labor (IOL) is a common yet complex clinical procedure associated with varying risks, including cesarean section (CS). Accurate prediction models may help support more informed, personalized decision-making. This study aimed to develop and validate an explainable machine learning prediction model for CS following IOL. We used population-based administrative perinatal datasets from two Australian states (New South Wales (NSW) and Queensland) covering all births between 2016 and 2019 for model development. Temporal validation was conducted using 2020 births from NSW, and geographical validation using 2016-2018 births from Victoria. We included women with singleton, cephalic, term, live births who attempted IOL and had no prior CS. Seven models (logistic regression, random forest, gradient boosting, LightGBM, XGBoost, CatBoost, and AdaBoost) were developed with hyperparameter tuning and feature selection. Performance was assessed using the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve, calibration plot (overall and across sociodemographic subgroups), decision curve analysis, Brier score, and model parsimony. SHAP (SHapley Additive exPlanations) values were used to explain predictor contributions. A total of 180,700 women were included in model development (mean age 31 ± 5 years; CS = 20.8%). The optimal model, developed using XGBoost with ten predictors, achieved AUROCs of 0.76 (95% CI: 0.75-0.77) and 0.75 (95% CI: 0.74-0.76) in temporal (n = 14,527; CS = 22.5%) and geographical (n = 14,755; CS = 19.0%) validations, respectively. The most influential predictors were nulliparity, pre-pregnancy body mass index, and maternal age, while diabetes and hypertension (pre-existing or pregnancy-related) contributed least. Women with higher predicted CS probabilities had increased inpatient costs and maternal morbidity, regardless of actual mode of birth. The final model is accessible via an interactive web application (https://csai-8ccf2690242c.herokuapp.com/). This model demonstrates strong predictive performance using routinely collected maternal factors. Further co-design and implementation research is needed before potential clinical adoption.
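
Two of the performance measures named in the abstract, AUROC with a 95% confidence interval and the Brier score, can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the abstract does not state how the confidence intervals were obtained, so the percentile bootstrap used here is an assumption, and the function names and toy data are hypothetical and purely illustrative.

```python
import random


def auroc(y_true, y_prob):
    """AUROC via the Mann-Whitney U statistic, using average ranks for ties."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the group of tied predicted probabilities.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both outcome classes")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_ci(y_true, y_prob, metric, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed method, not from the paper)."""
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("need both outcome classes to resample")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        # Skip resamples that contain only one outcome class.
        if 0 < sum(ys) < n:
            stats.append(metric(ys, [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy, made-up outcomes (1 = CS) and predicted probabilities; not study data.
y = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]
p = [0.10, 0.30, 0.35, 0.20, 0.80, 0.60, 0.45, 0.40, 0.05, 0.50]
print("AUROC:", auroc(y, p), "95% CI:", bootstrap_ci(y, p, auroc))
print("Brier score:", brier_score(y, p))
```

AUROC measures discrimination (how well predicted probabilities rank women who had a CS above those who did not), while the Brier score also reflects calibration, which is why the abstract reports both alongside calibration plots.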

Journal Title

PLOS Digital Health

Volume

4

Issue

11

Rights Statement

© 2025 Hu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Subject

Machine learning

Data management and data science

Citation

Hu, Y; Zhang, X; Slavin, V; Enticott, J; Callander, E, Explainable machine learning model for predicting cesarean section following induction of labor: Development and external validation using real-world data, PLOS Digital Health, 2025, 4 (11), pp. e0001061
