TY - JOUR
T1 - HGSORF
T2 - Henry Gas Solubility Optimization-based Random Forest for C-Section prediction and XAI-based cause analysis
AU - Islam, Md Saiful
AU - Awal, Md Abdul
AU - Laboni, Jinnaton Nessa
AU - Pinki, Farhana Tazmim
AU - Karmokar, Shatu
AU - Mumenin, Khondoker Mirazul
AU - Al-Ahmadi, Saad
AU - Rahman, Md Ashfikur
AU - Hossain, Md Shahadat
AU - Mirjalili, Seyedali
N1 - Funding Information:
The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research group no. RG-1441-394.
Publisher Copyright:
© 2022 Elsevier Ltd
PY - 2022/8
Y1 - 2022/8
N2 - A stable predictive model is essential for forecasting the chances of cesarean or C-section (CS) delivery, as unnecessary CS delivery can adversely affect neonatal, maternal, and pediatric morbidity and mortality, and can incur significant financial burdens. Few state-of-the-art machine learning models have been applied in this area in recent years, and the current models are insufficient to correctly predict the probability of CS delivery. To address this drawback, we propose a Henry gas solubility optimization (HGSO)-based random forest (RF), with an improved objective function, called HGSORF, for the classification of CS and non-CS classes. Real-world CS datasets, such as the Pakistan Demographic and Health Survey (PDHS) dataset used in this study, can be noisy. The HGSO can provide fine-tuned hyperparameters of RF by avoiding local minima. To compare performance, Gaussian Naive Bayes (GNB), linear discriminant analysis (LDA), K-nearest neighbors (KNN), gradient boosting classifier (GBC), and logistic regression (LR) have been considered in this research. The ADAptive SYNthetic (ADASYN) algorithm has been used to balance the class distribution, and the proposed HGSORF has been compared with other classifiers as well as with other studies. HGSORF achieved the best performance, with an accuracy of 98.33% on the PDHS dataset. The hyperparameters of RF have also been optimized by using commonly used hyperparameter-optimization algorithms, and the proposed HGSORF provided comparatively better performance. Additionally, to analyze the causes of CS and their significance, the HGSORF is explained locally and globally using eXplainable artificial intelligence (XAI)-based tools such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME). A decision support system has been developed as a potential application to support clinical staff.
All pre-trained models and relevant code are available at: https://github.com/MIrazul29/HGSORF_CSection.
AB - A stable predictive model is essential for forecasting the chances of cesarean or C-section (CS) delivery, as unnecessary CS delivery can adversely affect neonatal, maternal, and pediatric morbidity and mortality, and can incur significant financial burdens. Few state-of-the-art machine learning models have been applied in this area in recent years, and the current models are insufficient to correctly predict the probability of CS delivery. To address this drawback, we propose a Henry gas solubility optimization (HGSO)-based random forest (RF), with an improved objective function, called HGSORF, for the classification of CS and non-CS classes. Real-world CS datasets, such as the Pakistan Demographic and Health Survey (PDHS) dataset used in this study, can be noisy. The HGSO can provide fine-tuned hyperparameters of RF by avoiding local minima. To compare performance, Gaussian Naive Bayes (GNB), linear discriminant analysis (LDA), K-nearest neighbors (KNN), gradient boosting classifier (GBC), and logistic regression (LR) have been considered in this research. The ADAptive SYNthetic (ADASYN) algorithm has been used to balance the class distribution, and the proposed HGSORF has been compared with other classifiers as well as with other studies. HGSORF achieved the best performance, with an accuracy of 98.33% on the PDHS dataset. The hyperparameters of RF have also been optimized by using commonly used hyperparameter-optimization algorithms, and the proposed HGSORF provided comparatively better performance. Additionally, to analyze the causes of CS and their significance, the HGSORF is explained locally and globally using eXplainable artificial intelligence (XAI)-based tools such as SHapley Additive exPlanations (SHAP) and Local Interpretable Model-Agnostic Explanations (LIME). A decision support system has been developed as a potential application to support clinical staff.
All pre-trained models and relevant code are available at: https://github.com/MIrazul29/HGSORF_CSection.
KW - ADASYN
KW - Cesarean section
KW - HGSORF
KW - Hyperparameter optimization
KW - LIME
KW - Machine learning
KW - SHAP
KW - XAI
UR - http://www.scopus.com/inward/record.url?scp=85131221843&partnerID=8YFLogxK
U2 - 10.1016/j.compbiomed.2022.105671
DO - 10.1016/j.compbiomed.2022.105671
M3 - Article
AN - SCOPUS:85131221843
SN - 0010-4825
VL - 147
JO - Computers in Biology and Medicine
JF - Computers in Biology and Medicine
M1 - 105671
ER -