Tehran University of Medical Sciences

Science Communicator Platform

Comparison of Different Machine Learning Algorithms to Classify Patients Suspected of Having Sepsis Infection in the Intensive Care Unit



Gholamzadeh M (1); Abtahi H (2); Safdari R (3)
Author Affiliations
  1. Health Information Management and Medical Informatics Department, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
  2. Pulmonary and Critical Care Department, Thoracic Research Center, Imam Khomeini Hospital Complex, Tehran University of Medical Sciences, Tehran, Iran
  3. Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran

Source: Informatics in Medicine Unlocked. Published: 2023


Abstract

Background: Sepsis is a life-threatening condition that arises from the body's response to an infection. This study aims to develop a classification model that predicts which patients are at risk of sepsis from clinical findings and demographic information.

Methods: The study used the MIMIC-III dataset, which is freely available as open-access data. The synthetic minority oversampling technique (SMOTE) was applied to address the class imbalance in the dataset. During preprocessing, the dataset was cleaned and missing values were imputed. Split validation was performed by dividing the dataset into training and test sets for developing the classification models. Six classifiers were developed: Gaussian naive Bayes (NB), decision tree (DT), random forest (RF), logistic regression (LR), k-nearest neighbors (KNN), and XGBoost. A combination of evaluation metrics was used to assess the performance of the proposed models.

Results: The dataset includes 1,552,210 entries with 44 features of critically ill patients who were admitted to the ICU. Comparing the developed models across the different metrics showed that the RF model performed best in terms of F-measure and area under the ROC curve. The 20 most important features were determined from the RF model.

Conclusion: Our analysis showed that the RF model predicted sepsis with significantly higher performance than the other classification models on the MIMIC-III dataset. Given the high mortality of sepsis, studies of this kind could help prevent complications of the disease and reduce the risk of mortality in hospitalized patients by providing early sepsis prediction.

© 2023 The Authors
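
The SMOTE step named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: it is a simplified, standard-library-only variant of the technique, and the function names (`smote`, `balance`), the parameter defaults, and the toy feature values are all assumptions made for illustration. Each synthetic point is created by picking a minority-class sample, choosing one of its k nearest minority-class neighbours, and interpolating at a random position between the two.

```python
import random
from math import dist  # Euclidean distance (Python 3.8+)


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority-class points by interpolation.

    minority: list of numeric feature vectors from the rare class.
    k: number of nearest minority neighbours considered per base sample.
    (Hypothetical helper; a simplified variant of SMOTE.)
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base sample, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal counts."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(X) - len(minority)) - len(minority)
    if n_new <= 0:
        return list(X), list(y)
    new_points = smote(minority, n_new, k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * len(new_points)


# Toy data with made-up values (not taken from MIMIC-III):
# six majority-class rows (label 0) and two minority-class rows (label 1).
X = [[80, 36.8], [85, 37.0], [90, 36.5], [78, 36.9], [88, 37.1], [92, 36.7],
     [120, 38.9], [130, 39.2]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = balance(X, y)
print(y_bal.count(0), y_bal.count(1))  # prints: 6 6
```

In practice, oversampling is usually applied to the training split only, so that synthetic points do not leak into the test set used for evaluation.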