Tehran University of Medical Sciences

Science Communicator Platform

Predicting Errors in Accident Hotspots and Investigating Spatiotemporal, Weather, and Behavioral Factors Using Interpretable Machine Learning: An Analysis of Telematics Big Data



Authors: Ali Golestani; Nazila Rezaei; Mohammad Reza Malekpour; Naser Ahmadi; Seyed Mohammad Navid Ataei; Sepehr Khosravi; Ayyoob Jafari; Saeid Shahraz; Farshad Farzadfar

Source: PLOS ONE. Published: 2025


Abstract

Background: Road traffic accidents (RTAs) are a major public health concern with significant health and economic burdens. Identifying high-risk areas and key contributing factors is essential for developing targeted interventions. While machine learning (ML) has been increasingly used to predict RTAs, the lack of interpretability limits its applicability in policymaking. This study aimed to use interpretable ML models to predict the occurrence of errors in road accident hotspots using telematics data in Iran and to interpret the most influential predictors.
Methods: We used data collected via telematics from 1673 intercity buses throughout the year 2020, spanning cities across all provinces of Iran. Merging these data with a weather-related dataset resulted in a comprehensive dataset containing location, time, weather, and error type variables. After preprocessing, 619,988 records without any missing values were used to train and compare the performance of six machine learning models: logistic regression, K-nearest neighbors, random forest, Extreme Gradient Boosting (XGBoost), Naive Bayes, and support vector machine. Because the outcome was highly imbalanced, an ensemble approach was applied to train all models. The best model was selected for interpretation using SHAP (SHapley Additive exPlanations).
Results: XGBoost demonstrated the best performance, with an area under the curve (AUC) of 91.70% (95% uncertainty interval: 91.33% to 92.09%). SHAP values highlighted spatial variables, particularly the province of error and road type, as the most critical features for predicting errors in accident hotspots in Iran. Fatigue, as a behavioral error, was associated with a higher predicted risk of errors in accident hotspots, and certain weather-related variables, including dew point and relative humidity, also showed importance. However, temporal variables did not contribute significantly to the prediction.
Conclusion: By integrating spatiotemporal, behavioral, and weather-related variables, our study highlighted the dominance of spatial factors in predicting errors in accident hotspots. These findings underscore the need for targeted road infrastructure improvements and data-driven policymaking to mitigate RTA risks.
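
The abstract reports model performance as an AUC with a 95% uncertainty interval but does not say how that interval was obtained. The sketch below illustrates one common way to produce such a figure: the AUC computed from pairwise comparisons of positive and negative scores, with a percentile bootstrap interval. The bootstrap, the function names `auc` and `bootstrap_auc_interval`, and the toy labels and scores are all assumptions made for illustration; they are not taken from the study.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the number
    of records, so it suits small illustrative inputs only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, see lead-in)."""
    auc(labels, scores)  # fails early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that happen to contain a single class.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy data (hypothetical): 1 = error occurred in a hotspot, 0 = elsewhere.
toy_labels = [0, 0, 1, 1, 0, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90, 0.60, 0.70]

point = auc(toy_labels, toy_scores)
lower, upper = bootstrap_auc_interval(toy_labels, toy_scores, n_resamples=200)
print(f"AUC = {point:.4f}, 95% interval = ({lower:.4f}, {upper:.4f})")
```

With several hundred thousand records, as in the study, a rank-based AUC routine from an established library would be the practical choice; the pairwise loop here is kept only because it makes the definition explicit.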