Tehran University of Medical Sciences

Science Communicator Platform

Addressing Overfitting in an Imbalanced Dataset for MS Progression Prediction



Authors: Pilehvari S; Peng W; Morgan Y; Sahraian MA; Eskandarieh S

Source: Lecture Notes in Networks and Systems. Published: 2025


Abstract

Overfitting is a common problem during model training, particularly for binary medical datasets with class imbalance. This research specifically addresses this issue in predicting Multiple Sclerosis (MS) progression, with the primary goal of improving model accuracy and reliability. By investigating various data resampling techniques, ensemble methods, feature extraction, and model regularization, the study thoroughly evaluates the effectiveness of these strategies in enhancing stability and performance for highly imbalanced datasets. Compared to prior studies, this research advances existing approaches by integrating Kernel Principal Component Analysis (KPCA), moderate under-sampling, Synthetic Minority Oversampling Technique (SMOTE), and post-processing techniques, including Youden’s J Statistic and manual threshold adjustments. This comprehensive strategy significantly reduced overfitting while improving the generalization of models, particularly the Multilayer Perceptron (MLP), which achieved an Area Under the Curve (AUC) of 0.98—outperforming previous models in similar applications. These findings establish important best practices for developing robust prognostic models for MS progression and underscore the importance of tailored solutions in complex medical prediction tasks. © 2025 Elsevier B.V., All rights reserved.
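
The abstract names Youden's J statistic as one of the post-processing steps for choosing a decision threshold. As a rough illustration of that general idea (not the authors' implementation), the sketch below picks the score threshold that maximizes J = sensitivity + specificity - 1, which equals the true positive rate minus the false positive rate. The function name `youden_threshold` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def youden_threshold(y_true, scores):
    """Return (threshold, J) that maximizes Youden's J = TPR - FPR.

    y_true: iterable of 0/1 labels (1 = positive, e.g. the minority class).
    scores: predicted probabilities or scores, same length as y_true.
    A sample is predicted positive when its score >= threshold.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_threshold, best_j = None, -1.0
    # Every distinct observed score is a candidate cut-off.
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        j = tp / positives - fp / negatives  # sensitivity + specificity - 1
        if j > best_j:
            best_threshold, best_j = t, j
    return best_threshold, best_j


# Toy, made-up imbalanced example: 6 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.40, 0.80]
threshold, j = youden_threshold(y_true, scores)
print(threshold, round(j, 3))
```

On this toy data the default cut-off of 0.5 would miss the positive scored at 0.40, while the J-maximizing threshold recovers it at the cost of one false positive. This is the kind of trade-off that threshold adjustment on an imbalanced dataset is meant to expose.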