Comparison of Four Data Mining Algorithms for Predicting Colorectal Cancer Risk



Authors: Shanbehzadeh M (1); Nopour R (2); Kazemiarpanahi H (3, 4)
Author Affiliations
  1. Dept. of Health Information Technology, School of Paramedical, Ilam University of Medical Sciences, Ilam, Iran
  2. Dept. of Health Information Technology, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
  3. Dept. of Health Information Technology, Abadan Faculty of Medical Sciences, Abadan, Iran
  4. Dept. of Student Research Committee, Abadan Faculty of Medical Sciences, Abadan, Iran

Source: Journal of Advances in Medical and Biomedical Research. Published: 2021


Abstract

Background & Objective: Colorectal cancer (CRC) is one of the most prevalent malignancies worldwide. Early detection of CRC is not only a simple process but also the key to treatment. Data mining algorithms could be useful in cancer prognosis, diagnosis, and treatment. The main focus of this study was therefore to measure the performance of several data mining classifiers in predicting CRC and providing an early warning to high-risk groups.

Materials & Methods: This study was performed on 468 subjects (194 CRC patients and 274 non-CRC cases) using a CRC dataset from Imam Hospital, Sari, Iran. The chi-square feature selection method was used to analyze the risk factors. Four popular data mining algorithms (J-48, Bayesian net, random forest, and multilayer perceptron) were then compared in terms of their performance in predicting CRC, and the best-performing algorithm was identified.

Results: J-48 achieved the best performance, with F-measure = 0.826, receiver operating characteristic (ROC) area = 0.881, precision = 0.826, and sensitivity = 0.827. Bayesian net was the second-best performer (F-measure = 0.718, ROC = 0.784, precision = 0.719, sensitivity = 0.722), followed by random forest (F-measure = 0.705, ROC = 0.758, precision = 0.719, sensitivity = 0.712). The multilayer perceptron had the lowest F-measure (F-measure = 0.702, ROC = 0.76, precision = 0.701, sensitivity = 0.703).

Conclusion: According to these results, J-48 could provide better insights than the other prediction models compared for clinical applications.

© 2021, Zanjan University of Medical Sciences and Health Services. All rights reserved.
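The abstract relies on two standard computations: chi-square scoring of candidate risk factors against the CRC/non-CRC label, and precision, sensitivity, and F-measure for judging the classifiers. The following is a minimal sketch of both, assuming binary (0/1) risk factors; the abstract does not describe how features were encoded or which software was used. The feature names and all counts below are hypothetical toy values, not data from the study, and the metrics are computed for the positive (CRC) class only, since the abstract does not say whether its reported figures are per-class or averaged across classes.

```python
def chi_square(table):
    """Pearson chi-square statistic (no continuity correction) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # an empty row or column contributes nothing
                stat += (observed - expected) ** 2 / expected
    return stat


def contingency_table(feature, labels):
    """2x2 counts: rows = risk factor absent/present (0/1), cols = non-CRC/CRC (0/1)."""
    table = [[0, 0], [0, 0]]
    for x, y in zip(feature, labels):
        table[x][y] += 1
    return table


def rank_features(columns, labels):
    """Score each binary feature against the class label, highest chi-square first."""
    scores = {name: chi_square(contingency_table(col, labels))
              for name, col in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def binary_metrics(tp, fp, fn):
    """Precision, sensitivity (recall) and F-measure for the positive (CRC) class."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, f_measure


# Hypothetical toy data, NOT from the study: 1 = CRC, 0 = non-CRC.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "family_history": [1, 1, 1, 0, 0, 0, 0, 0],  # hypothetical risk factor
    "smoking": [1, 0, 1, 0, 1, 0, 1, 0],         # hypothetical risk factor
}
for name, score in rank_features(columns, labels):
    print(f"{name}: chi2 = {score:.2f}")  # family_history: 4.80, then smoking: 0.00

# Hypothetical confusion-matrix counts for a single classifier.
p, r, f = binary_metrics(tp=80, fp=20, fn=15)
print(f"precision={p:.3f} sensitivity={r:.3f} F-measure={f:.3f}")
# precision=0.800 sensitivity=0.842 F-measure=0.821
```

A risk factor that splits evenly across both classes (the toy "smoking" column) scores zero and would be the first candidate to drop. For larger tables, `scipy.stats.chi2_contingency` computes the same statistic, though it applies Yates' continuity correction to 2x2 tables unless `correction=False` is passed.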