Tehran University of Medical Sciences

Science Communicator Platform

TRIAGE: Trustworthy Reporting and Assessment for Clinical Gain and Effectiveness of AI Models



Authors: Fazilati F; Rajabi MZ; Alihosseini N; Farsani ME; Sandid SH; Zamani S; Alirezaei Farahani M; Biriaei F; Sadeghipour F; Mirshamsi MT; Fahami M; Marateb HR

Source: Diagnostics
Published: 2026


Abstract

Machine learning (ML), including deep learning, kernel-based classifiers, and ensemble methods, is increasingly used to support clinical diagnosis in medical imaging, biosignal interpretation, and electronic health record (EHR)-based decision support. Despite rapid progress, many diagnostic AI studies still rely on limited retrospective evaluation and single summary measures (e.g., accuracy or AUC), creating a gap between reported model performance and the evidence required for safe clinical adoption. This review proposes TRIAGE, a clinically grounded evaluation framework that organizes diagnostic AI testing as an evidence pipeline aligned with real clinical use cases (screening, triage, second reading, and confirmatory testing). We summarize core discrimination metrics derived from the confusion matrix (sensitivity, specificity, predictive values, likelihood ratios, diagnostic odds ratio, and F-scores) and highlight the importance of prevalence and spectrum effects when interpreting predictive values and clinical workload. We further review evaluation strategies for multi-class and multi-label diagnostic tasks using appropriate aggregation methods (micro, macro, and weighted averaging) and set-based measures such as Hamming loss, exact match ratio, and Jaccard/IoU. Because diagnostic deployment is threshold-dependent, we integrate representation curves (ROC, precision–recall, lift, and cumulative gain) with calibration assessment and clinical utility analysis, including calibration slope, Brier score, and decision-curve analysis. We also address robustness and fairness evaluation, leakage-resistant validation designs (patient-grouped splits, stratified and temporal validation, and external validation), computational constraints relevant to deployment (latency, throughput, and energy use), and statistically sound model comparison with multiplicity control.
A structured TRIAGE checklist table summarizing the evaluation parameters described in this review is provided in the main text to support reproducible and clinically interpretable reporting. © 2026 by the authors.
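The confusion-matrix metrics and the prevalence effect mentioned in the abstract can be illustrated with a short Python sketch. It uses the standard textbook definitions (not necessarily the notation of the full review). It assumes every cell of the 2x2 table is non-zero, so that no ratio divides by zero. It also applies Bayes' theorem to show how predictive values shift when the same sensitivity and specificity are carried into a setting with a different prevalence. The names `ConfusionCounts`, `binary_metrics`, and `predictive_values_at_prevalence` and the example counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # diseased, test positive
    fp: int  # not diseased, test positive
    fn: int  # diseased, test negative
    tn: int  # not diseased, test negative


def binary_metrics(c: ConfusionCounts, beta: float = 1.0) -> dict[str, float]:
    """Standard 2x2 metrics; assumes all four cells are non-zero."""
    sens = c.tp / (c.tp + c.fn)  # sensitivity (recall, true positive rate)
    spec = c.tn / (c.tn + c.fp)  # specificity (true negative rate)
    ppv = c.tp / (c.tp + c.fp)   # positive predictive value (precision)
    npv = c.tn / (c.tn + c.fn)   # negative predictive value
    b2 = beta ** 2
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": ppv,
        "npv": npv,
        "lr_pos": sens / (1 - spec),           # positive likelihood ratio
        "lr_neg": (1 - sens) / spec,           # negative likelihood ratio
        "dor": (c.tp * c.tn) / (c.fp * c.fn),  # diagnostic odds ratio = LR+ / LR-
        "f_beta": (1 + b2) * ppv * sens / (b2 * ppv + sens),  # F1 when beta == 1
    }


def predictive_values_at_prevalence(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Bayes' theorem: (PPV, NPV) for a target prevalence strictly between 0 and 1."""
    tp_rate = sens * prevalence
    fp_rate = (1 - spec) * (1 - prevalence)
    tn_rate = spec * (1 - prevalence)
    fn_rate = (1 - sens) * prevalence
    return tp_rate / (tp_rate + fp_rate), tn_rate / (tn_rate + fn_rate)


# Hypothetical enriched test set (50% prevalence) compared with a screening population (1%).
m = binary_metrics(ConfusionCounts(tp=90, fp=10, fn=10, tn=90))
ppv_screen, npv_screen = predictive_values_at_prevalence(m["sensitivity"], m["specificity"], 0.01)
print(f"PPV in test set: {m['ppv']:.3f}, PPV at 1% prevalence: {ppv_screen:.3f}")
```

In this example, sensitivity and specificity are both 0.90. The PPV falls from 0.90 in the balanced test set to roughly 0.08 at 1% prevalence. This is the kind of prevalence effect on predictive value and workload that matters for screening use cases.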
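For multi-label diagnostic tasks, the set-based measures and the three averaging schemes named in the abstract can be computed directly from 0/1 label matrices. In these matrices, rows are patients and columns are findings. The following plain-Python sketch uses made-up data, and the function names are hypothetical. Two edge-case conventions are assumptions that libraries may handle differently. A patient with no true and no predicted labels gets a Jaccard score of 1. A label with no positives and no predictions gets an F1 of 0.

```python
from typing import Sequence

LabelMatrix = Sequence[Sequence[int]]  # rows = patients, columns = labels (0/1)


def hamming_loss(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of individual patient-label cells that are wrong."""
    cells = [(t, p) for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp)]
    return sum(t != p for t, p in cells) / len(cells)


def exact_match_ratio(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of patients whose whole label set is predicted exactly."""
    return sum(list(rt) == list(rp) for rt, rp in zip(y_true, y_pred)) / len(y_true)


def mean_jaccard(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Per-patient |true & pred| / |true | pred|, averaged over patients."""
    scores = []
    for rt, rp in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(rt, rp) if t and p)
        union = sum(1 for t, p in zip(rt, rp) if t or p)
        scores.append(inter / union if union else 1.0)  # assumed convention for empty sets
    return sum(scores) / len(scores)


def f1_averages(y_true: LabelMatrix, y_pred: LabelMatrix) -> dict[str, float]:
    """Micro, macro and support-weighted F1 across labels."""
    n_labels = len(y_true[0])
    counts = []  # per label: (tp, fp, fn)
    for j in range(n_labels):
        tp = sum(1 for rt, rp in zip(y_true, y_pred) if rt[j] and rp[j])
        fp = sum(1 for rt, rp in zip(y_true, y_pred) if not rt[j] and rp[j])
        fn = sum(1 for rt, rp in zip(y_true, y_pred) if rt[j] and not rp[j])
        counts.append((tp, fp, fn))

    def f1(tp: int, fp: int, fn: int) -> float:
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0  # assumed convention for empty labels

    per_label = [f1(*c) for c in counts]
    support = [tp + fn for tp, _, fn in counts]  # patients who truly have each label
    tp_sum, fp_sum, fn_sum = (sum(col) for col in zip(*counts))
    return {
        "micro": f1(tp_sum, fp_sum, fn_sum),  # pool counts over labels, then compute F1
        "macro": sum(per_label) / n_labels,   # unweighted mean of per-label F1
        "weighted": sum(f * s for f, s in zip(per_label, support)) / sum(support),
    }


# Hypothetical data: 4 patients x 3 findings; finding 1 is rare and always missed.
y_true = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 0, 1]]
y_pred = [[1, 0, 1], [1, 0, 0], [1, 0, 0], [0, 0, 1]]
print(hamming_loss(y_true, y_pred), exact_match_ratio(y_true, y_pred), mean_jaccard(y_true, y_pred))
print(f1_averages(y_true, y_pred))
```

The averages diverge in this example. The micro average (about 0.83) is dominated by the common finding. The macro average (about 0.62) is pulled down by the rare finding that the model misses entirely. The support-weighted average (about 0.78) falls in between.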
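Given predicted risks and observed outcomes, the Brier score and decision-curve analysis mentioned in the abstract each take only a few lines of code. This sketch uses the common net-benefit formula, NB(pt) = TP/n - (FP/n) * pt / (1 - pt), where pt is the risk threshold for intervening. It compares the model with the "treat all" strategy; the "treat none" strategy always has a net benefit of 0. The outcomes, risks, threshold grid, and function names are hypothetical. Counting a risk exactly equal to pt as positive is an assumed convention. A full analysis would normally also report confidence intervals over a clinically justified threshold range.

```python
from typing import Sequence


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the 0/1 outcome (lower is better)."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def net_benefit(y: Sequence[int], p: Sequence[float], pt: float) -> float:
    """Net benefit of intervening when predicted risk >= pt, for 0 < pt < 1."""
    n = len(y)
    tp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= pt and yi == 0)
    return tp / n - (fp / n) * pt / (1 - pt)


def net_benefit_treat_all(y: Sequence[int], pt: float) -> float:
    """Net benefit if every patient is treated regardless of the model."""
    prevalence = sum(y) / len(y)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical outcomes (1 = disease present) and model risks for 10 patients.
y = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
p = [0.9, 0.7, 0.6, 0.2, 0.1, 0.4, 0.3, 0.05, 0.15, 0.2]

print(f"Brier score: {brier_score(y, p):.4f}")
for pt in (0.1, 0.2, 0.3, 0.5):
    print(f"pt={pt:.1f}  model={net_benefit(y, p, pt):+.3f}  "
          f"treat_all={net_benefit_treat_all(y, pt):+.3f}  treat_none=+0.000")
```

In this toy example, the model beats both default strategies at pt = 0.2, with a net benefit of 0.20 versus 0.125 for treat-all. A decision curve plots this comparison across the whole threshold range.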
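Patient-grouped splitting is one of the leakage-resistant validation designs listed in the abstract. Under this design, every record from the same patient (for example, several images or repeated ECGs) lands on the same side of every split. The following minimal sketch assumes that each record carries a patient identifier and that k does not exceed the number of patients. It assigns patients to k folds with a simple greedy heuristic: the patient with the most records is placed into the currently smallest fold. The function name `grouped_kfold` and this balancing rule are illustrative choices, not a method prescribed by the review. Stratified or temporal variants would need additional logic.

```python
from collections import defaultdict
from typing import Hashable, Iterator, Sequence


def grouped_kfold(patient_ids: Sequence[Hashable], k: int) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) pairs in which no patient appears on both sides."""
    records = defaultdict(list)  # patient id -> indices of that patient's records
    for idx, pid in enumerate(patient_ids):
        records[pid].append(idx)

    # Greedy balancing: patients with the most records first, each into the smallest fold.
    fold_of: dict[Hashable, int] = {}
    fold_sizes = [0] * k
    for pid in sorted(records, key=lambda q: len(records[q]), reverse=True):
        fold = fold_sizes.index(min(fold_sizes))
        fold_of[pid] = fold
        fold_sizes[fold] += len(records[pid])

    # Each fold in turn is the test set; all other patients form the training set.
    for fold in range(k):
        test = sorted(i for pid, idxs in records.items() if fold_of[pid] == fold for i in idxs)
        train = sorted(i for pid, idxs in records.items() if fold_of[pid] != fold for i in idxs)
        yield train, test


# Hypothetical dataset: 7 images from 4 patients.
ids = ["P1", "P1", "P2", "P3", "P3", "P3", "P4"]
for train, test in grouped_kfold(ids, k=2):
    print("train patients:", sorted({ids[i] for i in train}),
          "test patients:", sorted({ids[i] for i in test}))
```

A record-level random split of the same data could place two images of P1 or P3 on opposite sides of a split, which inflates performance estimates. The grouped version rules this out by construction.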