LLM-Based Extraction of Imaging Features From Radiology Reports: Automating Disease Activity Scoring in Crohn's Disease

Science Communicator Platform

Authors: Reza Dehdab; Fiona Mankertz; Jan Michael Brendel; Nour Maalouf; Kenan Kaya; Saif Afat; Sh. Kolahdoozan; Amir Reza Radmard

Source: Academic Radiology. Published: 2025

Abstract

Rationale and Objectives: Large Language Models (LLMs) offer a promising solution for extracting structured clinical information from free-text radiology reports. The Simplified Magnetic Resonance Index of Activity (sMARIA) is a validated scoring system used to quantify Crohn's disease (CD) activity based on Magnetic Resonance Enterography (MRE) findings. This study aims to evaluate the performance of two advanced LLMs in extracting key imaging features and computing sMARIA scores from free-text MRE reports.

Materials and Methods: This retrospective study included 117 anonymized free-text MRE reports from patients with confirmed CD. ChatGPT (GPT-4o) and DeepSeek (DeepSeek-R1) were prompted using a structured input designed to extract four key radiologic features relevant to sMARIA: bowel wall thickness, mural edema, perienteric fat stranding, and ulceration. LLM outputs were evaluated against radiologist annotations at both the segment and feature levels. Segment-level agreement was assessed using accuracy, mean absolute error (MAE), and Pearson correlation. Feature-level performance was evaluated using sensitivity, specificity, precision, and F1-score. Errors, including confabulations, were recorded descriptively.

Results: ChatGPT achieved a segment-level accuracy of 98.6%, MAE of 0.17, and Pearson correlation of 0.99. DeepSeek achieved 97.3% accuracy, MAE of 0.51, and correlation of 0.96. At the feature level, ChatGPT yielded an F1-score of 98.8% (precision 97.8%, sensitivity 99.9%), while DeepSeek achieved 97.9% (precision 96.0%, sensitivity 99.8%).

Conclusions: LLMs demonstrate near-human accuracy in extracting structured information and computing sMARIA scores from free-text MRE reports. This enables automated assessment of CD activity without altering current reporting workflows, supporting longitudinal monitoring and large-scale research. Integration into clinical decision support systems may be feasible in the future, provided appropriate human oversight and validation are ensured.

© 2025 Elsevier B.V. All rights reserved.
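The scoring and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The abstract does not state the sMARIA weights; the sketch assumes the commonly cited definition (1 point each for wall thickness above 3 mm, mural edema, and perienteric fat stranding, plus 2 points for ulceration). The names `SegmentFindings`, `smaria`, `segment_agreement`, `feature_metrics`, and `f1_from` are hypothetical, and treating segment-level accuracy as exact score match is also an assumption.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class SegmentFindings:
    """Four binary features for one bowel segment (hypothetical container)."""
    wall_thickening: bool  # assumed threshold: wall thickness > 3 mm
    mural_edema: bool
    fat_stranding: bool
    ulceration: bool


def smaria(seg: SegmentFindings) -> int:
    """Assumed sMARIA weighting: 1 + 1 + 1 + 2 (ulceration counts double)."""
    return (int(seg.wall_thickening) + int(seg.mural_edema)
            + int(seg.fat_stranding) + 2 * int(seg.ulceration))


def segment_agreement(pred: list[int], ref: list[int]) -> dict[str, float]:
    """Exact-match accuracy, MAE, and Pearson r between two score lists.

    Assumes equal-length, non-empty lists with non-zero variance.
    """
    n = len(ref)
    accuracy = sum(p == r for p, r in zip(pred, ref)) / n
    mae = sum(abs(p - r) for p, r in zip(pred, ref)) / n
    mean_p, mean_r = sum(pred) / n, sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    return {"accuracy": accuracy, "mae": mae,
            "pearson": cov / sqrt(var_p * var_r)}


def f1_from(precision: float, sensitivity: float) -> float:
    """F1 is the harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


def feature_metrics(pred: list[bool], ref: list[bool]) -> dict[str, float]:
    """Sensitivity, specificity, precision, and F1 for one binary feature.

    Assumes both classes occur in `pred` and `ref` (no zero denominators).
    """
    pairs = list(zip(pred, ref))
    tp = sum(p and r for p, r in pairs)
    fp = sum(p and not r for p, r in pairs)
    fn = sum(r and not p for p, r in pairs)
    tn = sum(not p and not r for p, r in pairs)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {"sensitivity": sensitivity,
            "specificity": tn / (tn + fp),
            "precision": precision,
            "f1": f1_from(precision, sensitivity)}
```

As a consistency check on the reported figures, `f1_from(0.978, 0.999)` rounds to 0.988 and `f1_from(0.960, 0.998)` rounds to 0.979, matching the F1-scores given for ChatGPT and DeepSeek.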


