Tehran University of Medical Sciences

Science Communicator Platform

Stay connected! Follow us on X network (Twitter):
Share this content! By
Assessing Chatgpt-4’S Performance on the Us Prosthodontic Exam: Impact of Fine-Tuning and Contextual Prompting Vs. Base Knowledge, a Cross-Sectional Study Publisher Pubmed



Dashti M1 ; Khosraviani F2 ; Azimi T3 ; Hefzi D4 ; Ghasemi S5 ; Fahimipour A6 ; Zare N7 ; Khurshid Z8 ; Habib SR9
Authors

Source: BMC Medical Education Published:2025


Abstract

Background: Artificial intelligence (AI), such as ChatGPT-4 from OpenAI, has the potential to transform medical education and assessment. However, its effectiveness in specialized fields like prosthodontics, especially when comparing base to fine-tuned models, remains underexplored. This study evaluates the performance of ChatGPT-4 on the US National Prosthodontic Resident Mock Exam in its base form and after fine-tuning. The aim is to determine whether fine-tuning improves the AI’s accuracy in answering specialized questions. Methods: An official sample questions from the 2021 US National Prosthodontic Resident Mock Exam was used, obtained from the American College of Prosthodontists. A total of 150 questions were initially considered, and resources were available for 106 questions. Both the base and fine-tuned models of ChatGPT-4 were tested under simulated exam conditions. Performance was assessed by comparing correct and incorrect responses. The Chi-square test was used to analyze accuracy, with significance set at p < 0.05. The Kappa coefficient was calculated to measure agreement between the models’ responses. Results: The base model of ChatGPT-4 correctly answered 62.7% of the 150 questions. For the 106 questions with resources, the fine-tuned model answered 73.6% correctly. The Chi-square test showed a significant improvement in performance after fine-tuning (p < 0.001). The Kappa coefficient was 0.39, indicating moderate agreement between the models (p < 0.001). Performance varied by topic, with lower accuracy in areas such as Implant Prosthodontics, Removable Prosthodontics, and Occlusion, though the fine-tuned model consistently outperformed the base model. Conclusions: Fine-tuning ChatGPT-4 with specific resources significantly enhances its accuracy in answering specialized prosthodontic exam questions. While the base model provides a solid baseline, fine-tuning is essential for improving AI performance in specialized fields. However, certain topics may require more targeted training to achieve higher accuracy. © The Author(s) 2025.