Vision-based diagnostic gain of ChatGPT-5 and gemini 2.5 pro compared with human experts in oral lesion assessment

Hassanein, Fatma E A; Hussein, Radwa R; Sarhan, Susan; Ahmed, Yousra; Abou-Bakr, Asmaa; Gamal Almalahy, Hadeel

Abstract

The diagnostic potential of multimodal large language models (LLMs) in oral medicine remains underexplored, particularly in real-world clinical contexts. This study introduces Vision-Based Diagnostic Gain (VWDG) as a novel metric to quantify the incremental diagnostic value of incorporating images into AI-assisted diagnosis of oral lesions. We conducted a prospective, biopsy-validated, case-matched study including 200 oral lesion cases with clinical photographs and radiographs of variable quality. ChatGPT-5 and Gemini 2.5 Pro were evaluated against board-certified oral medicine experts. Each case was presented under two conditions: text-only and multimodal (text plus images). Diagnostic accuracy was measured across Top-1, Top-3, and Top-5 differentials. VWDG was defined as the absolute and relative improvement in diagnostic accuracy between the multimodal and text-only conditions. Cochran's Q and paired McNemar tests with effect sizes quantified differences across models and conditions, with analyses stratified by lesion type and diagnostic difficulty. Both models demonstrated strong baseline diagnostic accuracy, but their performance diverged with image integration. ChatGPT-5 achieved significant VWDG at every threshold: +19 percentage points (pp) at Top-1, +18 pp at Top-3, and +14 pp at Top-5 (all p < 0.001). In contrast, Gemini 2.5 Pro showed negligible or even negative gain (0 pp at Top-1 and Top-3; -2 pp at Top-5). Stratified analyses confirmed that ChatGPT-5 benefited most from visual input in malignant and diagnostically difficult cases, whereas Gemini's strength remained in text-dominant contexts. Human experts consistently outperformed both models in simple and benign presentations. By introducing and applying VWDG, this study provides the first expert-anchored, head-to-head evaluation of next-generation multimodal LLMs in oral medicine. ChatGPT-5 functions as a visual synergist, Gemini as a textual expert, and their complementary strengths suggest a cooperative human-AI diagnostic paradigm. VWDG offers a clinically meaningful framework for benchmarking AI models and guiding safe, context-aware integration into practice.
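
The VWDG definition and the paired comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the toy data, the choice of text-only accuracy as the denominator for the relative gain, and the use of the exact (binomial) form of McNemar's test are all assumptions made for illustration; the abstract does not specify these details.

```python
from math import comb


def accuracy(correct):
    """Fraction of cases marked correct (1 = correct, 0 = incorrect)."""
    return sum(correct) / len(correct)


def vision_gain(text_only, multimodal):
    """Return (absolute gain in percentage points, relative gain).

    Assumption: relative gain is expressed as a fraction of the
    text-only accuracy; the abstract does not state the denominator.
    """
    acc_text = accuracy(text_only)
    acc_multi = accuracy(multimodal)
    absolute_pp = (acc_multi - acc_text) * 100
    relative = (acc_multi - acc_text) / acc_text if acc_text > 0 else float("nan")
    return absolute_pp, relative


def mcnemar_exact(text_only, multimodal):
    """Two-sided exact McNemar p-value for paired binary outcomes.

    Only discordant pairs carry information: cases correct in one
    condition and incorrect in the other.
    """
    b = sum(1 for t, m in zip(text_only, multimodal) if t and not m)
    c = sum(1 for t, m in zip(text_only, multimodal) if not t and m)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, p)


# Hypothetical toy data: 10 cases scored under both conditions.
text_only = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
multimodal = [1, 1, 1, 1, 1, 0, 1, 1, 0, 1]

absolute_pp, relative = vision_gain(text_only, multimodal)
p_value = mcnemar_exact(text_only, multimodal)
```

The same two functions could be applied separately to the Top-1, Top-3, and Top-5 correctness vectors of each model to obtain one gain and one p-value per threshold.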

Other data

Title	Vision-based diagnostic gain of ChatGPT-5 and gemini 2.5 pro compared with human experts in oral lesion assessment
Authors	Hassanein, Fatma E A; Hussein, Radwa R; Sarhan, Susan; Ahmed, Yousra; Abou-Bakr, Asmaa; Gamal Almalahy, Hadeel
Keywords	Artificial intelligence; ChatGPT 5; Diagnosis; Gemini 2.5 pro; Large language model; Oral lesions; Oral medicine
Issue Date	5-Dec-2025
Journal	Scientific reports
ISSN	2045-2322
DOI	10.1038/s41598-025-28862-1
PubMed ID	41350570
Scopus ID	2-s2.0-105024233219

Attached Files

File	Description	Size	Format
41598_2025_Article_28862.pdf		1.8 MB	Adobe PDF
