Poster CR-049

Performance of Microsoft’s Bing Artificial Intelligence in Diagnosis of Chronic Wounds

Kirollos Tadrousse (he/him/his), BS, University of Toledo, Kirollos.Tadrousse@rockets.utoledo.edu

Presented at Symposium on Advanced Wound Care 2024

Introduction: Nearly 2.5% of the United States population is affected by chronic wounds. If not diagnosed promptly and treated properly, these wounds can lead to many complications, including hospitalization, infection, and amputation. Social determinants of health can delay the diagnosis of chronic wounds, as many areas of the country experience clinician shortages and lack access to proper wound care centers. Publicly accessible artificial intelligence (AI) systems have recently emerged as a potential way to overcome these barriers by shortening the time to diagnosis and appropriate treatment of chronic wounds. However, the accuracy of AI systems in medical diagnosis remains largely unknown. This study aims to rigorously assess the accuracy of Microsoft's Bing AI in diagnosing chronic wounds, a crucial step in the ongoing development and refinement of AI systems for medical use.

Methods: Ten chronic wound cases with an established physician diagnosis were randomly selected from the publicly accessible online database of the Silesian University of Technology. The case images and basic patient demographics (age and sex) were entered into the publicly accessible Bing AI, a generative machine learning model built on OpenAI's GPT-4. Bing AI was then asked for its top three differential diagnoses of the wound etiology. The differential diagnoses were scored with the following grading system: (1) three points for a correct primary diagnosis; (2) two points for a correct secondary differential; (3) one point for a correct tertiary differential; (4) zero points if all differentials provided were incorrect. Fisher's exact test was used to evaluate the accuracy of Bing AI's differential diagnoses against the official case diagnosis.

Results: Across the ten chronic wound cases (n = 10), Bing AI's most probable diagnosis was correct (score = 3) in 30% of cases, and the correct diagnosis appeared within its top three differentials (score > 0) in 70% of cases.
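The grading system and the two accuracy figures can be expressed as a short scoring routine. The following is a minimal Python sketch, not the study's actual analysis code. It assumes each case's three differentials and the physician diagnosis are available as strings; the names `score_case`, `summarize`, and `fisher_exact_two_sided` are hypothetical, and case-insensitive exact string matching stands in for the clinical judgment needed to decide whether a differential is correct. The abstract does not state how the 2x2 table for Fisher's exact test was constructed, so the test function is shown generically and is not intended to reproduce the reported p-value.

```python
from math import comb
from typing import Sequence

# Points for the rank of the first correct differential, following the
# grading system in the Methods: primary -> 3, secondary -> 2, tertiary -> 1.
POINTS_BY_RANK = {1: 3, 2: 2, 3: 1}


def score_case(differentials: Sequence[str], reference: str) -> int:
    """Return the points for one case, or 0 if no differential is correct.

    Case-insensitive exact string matching is a simplifying assumption; it
    stands in for a clinician judging whether a differential is correct.
    """
    target = reference.strip().lower()
    # Only the top three differentials are scored.
    for rank, dx in enumerate(differentials[:3], start=1):
        if dx.strip().lower() == target:
            return POINTS_BY_RANK[rank]
    return 0


def summarize(scores: Sequence[int]) -> tuple[float, float]:
    """Return (top-1 accuracy, top-3 accuracy) as fractions of all cases."""
    n = len(scores)
    top1 = sum(1 for s in scores if s == 3) / n  # correct primary diagnosis
    top3 = sum(1 for s in scores if s > 0) / n   # correct within top three
    return top1, top3


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, cell a follows a hypergeometric distribution; the
    p-value sums the probabilities of all tables no more likely than the
    observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance keeps equally likely tables despite rounding.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))
```

For example, with a hypothetical set of five case scores, `summarize([3, 0, 2, 1, 0])` returns `(0.2, 0.6)`; these numbers are illustrative only and are not the study's per-case data. As a cross-check, `scipy.stats.fisher_exact` with its default two-sided alternative should agree with `fisher_exact_two_sided` on the same table.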
The correctness of Bing AI's top differential diagnosis was not significantly associated with the presence of the correct diagnosis within its top three differentials (p = 0.20).

Discussion: Overall, Bing AI showed poor diagnostic performance for chronic wounds, and its use as a diagnostic aid should be limited if it is translated to the clinic. This study highlights the importance of thoroughly assessing the accuracy of AI systems as they become more publicly accessible for medical diagnostic use. Future studies should use larger medical databases and more robust diagnostic criteria to further evaluate the accuracy of AI systems.

References:
Hwang, J. M. (2023). Time is tissue. Want to save millions in wound care? Start early: A QI project to expedite referral of high-risk wound care patients to specialized care. BMJ Open Quality, 12(1). https://doi.org/10.1136/bmjoq-2022-002206
Sen, C. K. (2021). Human wound and its burden: Updated 2020 compendium of estimates. Advances in Wound Care, 10(5), 281–292. https://doi.org/10.1089/wound.2021.0026
Sutherland, B. L., Pecanac, K., Bartels, C. M., & Brennan, M. B. (2020). Expect delays: Poor connections between rural and urban health systems challenge multidisciplinary care for rural Americans with diabetic foot ulcers. Journal of Foot and Ankle Research, 13(1). https://doi.org/10.1186/s13047-020-00395-y