Using Chat Generative Pre-trained Transformer (ChatGPT), a free artificial intelligence (AI)-based natural language processing tool, to answer dermatologic questions falls short of expectations, according to a study presented at AAD 2024. Responses written by doctors remain preferable to ChatGPT-generated answers.
“Despite the excitement regarding the potential use of ChatGPT-generated responses in various types of inquiries, the results of this study suggest that physician-generated responses to patients’ portal messages are still preferred over ChatGPT,” said the researchers, led by Dr Kelly Reynolds of the Department of Dermatology, University of Michigan, Ann Arbor, Michigan, US.
However, “generative AI tools may be helpful in generating the first drafts of responses and providing information on education resources for patients,” they added.
Reynolds and her team used electronic medical records to extract patient-submitted questions and the corresponding responses from the patients’ dermatologists. The researchers entered these questions into ChatGPT (version 3.5) and collected the outputs for evaluation, manually removing any statements that ChatGPT cannot provide medical advice.
Ten blinded reviewers, including seven physicians and three nonphysicians, rated both the physician- and AI-generated responses and selected their preference in terms of overall quality, readability, accuracy, thoroughness, and level of empathy.
A total of 31 messages and responses were included in the analysis. Both physician and nonphysician reviewers strongly preferred the physician-generated responses over those of ChatGPT. Physician-generated responses also scored significantly higher for readability and level of empathy. [AAD 2024, poster 49147]
Patch testing
These findings were consistent with those of another study presented at AAD 2024, in which the investigators assessed whether ChatGPT can properly interpret patch testing results and offer reliable patient counselling.
In this study, the investigators recorded the clinical histories and patch testing results of 67 patients who underwent patch testing at a single UK centre. ChatGPT-4 was asked to counsel patients on the clinical relevance of their patch testing results and to identify alternative chemical names and potential sources for the 80 haptens comprising the North American 80 Comprehensive Series (NAC-80).
ChatGPT succeeded in analysing the relevance of written patch testing results, but its patient counselling lacked the comprehensiveness of the manufacturer’s patient information sheet. ChatGPT also lacked the clinical expertise to correctly ascribe past, current, or future relevance in all cases: clinicians and ChatGPT disagreed about the relevance of the results in 12 of the 67 cases (17.9 percent). [AAD 2024, poster 48853]
“Whereas clinicians can extract precise past relevance of allergens, ChatGPT-4 ascribes possible current relevance despite no documentation of current exposure being provided,” the investigators said. “ChatGPT-4 is able in cases, however, to ascribe possible future relevance that is clinically appropriate.”
On the other hand, ChatGPT “can identify whether a list of products contain a particular hapten (eg, benzalkonium chloride), showing how AI could be used as a tool to improve efficiency in the patch testing clinic,” according to the investigators.