AI-powered app provides accurate responses in urology but lacks key info

12 Oct 2023 by Stephen Padilla
Developments in the medical industry, such as the use of chatbots, make healthcare more accessible to patients while circumventing the problem of limited resources.

A large language model (LLM) application powered by artificial intelligence (AI) can generate appropriate and readable responses to urology-related medical inquiries, a study has shown. However, the generated content lacks vital information and misses emergency diagnoses.

“Despite impressive capabilities, natural language processors have limitations as sources of medical information,” the researchers said. “Refinement is crucial before adoption for this purpose.”

In this study, the research team developed 18 patient questions based on Google Trends and used these as inputs in ChatGPT. The questions covered three categories: oncologic, benign, and emergency conditions. Questions in each category were either treatment- or sign/symptom-related.

The appropriateness of ChatGPT outputs for patient counselling was independently evaluated by three native English-speaking, board-certified urologists, using accuracy, comprehensiveness, and clarity as proxies for appropriateness.

Using the Flesch Reading Ease and Flesch-Kincaid Grade Level formulas, the researchers assessed the readability of the responses. Additional measures were created based on validated tools and assessed by independent reviewers.
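For reference, these are standard published measures: the Flesch Reading Ease score is computed as 206.835 − 1.015 × (total words ÷ total sentences) − 84.6 × (total syllables ÷ total words), with higher scores indicating easier text, while the Flesch-Kincaid Grade Level is 0.39 × (total words ÷ total sentences) + 11.8 × (total syllables ÷ total words) − 15.59, which approximates the US school grade needed to understand the text.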

Of the 18 responses, 14 (77.8 percent) were deemed appropriate, with clarity receiving the most scores of 4 and 5 (p=0.01). No significant difference in appropriateness was observed between treatment- and symptom-related questions or between the different categories of conditions. [J Urol 2023;210:688-694]

According to the urologists who reviewed the AI responses, the most common reason for giving a low score was missing information, particularly vital details.

In terms of readability, the responses achieved a mean Flesch Reading Ease score of 35.5 and a mean Flesch-Kincaid Grade Level of 13.5. Moreover, no significant differences were noted in additional quality assessment scores between the different categories of conditions.
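For context, a Reading Ease score of 35.5 falls in the "difficult" band (30-50) of Flesch's conventional scale, and a grade level of 13.5 corresponds roughly to college-level text, well above the sixth- to eighth-grade reading level commonly recommended for patient-facing materials.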

“The groundbreaking advent of LLMs sets forth exciting yet concerning possibilities for adoption within the field of urology and healthcare at large,” the researchers said. “While highly impressive in their current state, we identified faults such as lacking vital information in certain responses and missing emergency diagnoses.”

Chatbots

Nearly three in five Americans use the internet to seek health information, so some of them can be expected to rely on chatbots powered by LLMs such as GPT-4 for medical information. [https://www.pewresearch.org/internet/2013/02/12/the-internet-and-health/]

“However, the empathetic and human-like nature of these chatbots' responses could potentially jeopardize [users'] well-being if the provided replies are not entirely accurate, despite sounding convincing,” the researchers said. “Hence, it is imperative to conduct further investigation to compare the quality of information reported by LLMs like ChatGPT and Google Bard.”

Likewise, the medical community must take the same precautions when using these technological tools to deliver medical advice as it does when evaluating information obtained from search engines like Google, according to the researchers. [Soc Int Urol J 2021;2:362-369; Int J Impot Res 2020;32:455-461; Lancet Oncol 2019;20:1491-1492]

Appropriate regulatory measures must be established for the use of AI in medicine, they noted.