21-05-2025
AI Tool to Answer Patient Queries Has Mixed Performance
A novel generative artificial intelligence (genAI) tool for drafting physician responses to patients' online questions showed mixed results, according to a poster presented recently at the Society of General Internal Medicine (SGIM) 2025 Annual Meeting in Hollywood, Florida.
Researchers led by Natalie Lee, MD, MPH, assistant professor of internal medicine at The Ohio State University College of Medicine in Columbus, Ohio, conducted a thematic analysis of 201 genAI responses to real online questions from patients. The AI generator, an Epic draft tool based on GPT-3.5, was refined through four to five rounds of revisions. The team grouped responses into five categories: building rapport, gathering information, delivering information accurately and professionally, facilitating next steps and patient decisions, and responding to emotion. Primary care provider reviewers developed a coding system to define strengths and limitations for each domain.
'73% of Drafts Had Limitations'
'We observed both strengths and limitations across all communication domains,' Lee told Medscape Medical News. The data show that 60% of drafts were deemed 'usable,' and 73% had limitations.
In the category of information delivery, for instance, the model performed well in providing accurate information about what to expect with a medical test but gave information about blood donation when the patient's question was about kidney donation. The researchers reported that 65% of the tool's information-delivery responses made incorrect assumptions about clinical context, 36% made incorrect assumptions about workflow, and 24% were 'simply wrong.'
In the responding-to-emotion domain, one example of a limitation occurred when a patient expressed long-standing emotional suffering related to a condition and the draft reply failed to acknowledge that emotion.
Overall, the researchers wrote, 'While GenAI drafts often contain usable portions, net value added may be limited by inadequacies in more cognitively demanding, critical domains like providing accurate information, asking key questions, and responding to patient emotions. Further optimization is needed to genuinely enhance patient care and provider experience.'
This work provides more data about the usefulness of AI at a time when an unprecedented volume of patient messages has intensified the need for help fielding patients' questions. 'How do we integrate generative AI technology so that we uphold communication and care quality while also reducing provider burden? It's not clear we have that figured out,' Lee said.
Framework Adds Novel Assessment
Owais Durrani, DO, an emergency medicine physician with Memorial Hermann Health System in Houston, told Medscape Medical News that the authors' use of a novel framework to assess the quality of medical communications made this study particularly significant.
'Too often, we judge AI tools on surface-level efficiency metrics — how many clicks saved, how many seconds shaved off. But this study looked at deeper clinical domains. The finding that AI performed better in low-risk areas like enabling next steps or basic rapport-building but struggled in areas like delivering accurate information or responding to emotion really highlights where the gaps are. It's a reminder that AI can support the clinical process but shouldn't replace human judgment, especially in patient interactions that require empathy, nuance, or clinical reasoning.'
He said he remains open to AI-drafted communications, but with caution. 'Many of the drafts are overly verbose, vague, or miss the clinical nuance required to answer a patient's real question,' he said. 'Physicians then have to sift through the noise and rework the content, which in many cases takes as much — or more — time than writing it ourselves. For now, I view AI-generated messages more as a starting point than a solution.'
Lee and Durrani reported having no relevant financial relationships.