
Doctors Think AI Shows Promise—But Worry It Will Mislead Patients
For over a decade now, healthcare workers have had to deal with 'Doctor Google'—the practice of patients turning to Google for medical advice instead of a professional. Organizations including Brown University Health, Orlando Health and the Northeast Georgia Physicians Group have advised against it, citing both physicians' years of education and experience and people's tendency to gravitate toward worst-case scenarios.
Today, Doctor Google has a new rival: ChatGPT.
That's according to a new report from academic publishing company Elsevier, which also makes AI tools for doctors, such as research assistant Scopus AI and chemical discovery tool Reaxys. For the report, which will be released today, the company surveyed 2,206 doctors and nurses from 109 countries this past spring. This included 268 clinicians in North America, 1,170 in the Asia Pacific, 439 in Europe, 164 in Latin America and 147 in the Middle East and Africa. Eighteen declined to disclose their location.
These clinicians were asked about the role they thought AI plays in healthcare today and its potential implications for the future. The company emailed a link to the survey to healthcare workers who had recently published books or journal articles, served on certain third-party panels or were otherwise known to Elsevier. The authors acknowledged that because it's not a randomized sample, these results are not necessarily generalizable.
One of the biggest concerns the survey respondents had was regarding patient use of ChatGPT and similar tools. They reported that patients often arrive with preconceived—and sometimes wrong—ideas about their health issues based on what these tools told them. One major issue? These models are frequently wrong. For example, OpenAI's o3 and o4-mini models hallucinate—that is, make up answers to questions—around 30% to 50% of the time, per the company's own recent tests.
This creates an additional burden for healthcare workers, who are often already overwhelmed by their workload and the number of patients they see. In North America, 34% of clinicians who reported being time-strapped cited patients arriving with numerous questions; globally, that figure was about 22%.
Even more concerning, Jan Herzhoff, president of Elsevier's global healthcare businesses and a sponsor of the study, told Forbes, is that patients may decide to skip the hospital altogether and rely solely on ChatGPT and other websites for advice. He said that over 50% of U.S.-based clinicians predict that most patients will self-diagnose rather than see a professional within the next three years, though it's not clear how often patients are skipping their doctor in favor of AI right now.
Though healthcare workers may have concerns about their patients' use of AI, more of them are finding themselves using such tools. In the past year, the percentage of doctors and nurses who have used AI in a clinical setting jumped from 26% to 48%, the survey found. The respondents are also optimistic about AI's ability to streamline their workflows, though few say their own institutions are using AI effectively to solve current problems.
A majority of clinicians surveyed predicted that AI will save them time, provide faster and more accurate diagnoses, and improve patient outcomes within the next three years. Many startups are developing such tools, including K Health and Innovaccer, which have raised $384 million and $675 million in total venture funding, respectively. According to PwC's Strategy& team, the AI healthcare market is expected to reach $868 billion by 2030.
'As an organization, we see AI as a tool to augment the capabilities of the clinician,' Herzhoff said. Not one that replaces them.
Herzhoff himself is equally optimistic about AI in healthcare, especially for administrative tasks. Right now, the survey finds, doctors and nurses are using AI tools to identify potentially harmful drug interactions before writing a new prescription or to draft letters to patients. Of clinicians who've used AI, 50% use generalist tools like ChatGPT or Gemini frequently or always; only 22% use specialist tools with the same frequency.
One reason for this might be how healthcare systems are deploying AI tools. Only about a third of clinicians said that their workplace provides sufficient access to digital tools and AI training. Only 29% said that their institutions provide effective AI governance.
As more healthcare-focused AI tools are developed, ensuring that these new technologies are trained on high-quality, peer-reviewed material is another priority for those surveyed, with 70% of doctors and 60% of nurses saying this is vital.
Herzhoff noted that while AI tools may help clinicians save time in the future, the effort they need to put in now to learn AI, particularly how to write detailed and useful prompts, presents a barrier. Already, 47% of clinicians report that being tired has impacted their ability to treat patients.
'Paradoxically, they don't have the time, and they don't find the time to use these tools,' Herzhoff said.
Although doctors and nurses look forward to using AI to speed up their day-to-day work, they are much more skeptical about using it to make decisions about patients. Only 16% said they were using AI in this way, while 37% expressed an unwillingness to use AI to make clinical decisions.
'AI should provide the information I need to make good decisions,' one doctor who responded to the survey said. 'I don't believe I should abrogate responsibility for clinical assessment to AI—I need to keep authority over the final outcomes.'
Related Articles


Newsweek
AI Hiring Favors Women Over Equally Qualified Men, Study Finds
As artificial intelligence takes on a bigger role in corporate hiring — with many companies touting its impartiality — one researcher's findings suggest the technology may be more biased than humans, and is already favoring women over equally qualified men.

David Rozado, an associate professor at the New Zealand Institute of Skills and Technology and a well-known AI researcher, tested 22 large language models (LLMs)—including popular, consumer-facing apps like ChatGPT, Gemini, and Grok—using pairs of identical résumés that differed only by gendered names. His findings revealed that every single LLM was more likely to select the female-named candidate over the equally qualified male candidate.

"This pattern may reflect complex interactions between model pre-training corpora, annotation processes during preference tuning, or even system-level guardrails for production deployments," Rozado told Newsweek. "But the exact source of the behavior is currently unclear."

A Problem With Men?

Rozado's findings reveal not just that AI models tend to favor women for jobs over men, but also how nuanced and pervasive those biases can be. Across more than 30,000 simulated hiring decisions, female-named candidates were chosen 56.9 percent of the time — a statistically significant deviation from gender neutrality, which would have resulted in a 50–50 split. When an explicit gender field was added to a CV — a practice common in countries like Germany and Japan — the preference for women became even stronger. Rozado warned that although the disparities were relatively modest, they could accumulate over time and unfairly disadvantage male candidates.

"These tendencies persisted regardless of model size or the amount of compute leveraged," Rozado noted. "This strongly suggests that model bias in the context of hiring decisions is not determined by the size of the model or the amount of 'reasoning' employed. The problem is systemic."

The models also exhibited other quirks. Many showed a slight preference for candidates who included preferred pronouns: adding terms such as "she/her" or "he/him" to a CV slightly increased a candidate's chances of being selected.

"My experimental design ensured that candidate qualifications were distributed equally across genders, so ideally, there would be no systematic difference in selection rates. However, the results indicate that LLMs may sometimes make hiring decisions based on factors unrelated to candidate qualifications, such as gender or the position of the candidates in the prompt," he said.

Rozado, who is also a regular collaborator with the Manhattan Institute, a conservative think tank, emphasized that the biggest takeaway is that LLMs, like human decision-makers, can sometimes rely on irrelevant features when the task is overdetermined and/or underdetermined. "Over many decisions, even small disparities can accumulate and impact the overall fairness of a process," he said.

However, Rozado also acknowledged a key limitation of his study: it used synthetic CVs and job descriptions rather than real-world applications, which may not fully capture the complexity and nuance of authentic résumés.
Additionally, because all CVs were closely matched in qualifications to isolate gender effects, the findings may not reflect how AI behaves when candidates' skills vary more widely. "It is important to interpret these results carefully. The intention is not to overstate the magnitude of harm, but rather to highlight the need for careful evaluation and mitigation of any bias in automated decision tools," Rozado added.

AI Is Already Reshaping the Hiring Process

Even as researchers debate the biases in AI systems, many employers have already embraced the technology to streamline hiring. A New York Times report this month described how AI-powered interviewer bots now speak directly with candidates, asking questions and even simulating human pauses and filler words.

Jennifer Dunn, a marketing professional in San Antonio, said her AI interview with a chatbot named Alex "felt hollow" and she ended it early. "It isn't something that feels real to me," she told the Times. Another applicant, Emily Robertson-Yeingst, wondered if her AI interview was just being used to train the underlying LLM: "It starts to make you wonder, was I just some sort of experiment?"

Still, some organizations defend the use of AI recruiters as both efficient and scalable, especially in a world where the ease of online job-searching means open positions often field hundreds if not thousands of applicants. Propel Impact told the Times their AI interviews enabled them to screen 500 applicants this year — more than triple what they managed previously.

Rozado, however, warned that the very features companies find appealing — speed and efficiency — can mask underlying vulnerabilities. "Over many decisions, even small disparities can accumulate and impact the overall fairness of a process," he said. "Similarly, the finding that being listed first in the prompt increases the likelihood of selection underscores the importance of not trusting AI blindly."

More Research Needed

Not all research points to the same gender dynamic Rozado identified. A Brookings Institution study this year found that, in some tests, men were actually favored over women in 51.9 percent of cases, while racial bias strongly favored white-associated names over Black-associated names. Brookings' analysis stressed that intersectional identities, such as being both Black and male, often led to the greatest disadvantages.

Rozado and the Brookings team agree, however, that AI hiring systems are not ready to operate autonomously in high-stakes situations. Both recommend robust audits, transparency, and clear regulatory standards to minimize unintended discrimination.

"Given current evidence of bias and unpredictability, I believe LLMs should not be used in high-stakes contexts like hiring, unless their outputs have been rigorously evaluated for fairness and reliability," Rozado said.
"It is essential that organizations validate and audit AI tools carefully, particularly for applications with significant real-world impact."


Forbes
These Smart Devices Work Over A Mile Away From Your Home
The 9 new Shelly Z-Wave Long Range devices

If you've ever tried to stick a smart motion sensor at the end of your driveway, or needed a reliable signal across your garden and outbuildings, then you'll know the limits of regular Z-Wave gadgets: around 100 meters. That's exactly the kind of pain point Shelly is trying to solve with its new wave of Z-Wave Long Range (ZWLR) devices, which are now officially shipping.

First announced at the end of last year, and building on the momentum from its recent SmartThings-compatible launches, the Bulgarian smart home specialist has added ZWLR support to a bunch of its existing modules and introduced a few new ones as well. ZWLR eliminates range anxiety with a range of up to 1.5 miles in ideal conditions, with no signal repeaters or awkward routing needed.

Unlike traditional Z-Wave mesh networks, ZWLR doesn't need every device to talk to the next one. Instead, it uses a star network topology, with a central gateway serving as the focal point, allowing for direct, point-to-point connections with end devices.

ZWLR star network topology

With support for up to 4,000 devices on a single network, it's a perfect option for larger homes, campus-style properties, commercial retrofits, or even rural and agricultural setups where trenching Ethernet or relying on Wi-Fi doesn't cut it. Even with its impressive extended range, ZWLR remains a remarkably low-power option; the Z-Wave Alliance has even stated that standard smart home sensors could operate for up to 10 years on a single coin cell battery.

Shelly's new ZWLR range includes nine devices out of the gate, with more coming later this year. The current line-up shipping in the US includes:

Pro modules (including multi-channel relays and shutter controllers) and input controllers like the Shelly Wave i4 and i4 DC are the next ZWLR devices due to land in the coming months.

Of course, you'll need a hub with ZWLR chops, such as Hubitat's C-8, HomeSeer's G8, or Home Assistant running version 2025.5 or newer with an 800-series Z-Wave stick, to get in on the Long Range action. The devices are available now via the Shelly USA store and distributors across North America.


Medscape
Rethinking GI Prophylaxis for the Critically Ill
Prevention doesn't come easy. The bar is high. The decision to act medically because an untoward outcome could occur means creating intervention side-effect risk that's otherwise absent. Large numbers of patients must be subjected to that side-effect risk to obtain a net benefit. Last, your bedside modeling must be tight, and in medicine it never is.

The story of gastrointestinal (GI) prophylaxis in the ICU highlights these challenges. It is best told by Deborah Cook, the godmother of ICU research, and it's told through her work. She's randomized more patients than a sequence generator. Her name is the most critical MeSH term for your systematic review. If Churg, Strauss, and Wegener hadn't ruined the disease-eponym practice, she'd have several to her name. Just as well; we're all transgressing some currently unidentified but soon-to-be-coined "-ism."

Cook's work has dominated the GI prophylaxis space since the early 1990s. If you're new to it, start with her New England Journal of Medicine (NEJM) review from 2018. It was published on the heels of what was then the latest randomized controlled trial on GI prophylaxis in the ICU. That trial, SUP-ICU, which randomized patients to pantoprazole vs placebo and was also published in NEJM, had equivocal results. The accompanying editorial provided a tepid endorsement of continued prophylaxis with proton pump inhibitors (PPIs), but only for those at high risk for bleeding.

Two years later, a systematic review and another randomized trial (PEPTIC) found a mortality signal with PPIs, but in the wrong direction. This drove some back to H2 blockers even though they are less effective than PPIs for preventing bleeding. Where I practice, I see roughly equal numbers of ICU patients on PPIs and H2 blockers; there seems to be no clear preference or consensus.

The indomitable Dr Cook just investigated the mortality difference in the REVISE trial, published last year. She and her colleagues also produced an updated systematic review. REVISE made me feel better. I've been a PPI-for-prophylaxis guy for anyone on mechanical ventilation. I held on after PEPTIC and SUP-ICU created doubt, and REVISE seemed to vindicate my practice. REVISE found that PPIs decreased bleeding without increasing mortality. PPIs didn't increase pneumonia or Clostridioides difficile infections either.

Unfortunately, it's never that simple. The mortality signal was still present in the updated systematic review, and the mortality "noise" is dependent on severity of illness: it's the sicker patients who have a higher mortality when given PPIs for prophylaxis. Why? I'm not sure. The pneumonia and C diff associations were absent in SUP-ICU, REVISE, and the updated systematic review. The systematic review authors list multiple possible explanations, given that PPIs are associated with an altered microbiome, endothelial changes, and delirium. If there is a causal mechanism affecting mortality, it's not clear why the direction is discordant across levels of illness severity.

Conclusions drawn by the editorials accompanying SUP-ICU and REVISE are also discordant. Seven years ago, the idea was that PPIs were most beneficial in those who are 'seriously ill with a high risk of complications.' Fast-forward, and the REVISE editorial suggests PPIs for those on mechanical ventilation with an APACHE II score of less than 25. Figure 2 in Dr Cook's 2018 review lists risk factors for bleeding, and there is substantial overlap with the APACHE and other illness-severity scores. Can a patient be at high bleeding risk but have an APACHE II score less than 25?
I'm sure they can, but that needle won't be easy to thread at the bedside. Can Dr Cook snatch clarity from the jaws of this data quagmire? Perhaps. Her group just published another paper on GI bleeding in the ICU, focused on bleeding events that ICU survivors and family members consider important. This was a preplanned analysis of data from the REVISE trial. PPIs shined again, reducing patient-important events regardless of illness severity. They used a proportional hazards model that accounts for the competing risk of death to analyze their primary outcome. I'm not sure this provides clarity per se, but it does make me feel better about continuing to use prophylactic PPIs for my mechanically ventilated patients.
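For readers curious what "accounting for the competing risk of death" can look like in practice, the sketch below fits a cause-specific Cox proportional hazards model, one common approach in which deaths are treated as censoring events when modeling the bleeding outcome. It is a generic illustration using the lifelines library with made-up column names (duration_days, event, ppi, apache_ii) and a hypothetical dataset; it is not the REVISE authors' actual analysis code, and their exact competing-risks method may differ.

```python
# Minimal sketch of a cause-specific Cox proportional hazards analysis.
# Assumptions: a DataFrame with hypothetical columns duration_days (follow-up time),
# event (0 = censored, 1 = patient-important GI bleeding, 2 = death), ppi (1 = received
# prophylactic PPI), and apache_ii (illness severity). This is NOT the REVISE analysis;
# it only illustrates censoring the competing event (death) when modeling bleeding.
import pandas as pd
from lifelines import CoxPHFitter

df = pd.read_csv("icu_cohort.csv")  # hypothetical dataset

# Cause-specific approach: treat death (event == 2) as censoring for the bleeding model.
df["bleeding_event"] = (df["event"] == 1).astype(int)

cph = CoxPHFitter()
cph.fit(
    df[["duration_days", "bleeding_event", "ppi", "apache_ii"]],
    duration_col="duration_days",
    event_col="bleeding_event",
)
cph.print_summary()  # the hazard ratio for 'ppi' estimates its cause-specific effect on bleeding
```

A cause-specific model answers a slightly different question than a subdistribution (Fine-Gray) model, but either framing gets at the same issue the column raises: the effect of PPIs on bleeding has to be interpreted in a population where some patients die before they can bleed.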