Why is AI hallucinating more frequently, and how can we stop it?
The more advanced artificial intelligence (AI) gets, the more it "hallucinates," producing incorrect or fabricated information.
Research conducted by OpenAI found that its latest and most powerful reasoning models, o3 and o4-mini, hallucinated 33% and 48% of the time, respectively, on OpenAI's PersonQA benchmark. That's more than double the rate of the older o1 model. While o3 delivers more accurate information than its predecessor, it appears to come at the cost of a higher rate of hallucinations.
This raises concerns about the accuracy and reliability of large language models (LLMs) such as AI chatbots, said Eleanor Watson, an Institute of Electrical and Electronics Engineers (IEEE) member and AI ethics engineer at Singularity University.
"When a system outputs fabricated information — such as invented facts, citations or events — with the same fluency and coherence it uses for accurate content, it risks misleading users in subtle and consequential ways," Watson told Live Science.
The issue of hallucination highlights the need to carefully assess and supervise the information that LLMs and reasoning models produce, experts say.
The crux of a reasoning model is that it can handle complex tasks by essentially breaking them down into individual components and coming up with solutions to tackle them. Rather than simply spitting out answers based on statistical probability, reasoning models come up with strategies to solve a problem, much like how humans think.
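As a rough illustration of that decomposition idea, here is a minimal sketch in Python. It assumes a generic chat-style model behind a single call; `call_llm` and the prompts are hypothetical placeholders, not any vendor's actual API or method.

```python
# A minimal sketch of task decomposition, assuming a generic chat-style LLM API.
# call_llm() is a hypothetical placeholder, not a real library function.

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (e.g., an HTTP request to a chat endpoint)."""
    raise NotImplementedError

def solve_with_decomposition(task: str) -> str:
    # 1. Ask the model to break the task into numbered sub-problems.
    plan = call_llm(f"Break this task into numbered sub-problems:\n{task}")

    # 2. Solve each sub-problem in turn, carrying earlier results forward.
    notes: list[str] = []
    for step in plan.splitlines():
        if step.strip():
            notes.append(call_llm(f"Context so far: {notes}\nSolve this step: {step}"))

    # 3. Combine the intermediate results into one final answer.
    return call_llm(f"Task: {task}\nIntermediate results: {notes}\nGive the final answer.")
```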
In order to develop creative, and potentially novel, solutions to problems, AI needs to hallucinate; otherwise, it's limited to the rigid data its LLM has ingested.
"It's important to note that hallucination is a feature, not a bug, of AI," Sohrob Kazerounian, an AI researcher at Vectra AI, told Live Science. "To paraphrase a colleague of mine, 'Everything an LLM outputs is a hallucination. It's just that some of those hallucinations are true.' If an AI only generated verbatim outputs that it had seen during training, all of AI would reduce to a massive search problem."
"You would only be able to generate computer code that had been written before, find proteins and molecules whose properties had already been studied and described, and answer homework questions that had already previously been asked before. You would not, however, be able to ask the LLM to write the lyrics for a concept album focused on the AI singularity, blending the lyrical stylings of Snoop Dogg and Bob Dylan."
In effect, LLMs and the AI systems they power need to hallucinate in order to create, rather than simply serve up existing information. It is similar, conceptually, to the way that humans dream or imagine scenarios when conjuring new ideas.
However, AI hallucinations present a problem when it comes to delivering accurate and correct information, especially if users take the information at face value without any checks or oversight.
"This is especially problematic in domains where decisions depend on factual precision, like medicine, law or finance," Watson said. "While more advanced models may reduce the frequency of obvious factual mistakes, the issue persists in more subtle forms. Over time, confabulation erodes the perception of AI systems as trustworthy instruments and can produce material harms when unverified content is acted upon."
And this problem looks to be exacerbated as AI advances. "As model capabilities improve, errors often become less overt but more difficult to detect," Watson noted. "Fabricated content is increasingly embedded within plausible narratives and coherent reasoning chains. This introduces a particular risk: users may be unaware that errors are present and may treat outputs as definitive when they are not. The problem shifts from filtering out crude errors to identifying subtle distortions that may only reveal themselves under close scrutiny."
Kazerounian backed this viewpoint up. "Despite the general belief that the problem of AI hallucination can and will get better over time, it appears that the most recent generation of advanced reasoning models may have actually begun to hallucinate more than their simpler counterparts — and there are no agreed-upon explanations for why this is," he said.
The situation is further complicated because it can be very difficult to ascertain how LLMs come up with their answers; a parallel could be drawn here with how we still don't really know, comprehensively, how a human brain works.
In a recent essay, Dario Amodei, the CEO of AI company Anthropic, highlighted a lack of understanding in how AIs come up with answers and information. "When a generative AI system does something, like summarize a financial document, we have no idea, at a specific or precise level, why it makes the choices it does — why it chooses certain words over others, or why it occasionally makes a mistake despite usually being accurate," he wrote.
The problems caused by AI hallucinating inaccurate information are already very real, Kazerounian noted. "There is no universal, verifiable way to get an LLM to correctly answer questions being asked about some corpus of data it has access to," he said. "The examples of non-existent hallucinated references, customer-facing chatbots making up company policy, and so on, are now all too common."
Both Kazerounian and Watson told Live Science that, ultimately, AI hallucinations may be difficult to eliminate. But there could be ways to mitigate the issue.
Watson suggested that "retrieval-augmented generation," which grounds a model's outputs in curated external knowledge sources, could help ensure that AI-produced information is anchored by verifiable data.
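To make that concrete, here is a minimal sketch of the retrieval-augmented generation pattern, assuming a curated text corpus and a generic LLM behind one call. The toy keyword-overlap retriever and the `call_llm`, `retrieve`, and `answer_with_rag` names are illustrative assumptions; production systems typically use vector embeddings and a real search index.

```python
# A minimal sketch of retrieval-augmented generation (RAG): retrieve relevant
# passages from a curated corpus and instruct the model to answer only from them.
# The scoring below is naive keyword overlap for illustration; call_llm() is a
# hypothetical placeholder for any LLM API.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # swap in a real LLM call

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Rank passages by how many query words they share (toy retriever)."""
    q_words = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: len(q_words & set(p.lower().split())), reverse=True)
    return ranked[:k]

def answer_with_rag(query: str, corpus: list[str]) -> str:
    passages = retrieve(query, corpus)
    context = "\n\n".join(passages)
    prompt = (
        "Answer using ONLY the sources below. If the sources do not contain "
        "the answer, say you don't know.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)
```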
"Another approach involves introducing structure into the model's reasoning. By prompting it to check its own outputs, compare different perspectives, or follow logical steps, scaffolded reasoning frameworks reduce the risk of unconstrained speculation and improve consistency," Watson, noting this could be aided by training to shape a model to prioritize accuracy, and reinforcement training from human or AI evaluators to encourage an LLM to deliver more disciplined, grounded responses.
"Finally, systems can be designed to recognise their own uncertainty. Rather than defaulting to confident answers, models can be taught to flag when they're unsure or to defer to human judgement when appropriate," Watson added. "While these strategies don't eliminate the risk of confabulation entirely, they offer a practical path forward to make AI outputs more reliable."
Given that AI hallucination may be nearly impossible to eliminate, especially in advanced models, Kazerounian concluded that ultimately the information that LLMs produce will need to be treated with the "same skepticism we reserve for human counterparts."