
OpenAI's latest AI models report high 'hallucination' rate: What does it mean — and why is this significant?
A technical report released by artificial intelligence (AI) research organisation OpenAI last month found that the company's latest models — o3 and o4-mini — generate more errors than its older models. Computer scientists call the errors made by chatbots 'hallucinations'.
The report revealed that o3 — OpenAI's most powerful system — hallucinated 33% of the time on the company's PersonQA benchmark, which involves answering questions about public figures. The o4-mini model hallucinated 48% of the time.
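For context, a figure like 33% is usually just the share of graded answers that contain at least one fabricated claim. Below is a minimal sketch of that calculation in Python, using invented grades rather than OpenAI's actual PersonQA data.

```python
# Hypothetical illustration: computing a hallucination rate from graded answers.
# These grades are invented; they are not actual PersonQA data.
graded_answers = [
    {"question": "Where was Ada Lovelace born?", "hallucinated": False},
    {"question": "What did Alan Turing publish in 1950?", "hallucinated": False},
    {"question": "Which prize did Grace Hopper win in 1973?", "hallucinated": True},
]

rate = sum(a["hallucinated"] for a in graded_answers) / len(graded_answers)
print(f"Hallucination rate: {rate:.0%}")  # prints "Hallucination rate: 33%"
```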
To make matters worse, OpenAI said it does not even know why these models are hallucinating more than their predecessors.
Here is a look at what AI hallucinations are, why they happen, and why the new report about OpenAI's models is significant.
When the term 'AI hallucination' began to be used to refer to errors made by chatbots, it had a very narrow definition: it described instances when AI models presented fabricated information as output. For instance, in June 2023, a lawyer in the United States admitted to using ChatGPT to help write a court filing after it emerged that the chatbot had added fake citations to the submission, pointing to cases that never existed.
Today, hallucination has become a blanket term for various types of mistakes made by chatbots. This includes instances when the output is factually correct but not actually relevant to the question that was asked.
ChatGPT, o3, o4-mini, Gemini, Perplexity, Grok and many others are all built on what are known as large language models (LLMs). These models take text as input and generate synthesised text as output.
LLMs are able to do this because they are built using massive amounts of digital text taken from the Internet. Simply put, computer scientists feed these models a lot of text; the models learn patterns and relationships within that text and use them to predict text sequences, producing output in response to a user's input (known as a prompt).
Note that LLMs are always making a guess when they produce an output. They do not know for sure what is true and what is not — these models cannot even fact-check their output against a source such as Wikipedia the way humans can.
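A rough way to picture this guessing is as sampling from a probability table over possible next words. The sketch below uses a tiny, invented table; a real LLM scores tens of thousands of candidate tokens at every step.

```python
import random

# Invented stand-in for a model's next-word distribution after the prompt
# "The first person to walk on the Moon was". A real model would assign a
# probability to every token in its vocabulary, not just three words.
next_word_probs = {
    "Neil": 0.86,
    "Buzz": 0.09,
    "Yuri": 0.05,  # fluent-sounding but factually wrong continuation
}

# The model does not "know" the answer; it samples a likely-looking word.
words = list(next_word_probs)
weights = list(next_word_probs.values())
guess = random.choices(words, weights=weights, k=1)[0]
print("Model's guess:", guess)
```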
LLMs 'know what words are and they know which words predict which other words in the context of words. They know what kinds of words cluster together in what order. And that's pretty much it. They don't operate like you and me,' scientist Gary Marcus wrote on his Substack, Marcus on AI.
As a result, when an LLM is trained on inaccurate text, for example, it gives inaccurate outputs, thereby hallucinating.
However, even training on accurate text cannot stop LLMs from making mistakes. That is because, to generate new text in response to a prompt, these models combine billions of patterns in unexpected ways. So there is always a possibility that LLMs produce fabricated information as output.
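One generic way to see why that possibility never quite goes away is the 'temperature' setting many chatbots use when sampling: it reshapes the next-word probabilities, but wrong continuations always keep a nonzero share. The snippet below reuses the invented distribution from the earlier sketch; it illustrates sampling in general, not how o3 or o4-mini are actually configured.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a next-word distribution; higher temperature flattens it."""
    scaled = {w: math.exp(math.log(p) / temperature) for w, p in probs.items()}
    total = sum(scaled.values())
    return {w: v / total for w, v in scaled.items()}

# Same invented distribution as before: one correct and two wrong continuations.
next_word_probs = {"Neil": 0.86, "Buzz": 0.09, "Yuri": 0.05}

for t in (0.5, 1.0, 1.5):
    reshaped = apply_temperature(next_word_probs, t)
    print(t, {w: round(p, 3) for w, p in reshaped.items()})

# Even at a low temperature, the wrong continuations keep a small but nonzero
# probability, so repeated sampling will occasionally surface one of them.
```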
And because LLMs are trained on such vast amounts of data, even experts cannot fully explain why a model generates a particular sequence of text at a given moment.
Hallucination has been an issue with AI models from the start, and in the initial years big AI companies and labs repeatedly claimed that the problem would be resolved in the near future. That did seem plausible for a while: after they were first launched, models tended to hallucinate less with each update.
However, after the release of the new report about OpenAI's latest models, it has become increasingly clear that hallucination is here to stay. The issue is also not limited to OpenAI: other reports have shown that Chinese startup DeepSeek's R1 model saw a double-digit rise in hallucination rate compared with the company's previous models.
This means that the application of AI models has to be limited, at least for now. They cannot reliably be used, for example, as research assistants (since models invent citations in research papers) or as paralegal bots (since models cite imaginary legal cases).
Computer scientists such as Arvind Narayanan, a professor at Princeton University, think that hallucination is, to some extent, intrinsic to the way LLMs work, and that as these models become more capable, people will use them for tougher tasks where the failure rate will be high.
In a 2024 interview, he told Time magazine, 'There is always going to be a boundary between what people want to use them [LLMs] for, and what they can work reliably at… That is as much a sociological problem as it is a technical problem. And I do not think it has a clean technical solution.'
