
OpenAI's latest AI models report high 'hallucination' rate: What does it mean — and why is this significant?
A technical report released by artificial intelligence (AI) research organisation OpenAI last month found that the company's latest models — o3 and o4-mini — generate more errors than its older models. Computer scientists call the errors made by chatbots 'hallucinations'.
The report revealed that o3 — OpenAI's most powerful system — hallucinated 33% of the time on the company's PersonQA benchmark, which involves answering questions about public figures. The o4-mini model hallucinated 48% of the time.
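For context, a figure like 33% is usually just the share of graded answers that contain at least one fabricated claim. Below is a minimal sketch of that calculation in Python, using invented grades rather than OpenAI's actual PersonQA data.

```python
# Hypothetical illustration: computing a hallucination rate from graded answers.
# These grades are invented; they are not actual PersonQA data.
graded_answers = [
    {"question": "Where was Ada Lovelace born?", "hallucinated": False},
    {"question": "What did Alan Turing publish in 1950?", "hallucinated": False},
    {"question": "Which prize did Grace Hopper win in 1973?", "hallucinated": True},
]

rate = sum(a["hallucinated"] for a in graded_answers) / len(graded_answers)
print(f"Hallucination rate: {rate:.0%}")  # prints "Hallucination rate: 33%"
```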
To make matters worse, OpenAI said it does not even know why these models are hallucinating more than their predecessors.
Here is a look at what AI hallucinations are, why they happen, and why the new report about OpenAI's models is significant.
When the term 'AI hallucination' began to be used to refer to errors made by chatbots, it had a very narrow definition: it described instances when AI models presented fabricated information as output. For instance, in June 2023, a lawyer in the United States admitted to using ChatGPT to help write a court filing after it emerged that the chatbot had added fake citations to the submission, pointing to cases that never existed.
Today, hallucination has become a blanket term for various types of mistakes made by chatbots. This includes instances when the output is factually correct but not actually relevant to the question that was asked.
ChatGPT, o3, o4-mini, Gemini, Perplexity, Grok and many others are all built on what are known as large language models (LLMs). These models take text as input and generate synthesised text as output.
LLMs are able to do this because they are built using massive amounts of digital text taken from the Internet. Simply put, computer scientists feed these models a lot of text; the models learn patterns and relationships within that text and use them to predict text sequences, producing output in response to a user's input (known as a prompt).
Note that LLMs are always making a guess when they produce an output. They do not know for sure what is true and what is not — these models cannot even fact-check their output against a source such as Wikipedia the way humans can.
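A rough way to picture this guessing is as sampling from a probability table over possible next words. The sketch below uses a tiny, invented table; a real LLM scores tens of thousands of candidate tokens at every step.

```python
import random

# Invented stand-in for a model's next-word distribution after the prompt
# "The first person to walk on the Moon was". A real model would assign a
# probability to every token in its vocabulary, not just three words.
next_word_probs = {
    "Neil": 0.86,
    "Buzz": 0.09,
    "Yuri": 0.05,  # fluent-sounding but factually wrong continuation
}

# The model does not "know" the answer; it samples a likely-looking word.
words = list(next_word_probs)
weights = list(next_word_probs.values())
guess = random.choices(words, weights=weights, k=1)[0]
print("Model's guess:", guess)
```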
LLMs 'know what words are and they know which words predict which other words in the context of words. They know what kinds of words cluster together in what order. And that's pretty much it. They don't operate like you and me,' scientist Gary Marcus wrote on his Substack, Marcus on AI.
As a result, when an LLM is trained on inaccurate text, for example, it gives inaccurate outputs, thereby hallucinating.
However, even training on accurate text cannot stop LLMs from making mistakes. That is because, to generate new text in response to a prompt, these models combine billions of patterns in unexpected ways. So there is always a possibility that LLMs produce fabricated information as output.
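One generic way to see why that possibility never quite goes away is the 'temperature' setting many chatbots use when sampling: it reshapes the next-word probabilities, but wrong continuations always keep a nonzero share. The snippet below reuses the invented distribution from the earlier sketch; it illustrates sampling in general, not how o3 or o4-mini are actually configured.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a next-word distribution; higher temperature flattens it."""
    scaled = {w: math.exp(math.log(p) / temperature) for w, p in probs.items()}
    total = sum(scaled.values())
    return {w: v / total for w, v in scaled.items()}

# Same invented distribution as before: one correct and two wrong continuations.
next_word_probs = {"Neil": 0.86, "Buzz": 0.09, "Yuri": 0.05}

for t in (0.5, 1.0, 1.5):
    reshaped = apply_temperature(next_word_probs, t)
    print(t, {w: round(p, 3) for w, p in reshaped.items()})

# Even at a low temperature, the wrong continuations keep a small but nonzero
# probability, so repeated sampling will occasionally surface one of them.
```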
And because LLMs are trained on such vast amounts of data, even experts cannot fully explain why a model generates a particular sequence of text at a given moment.
Hallucination has been an issue with AI models from the start, and in the initial years big AI companies and labs repeatedly claimed that the problem would be resolved in the near future. That did seem plausible for a while: after they were first launched, models tended to hallucinate less with each update.
However, after the release of the new report about OpenAI's latest models, it has become increasingly clear that hallucination is here to stay. The issue is also not limited to OpenAI: other reports have shown that Chinese startup DeepSeek's R1 model saw a double-digit rise in hallucination rate compared with the company's previous models.
This means that the application of AI models has to be limited, at least for now. They cannot reliably be used, for example, as research assistants (since models invent citations in research papers) or as paralegal bots (since models cite imaginary legal cases).
Computer scientists such as Arvind Narayanan, a professor at Princeton University, think that hallucination is, to some extent, intrinsic to the way LLMs work, and that as these models become more capable, people will use them for tougher tasks where the failure rate will be high.
In a 2024 interview, he told Time magazine, 'There is always going to be a boundary between what people want to use them [LLMs] for, and what they can work reliably at… That is as much a sociological problem as it is a technical problem. And I do not think it has a clean technical solution.'
