
AI is getting more powerful, but its hallucinations are getting worse
Written by Cade Metz and Karen Weise
Last month, an artificial intelligence bot that handles tech support for Cursor, an up-and-coming tool for computer programmers, alerted several customers about a change in company policy. It said they were no longer allowed to use Cursor on more than one computer.
In angry posts to internet message boards, the customers complained. Some canceled their Cursor accounts. And some got even angrier when they realized what had happened: The AI bot had announced a policy change that did not exist.
'We have no such policy. You're of course free to use Cursor on multiple machines,' the company's CEO and co-founder, Michael Truell, wrote in a Reddit post. 'Unfortunately, this is an incorrect response from a front-line AI support bot.'
More than two years after the arrival of ChatGPT, tech companies, office workers and everyday consumers are using AI bots for an increasingly wide array of tasks. But there is still no way of ensuring that these systems produce accurate information.
The newest and most powerful technologies — so-called reasoning systems from companies including OpenAI, Google and the Chinese startup DeepSeek — are generating more errors, not fewer. As their math skills have notably improved, their handle on facts has gotten shakier. It is not entirely clear why.
Today's AI bots are based on complex mathematical systems that learn their skills by analyzing enormous amounts of digital data. They do not — and cannot — decide what is true and what is false. Sometimes, they just make stuff up, a phenomenon some AI researchers call hallucinations. On one test, the hallucination rates of newer AI systems were as high as 79%.
These systems use mathematical probabilities to guess the best response, not a strict set of rules defined by human engineers. So they make a certain number of mistakes. 'Despite our best efforts, they will always hallucinate,' said Amr Awadallah, CEO of Vectara, a startup that builds AI tools for businesses, and a former Google executive. 'That will never go away.'
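That probabilistic guessing can be seen in miniature below. The sketch (a toy with made-up word probabilities, not any company's actual model) samples each next word from a distribution; nothing in the loop checks whether the finished sentence is true.

```python
import random

# Toy next-word probabilities, standing in for the distributions a large
# language model computes. The numbers are invented for illustration.
next_word = {
    "the":       [("capital", 0.5), ("largest", 0.5)],
    "capital":   [("of", 1.0)],
    "of":        [("Australia", 1.0)],
    "Australia": [("is", 1.0)],
    "is":        [("Canberra", 0.7), ("Sydney", 0.3)],  # plausible but wrong 30% of the time
}

def generate(word, steps=5):
    out = [word]
    for _ in range(steps):
        options = next_word.get(out[-1])
        if not options:
            break
        words, probs = zip(*options)
        out.append(random.choices(words, weights=probs)[0])  # sample; never verify
    return " ".join(out)

print(generate("the"))  # sometimes "... is Sydney": fluent, confident and false
```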
For several years, this phenomenon has raised concerns about the reliability of these systems. Though they are useful in some situations — like writing term papers, summarizing office documents and generating computer code — their mistakes can cause problems.
The AI bots tied to search engines like Google and Bing sometimes generate search results that are laughably wrong. If you ask them for a good marathon on the West Coast, they might suggest a race in Philadelphia. If they tell you the number of households in Illinois, they might cite a source that does not include that information.
Those hallucinations may not be a big problem for many people, but they are a serious issue for anyone using the technology with court documents, medical information or sensitive business data.
'You spend a lot of time trying to figure out which responses are factual and which aren't,' said Pratik Verma, co-founder and CEO of Okahu, a company that helps businesses navigate the hallucination problem. 'Not dealing with these errors properly basically eliminates the value of AI systems, which are supposed to automate tasks for you.'
Cursor and Truell did not respond to requests for comment.
For more than two years, companies such as OpenAI and Google steadily improved their AI systems and reduced the frequency of these errors. But with the new reasoning systems, errors are rising. The latest OpenAI systems hallucinate at a higher rate than the company's previous system, according to the company's own tests.
The company found that o3 — its most powerful system — hallucinated 33% of the time when running its PersonQA benchmark test, which involves answering questions about public figures. That is more than twice the hallucination rate of OpenAI's previous reasoning system, called o1. The new o4-mini hallucinated at an even higher rate: 48%.
When running another test called SimpleQA, which asks more general questions, the hallucination rates for o3 and o4-mini were 51% and 79%, respectively. The previous system, o1, hallucinated 44% of the time.
In a paper detailing the tests, OpenAI said more research was needed to understand the cause of these results. Because AI systems learn from more data than people can wrap their heads around, technologists struggle to determine why they behave in the ways they do.
'Hallucinations are not inherently more prevalent in reasoning models, though we are actively working to reduce the higher rates of hallucination we saw in o3 and o4-mini,' a company spokesperson, Gaby Raila, said. 'We'll continue our research on hallucinations across all models to improve accuracy and reliability.'
Hannaneh Hajishirzi, a professor at the University of Washington and a researcher with the Allen Institute for Artificial Intelligence, is part of a team that recently devised a way of tracing a system's behavior back to the individual pieces of data it was trained on. But because systems learn from so much data — and because they can generate almost anything — this new tool can't explain everything. 'We still don't know how these models work exactly,' she said.
Tests by independent companies and researchers indicate that hallucination rates are also rising for reasoning models from companies such as Google and DeepSeek.
Since late 2023, Awadallah's company, Vectara, has tracked how often chatbots veer from the truth. The company asks these systems to perform a straightforward task that is readily verified: Summarize specific news articles. Even then, chatbots persistently invent information.
Vectara's original research estimated that in this situation chatbots made up information at least 3% of the time and sometimes as much as 27%.
In the year and a half since, companies such as OpenAI and Google pushed those numbers down into the 1% or 2% range. Others, such as San Francisco startup Anthropic, hovered around 4%. But hallucination rates on this test have risen with reasoning systems. DeepSeek's reasoning system, R1, hallucinated 14.3% of the time. OpenAI's o3 climbed to 6.8%.
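The shape of that evaluation can be sketched roughly as follows. Vectara's real benchmark scores summaries with a trained factual-consistency model; the word-overlap check below is only a crude stand-in, shown to illustrate the setup of comparing a summary against its source.

```python
# Toy factual-consistency check: flag summary sentences whose content words
# barely appear in the source article. A crude stand-in for a real
# hallucination detector, used here only to illustrate the evaluation setup.

def content_words(text):
    return {w.strip(".,").lower() for w in text.split() if len(w) > 3}

def flag_unsupported(article, summary, threshold=0.5):
    source = content_words(article)
    flagged = []
    for sentence in summary.split(". "):
        words = content_words(sentence)
        if not words:
            continue
        support = len(words & source) / len(words)
        if support < threshold:  # most of the sentence is new material
            flagged.append(sentence)
    return flagged

article = "The city council approved the budget on Tuesday after a long debate."
summary = "The council approved the budget. The mayor resigned in protest."
print(flag_unsupported(article, summary))  # ['The mayor resigned in protest.']
```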
(The New York Times has sued OpenAI and its partner, Microsoft, accusing them of copyright infringement regarding news content related to AI systems. OpenAI and Microsoft have denied those claims.)
For years, companies like OpenAI relied on a simple concept: The more internet data they fed into their AI systems, the better those systems would perform. But they used up just about all the English text on the internet, which meant they needed a new way of improving their chatbots.
So these companies are leaning more heavily on a technique that scientists call reinforcement learning. With this process, a system can learn behavior through trial and error. It is working well in certain areas, such as math and computer programming. But it is falling short in other areas.
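In miniature, trial and error looks like the toy sketch below: a learner tries strategies, receives a reward when its answer checks out, and drifts toward whatever scores well. It is a simple bandit example, not how any lab actually trains its chatbots, but it suggests why the method favors domains like math, where answers can be verified automatically.

```python
import random

# Toy reinforcement learning: two candidate "strategies" for answering a
# math question, rewarded only when the checkable answer is correct.
values = {"guess": 0.0, "work_it_out": 0.0}
counts = {"guess": 0, "work_it_out": 0}

def reward(action):
    # Correctness is easy to verify for math, so the reward signal is crisp.
    p_correct = 0.3 if action == "guess" else 0.9
    return 1.0 if random.random() < p_correct else 0.0

for step in range(1000):
    # Explore occasionally; otherwise exploit the best-scoring strategy so far.
    if random.random() < 0.1:
        action = random.choice(list(values))
    else:
        action = max(values, key=values.get)
    r = reward(action)
    counts[action] += 1
    values[action] += (r - values[action]) / counts[action]  # running average

print(values)  # "work_it_out" ends up valued near 0.9, "guess" near 0.3
```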
'The way these systems are trained, they will start focusing on one task — and start forgetting about others,' said Laura Perez-Beltrachini, a researcher at the University of Edinburgh who is among a team closely examining the hallucination problem.
Another issue is that reasoning models are designed to spend time 'thinking' through complex problems before settling on an answer. As they try to tackle a problem step by step, they run the risk of hallucinating at each step. The errors can compound as they spend more time thinking.
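The arithmetic behind that compounding is simple: if each step is independently wrong with even a small probability, the chance that a long chain contains at least one error grows quickly. The numbers below are illustrative, not measured.

```python
# If each step errs independently with probability p, a chain of n steps
# is error-free with probability (1 - p) ** n. Illustrative numbers only.
p = 0.05  # assumed 5% chance of a mistake at any single step
for n in (1, 5, 10, 20, 50):
    at_least_one_error = 1 - (1 - p) ** n
    print(f"{n:2d} steps -> {at_least_one_error:.0%} chance of at least one error")
# 1 step -> 5%, 10 steps -> 40%, 50 steps -> 92%
```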
The latest bots reveal each step to users, which means the users may see each error, too. Researchers have also found that in many cases, the steps displayed by a bot are unrelated to the answer it eventually delivers.
'What the system says it is thinking is not necessarily what it is thinking,' said Aryo Pradipta Gema, an AI researcher at the University of Edinburgh and a fellow at Anthropic.
