OpenAI's Latest ChatGPT AI Models Are Smarter, But They Hallucinate More Than Ever

Artificial intelligence is evolving fast, but not always in the right direction. OpenAI's latest models, o3 and o4-mini, were built to mimic human reasoning more closely than ever before.
However, a recent internal investigation reveals an alarming downside: these models may be more intelligent, but they're also more prone to making things up.

Hallucination in AI Is a Growing Problem
Since the birth of chatbots, hallucinations, or false and invented information presented as fact, have been a persistent issue. With each model iteration, the hope was that these AI hallucinations would decline. But OpenAI's latest findings suggest otherwise, according to The New York Times.
In a benchmark test focused on public figures, o3 hallucinated in 33% of responses, twice the error rate of its predecessor, o1. Meanwhile, the more compact o4-mini performed even worse, hallucinating nearly half the time (48%).

Reasoning vs. Reliability: Is AI Thinking Too Hard?
Unlike previous models, which excelled at generating fluent text, o3 and o4-mini were built to reason step by step, in a way meant to mirror human logic. Ironically, this new "reasoning" approach might be the problem. AI researchers say that the more reasoning a model does, the more opportunities it has to go astray.
Unlike simpler systems that stick to safe, high-confidence responses, these newer models attempt to connect complicated concepts, which can lead to bizarre and incorrect conclusions.
On the SimpleQA test, which measures general knowledge, performance was even worse: o3 hallucinated on 51% of responses, while o4-mini shot to an astonishing 79%. These are not small errors; they are huge credibility gaps.

Why More Sophisticated AI Models May Be Less Credible
OpenAI suggests the rise in AI hallucinations may not be the result of the reasoning itself, but of the models' verbosity and boldness. In trying to be useful and comprehensive, the AI starts to guess, sometimes blending theory with fact. The result is answers that sound very convincing but are entirely incorrect.
According to TechRadar, this becomes especially risky when AI is employed in high-stakes environments such as law, medicine, education, or government service. A single hallucinated fact in a legal brief or medical report could have disastrous repercussions.

The Real-World Risks of AI Hallucinations
Attorneys have already been sanctioned for submitting fabricated court citations produced by ChatGPT. But what about smaller mistakes in a business report, school essay, or government policy memo? The more integrated AI becomes in our everyday routines, the less room there is for error.
The paradox is simple: the more helpful AI is, the more perilous its mistakes are. You can't save people time if they still need to fact-check everything.

Treat AI Like a Confident Intern
Though o3 and o4-mini demonstrate stunning skills in coding, logic, and analysis, their propensity to hallucinate means users can't rely on them when they need rock-solid facts. Until OpenAI and its rivals can minimize these hallucinations, users need to take AI output with a grain of salt.
Consider it this way: these chatbots are like that overconfident co-worker who always has an answer, but whose every claim you still need to fact-check.
Originally published on Tech Times

Related Articles

Passwords Under Threat As Tech Giants Seek Tougher Security

Int'l Business Times

2 days ago

Fingerprints, access keys and facial recognition are putting a new squeeze on passwords as the traditional computer security method -- but also running into public hesitancy.

"The password era is ending," two senior figures at Microsoft wrote in a July blog post. The tech giant has been building "more secure" alternatives to log in for years -- and has since May been offering them by default to new users.

Many other online services -- such as artificial intelligence giant OpenAI's ChatGPT chatbot -- require steps like entering a numerical code emailed to a user's known address before granting access to potentially sensitive data.

"Passwords are often weak and people re-use them" across different online services, said Benoit Grunemwald, a cybersecurity expert with Eset. Sophisticated attackers can crack a word of eight characters or fewer within minutes or even seconds, he pointed out.

And passwords are often the prize booty in data leaks from online platforms, in cases where "they are improperly stored by the people supposed to protect them and keep them safe," Grunemwald said. One massive database of around 16 billion login credentials amassed from hacked files was discovered in June by researchers from media outlet Cybernews.

The pressure on passwords has tech giants rushing to find safer alternatives. One group, the Fast Identity Online Alliance (FIDO), brings together heavyweights including Google, Microsoft, Apple, Amazon and TikTok. The companies have been working on creating and popularising password-free login methods, especially promoting the use of so-called access keys. These use a separate device like a smartphone to authorise logins, relying on a PIN code or biometric input such as a fingerprint reader or face recognition instead of a password.

Troy Hunt, whose website Have I Been Pwned allows people to check whether their login details have been leaked online, says the new systems have big advantages. "With passkeys, you cannot accidentally give your passkey to a phishing site" -- a page that mimics the appearance of a provider such as an employer or bank to dupe people into entering their login details -- he said.

But the Australian cybersecurity expert recalled that the last rites have been read for passwords many times before. "Ten years ago we had the same question... the reality is that we have more passwords now than we ever did before," Hunt said.

Although many large platforms are stepping up login security, large numbers of sites still use simple usernames and passwords as credentials. The transition to an unfamiliar system can also be confusing for users. Passkeys have to be set up on a device before they can be used to log in. Restoring them if a PIN code is forgotten or a trusted smartphone is lost or stolen is also more complicated than a familiar password reset procedure.

"The thing that passwords have going for them, and the reason that we still have them, is that everybody knows how to use them," Hunt said.

Ultimately the human factor will remain at the heart of computer security, Eset's Grunemwald said. "People will have to take good care of security on their smartphone and devices, because they'll be the things most targeted" in future, he warned.

Elon Musk Accuses App Store Of Favoring OpenAI

Int'l Business Times

2 days ago

Elon Musk has taken his feud against OpenAI to the App Store, accusing Apple of favoring ChatGPT in the digital shop and vowing legal action.

"Apple is behaving in a manner that makes it impossible for any AI company besides OpenAI to reach #1 in the App Store, which is an unequivocal antitrust violation," Musk said in a post on his social network X on Monday, without providing evidence to back his claim. "xAI will take immediate legal action," he added, referencing his own artificial intelligence company.

X users responded by pointing out that DeepSeek AI out of China hit the top spot in the App Store early this year, and Perplexity AI recently ranked number one in the App Store in India. DeepSeek and Perplexity compete with OpenAI and Musk's startup xAI.

Both OpenAI and xAI released new versions of their AI assistants, ChatGPT and Grok, in the past week. App Store rankings on Tuesday listed ChatGPT as the top free iPhone app with Grok in fifth place. Apple did not immediately respond to a request for comment. Factors going into App Store rankings include user engagement, reviews, and the number of downloads.

OpenAI and Apple in June of last year announced an alliance to enhance iPhones and other devices with ChatGPT features. ChatGPT-5 rolled out free to the nearly 700 million people who use it weekly, OpenAI said in a briefing with journalists last week.

Tech industry rivals Amazon, Google, Meta, Microsoft and xAI have been pouring billions of dollars into artificial intelligence since the blockbuster launch of the first version of ChatGPT in late 2022. Chinese startup DeepSeek shook up the AI sector early this year with a model that delivers high performance using less costly chips.

OpenAI in April of this year filed counterclaims against multi-billionaire Musk, accusing its former co-founder of waging a "relentless campaign" to damage the organization after it achieved success without him. In legal documents filed at the time in northern California federal court, OpenAI alleged Musk became hostile toward the company after abandoning it years before its breakthrough achievements with ChatGPT.

The lawsuit was another round in a bitter feud between the generative AI (genAI) start-up and the world's richest man, who sued OpenAI last year, accusing the company of betraying its founding mission. In its countersuit, the company alleged Musk "made it his project to take down OpenAI, and to build a direct competitor that would seize the technological lead -- not for humanity but for Elon Musk."

Musk founded his own genAI startup, xAI, in 2023 to compete with OpenAI and the other major AI players.

OpenAI Releases ChatGPT-5 As AI Race Accelerates

Int'l Business Times

7 days ago

OpenAI on Thursday released a keenly awaited new generation of its hallmark ChatGPT, touting "significant" advancements in artificial intelligence capabilities, as a global race over the technology accelerates.

ChatGPT-5 is rolling out free to all users of the AI tool, which is used by nearly 700 million people weekly, OpenAI said in a briefing with journalists. Co-founder and chief executive Sam Altman touted this latest iteration as "clearly a model that is generally intelligent." "It is a significant step toward models that are really capable," he said.

Altman cautioned that there is still work to be done to achieve the kind of artificial general intelligence (AGI) that thinks the way people do. "This is not a model that continuously learns as it is deployed from new things it finds, which is something that, to me, feels like it should be part of an AGI," Altman said. "But the level of capability here is a huge improvement."

GPT-5 is particularly adept when it comes to AI acting as an "agent" independently tending to computer tasks, according to Michelle Pokrass of the development team.

"GPT-3 felt to me like talking to a high school student -- ask a question, maybe you get a right answer, maybe you'll get something crazy," Altman said. "GPT-4 felt like you're talking to a college student; GPT-5 is the first time that it really feels like talking to a PhD-level expert in any topic."

Altman said he expects the ability to create software programs on demand -- so-called "vibe-coding" -- to be a "defining part of the new ChatGPT-5 era." As an example, OpenAI executives demonstrated the bot being asked to create an app for learning the French language.

With fierce competition around the world over the technology, Altman said ChatGPT-5 led the pack in coding, writing, health care and much more. Rivals including Google and Microsoft have been pumping billions of dollars into developing AI systems. Altman said there were "orders of magnitude more gains" to come on the path toward AGI. "[We] have to invest in compute (power) at an eye-watering rate to get that, but we intend to keep doing it."

ChatGPT-5 was also trained to be trustworthy and stick to providing answers as helpful as possible without aiding a seemingly harmful mission, according to OpenAI safety research lead Alex Beutel. "We built evaluations to measure the prevalence of deception and trained the model to be honest," Beutel said. ChatGPT-5 is trained to generate "safe completions," sticking to high-level information that can't be used to cause harm, according to Beutel.

The debut comes a day after OpenAI said it was allowing the US government to use a version of ChatGPT designed for businesses for a year for just $1. Federal workers in the executive branch will have access to ChatGPT Enterprise essentially free in a partnership with the US General Services Administration, according to the artificial intelligence sector star.

The company this week also released two new AI models that can be downloaded for free and altered by users, to challenge similar offerings by US and Chinese competition. The release of gpt-oss-120b and gpt-oss-20b "open-weight language models" comes as the ChatGPT-maker is under pressure to share inner workings of its software in the spirit of its origin as a nonprofit.
