OpenAI Admits That Its New Model Still Hallucinates More Than a Third of the Time
Using SimpleQA, its in-house factuality benchmark, OpenAI admitted in the release announcement for its hot new large language model (LLM), GPT-4.5, that the model hallucinates 37 percent of the time. In AI parlance, "hallucinating" means confidently spewing fabrications and presenting them as fact.
Yes, you read that right: in tests, the latest AI model from a company worth hundreds of billions of dollars is telling lies in more than one out of every three answers it gives.
As if that wasn't bad enough, OpenAI is actually trying to spin GPT-4.5's bullshitting problem as a good thing because — get this — it doesn't hallucinate as much as the company's other LLMs.
The same graph also reports that GPT-4o, OpenAI's flagship general-purpose model, hallucinates 61.8 percent of the time on the SimpleQA benchmark. OpenAI's o3-mini, a cheaper and smaller version of its "reasoning" model, was found to hallucinate a whopping 80.3 percent of the time.
Of course, the problem isn't unique to OpenAI.
"At present, even the best models can generate hallucination-free text only about 35 percent of the time," explained Wenting Zhao, a Cornell doctoral student who co-wrote a paper last year on AI hallucination rates, in an interview with TechCrunch about the research. "The most important takeaway from our work is that we cannot yet fully trust the outputs of model generations."
Beyond the sheer absurdity of a company attracting hundreds of billions of dollars in investment for products that have such trouble telling the truth, it says a lot about the AI industry at large that this is what it's selling us: expensive, resource-hungry systems that are supposed to be approaching human-level intelligence but still can't get basic facts right.
As OpenAI's LLMs plateau in performance, the company is clearly grasping at straws to steer the hype ship back onto the course it seemed to chart when ChatGPT first launched.
But to do that, we're probably going to need to see a real breakthrough, not more of the same.
