
GPT-5 failed the hype test
In a press briefing beforehand, OpenAI CEO Sam Altman said GPT-5 is 'something that I just don't wanna ever have to go back from,' a milestone akin to the first iPhone with a Retina display. The night before the announcement livestream, Altman posted an image of the Death Star, building even more hype. On X, one user wrote that the anticipation 'feels like christmas eve.' All eyes were on the ChatGPT-maker as people across industries waited to see if the publicity would deliver or disappoint. And by most accounts, the big reveal would fall short.
The hype for OpenAI's long-time-coming new model had been building for years — ever since the 2023 release of GPT-4. In a Reddit AMA with Altman and staff last October, users continuously asked about the release date of GPT-5, looking for details on its features and what would set it apart. One Redditor asked, 'Why is GPT-5 taking so long?' Altman responded that compute was a limitation, and that 'all of these models have gotten quite complex and we can't ship as many things in parallel as we'd like to.'
But when GPT-5 appeared in ChatGPT, users were largely unimpressed. The sizable advancements they had been expecting seemed mostly incremental, and the model's key gains were in areas like cost and speed. In the long run, however, that might be a solid financial bet for OpenAI — albeit a less flashy one.
People expected the world of GPT-5. (One X user posted that after Altman's Death Star post, 'everyone shifted expectations.') And OpenAI didn't downplay those projections, calling GPT-5 its 'best AI system yet' and a 'significant leap in intelligence' with 'state-of-the-art performance across coding, math, writing, health, visual perception, and more.' Altman said in a press briefing that chatting with the model 'feels like talking to a PhD-level expert.'
That hype made for a stark contrast with reality. Would a model with PhD-level intelligence, for example, repeatedly insist there were three 'b's' in the word blueberry, as some social media users found? And would it not be able to identify how many state names included the letter 'R'? Would it incorrectly label a U.S. map with made-up states including 'New Jefst,' 'Micann,' 'New Nakamia,' 'Krizona,' and 'Miroinia,' and label Nevada as an extension of California? People who used the bot for emotional support found the new system austere and distant, protesting so loudly that OpenAI brought support for an older model back. Memes abounded — one depicting GPT-4 and GPT-4o as formidable dragons with GPT-5 beside them as a simpleton.
The court of expert public opinion was not forgiving, either. Gary Marcus, a leading AI industry voice and emeritus professor of psychology at New York University, called the model 'overdue, overhyped and underwhelming.' Peter Wildeford, co-founder of the Institute for AI Policy and Strategy, wrote in his review, 'Is this the massive smash we were looking for? Unfortunately, no.' Zvi Mowshowitz, a popular AI industry blogger, called it 'a good, but not great, model.' One Redditor on the official GPT-5 Reddit AMA wrote, 'Someone tell Sam 5 is hot garbage.'
In the days following GPT-5's release, the onslaught of unimpressed reviews has tempered a bit. The general consensus is that although GPT-5 wasn't as significant an advancement as people expected, it offered upgrades in cost and speed, plus fewer hallucinations, and its switch system — which automatically directs your query on the backend to the model best suited to answer it, so you don't have to decide — was all-new. Altman leaned into that narrative, writing, 'GPT-5 is the smartest model we've ever done, but the main thing we pushed for is real-world utility and mass accessibility/affordability.'
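The routing idea itself is straightforward to sketch. The snippet below is a toy illustration of how a query router might triage requests between a cheap, fast model and a slower reasoning model; the heuristic, thresholds, and model names are all hypothetical, not OpenAI's actual implementation (which presumably uses a trained classifier):

```python
# Toy sketch of a model-switching router. All names and heuristics here
# are illustrative assumptions, not OpenAI's real system.

def estimate_difficulty(query: str) -> float:
    """Crude stand-in for a learned classifier: longer prompts and
    reasoning-flavored keywords push the score toward 1.0."""
    signals = ["prove", "step by step", "debug", "analyze", "why"]
    score = min(len(query) / 500, 1.0)
    score += 0.3 * sum(1 for s in signals if s in query.lower())
    return min(score, 1.0)

def route(query: str, threshold: float = 0.5) -> str:
    """Send easy queries to a fast model, hard ones to a reasoning model."""
    if estimate_difficulty(query) >= threshold:
        return "reasoning-model"
    return "fast-model"

print(route("What's the capital of France?"))
# fast-model
print(route("Prove step by step that sqrt(2) is irrational."))
# reasoning-model
```

The appeal of this design for OpenAI is cost: most queries are cheap to serve, and only the ones that look hard pay for expensive reasoning compute.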
OpenAI researcher Christina Kim posted on X that with GPT-5, 'the real story is usefulness. It helps with what people care about-- shipping code, creative writing, and navigating health info-- with more steadiness and less friction. We also cut hallucinations. It's better calibrated, says 'I don't know,' separates facts from guesses, and can ground answers with citations when you want.'
There's a widespread understanding that, to put it bluntly, GPT-5 has made ChatGPT less eloquent. Viral social media posts complained that the new model lacked nuance and depth in its writing, coming off as robotic and cold. Even in GPT-5's own marketing materials, OpenAI's side-by-side comparison of GPT-4o and GPT-5-generated wedding toasts doesn't seem like an unmitigated win for the new model — I personally preferred the one from 4o. When Altman asked Redditors if they thought GPT-5 was better at writing, he was met with an onslaught of comments defending the retired GPT-4o model instead; within a day, he'd acquiesced to pressure and at least temporarily returned it to ChatGPT.
But there's one front where the model appears to shine brighter: coding. One iteration of GPT-5 currently tops the most popular AI model leaderboard in the coding category, with Anthropic's Claude coming in second. OpenAI's launch promotion showed off AI-generated games (a rolling ball mini-game and a typing speed race), a pixel art tool, a drum simulator, and a lofi visualizer. When I tried to vibe-code a puzzle game with the tool, it had a bunch of glitches, but I did find success with simpler projects like an interactive embroidery lesson.
That's a big win for OpenAI, since it's been going head-to-head in the AI coding wars with competitors like Anthropic, Google, and others for a long while now. Businesses are willing to spend a lot on AI coding, and that's one of the most realistic revenue generators for cash-burning AI startups.
OpenAI also highlighted GPT-5's prowess in healthcare, but that remains mostly untested in practice — we likely won't know how successful it is for a while.
AI benchmarks have come to mean less and less in recent years, since they change often and some companies cherry-pick which results they reveal. But overall, they may give us a reasonable picture of GPT-5. The model performed better than its predecessors on many industry tests, but that improvement wasn't anything to write home about, according to many industry folks. As Wildeford put it, 'When it comes to formal evaluations, it seems like GPT-5 was largely what would be expected — small, incremental increases rather than anything worthy of a vague Death Star meme.'
But if recent history has anything to say about it, those small, incremental increases could be more likely to translate into concrete profit than wowing individual consumers. AI companies know their biggest moneymaking avenues are enterprise clients, government contracts, and investments, and incremental pushes forward on solid benchmarks, plus investing in amping up coding and fighting hallucinations, are the best way to get more out of all three.
See All by Hayden Field