Google talked AI for 2 hours. It didn't mention hallucinations.
Google I/O 2025 had one focus: artificial intelligence.
We've already covered all of the biggest news to come out of the annual developers conference: a new AI video generation tool called Flow. A $250 AI Ultra subscription plan. Tons of new changes to Gemini. A virtual shopping try-on feature. And critically, the launch of the search tool AI Mode to all users in the United States.
Yet over nearly two hours of Google leaders talking about AI, one word we didn't hear was "hallucination".
Hallucinations remain one of the most stubborn and concerning problems with AI models. The term refers to the invented facts and inaccuracies that large language models "hallucinate" in their replies. And according to the big AI brands' own metrics, hallucinations are getting worse — with some models hallucinating more than 40 percent of the time.
But if you were watching Google I/O 2025, you wouldn't know this problem existed. You'd think models like Gemini never hallucinate; you would certainly be surprised to see the warning appended to every Google AI Overview: "AI responses may include mistakes."
The closest Google came to acknowledging the hallucination problem came during a segment of the presentation on AI Mode and Gemini's Deep Search capabilities. The model would check its own work before delivering an answer, we were told — but without more detail on this process, it sounds more like the blind leading the blind than genuine fact-checking.
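Google didn't detail how that self-checking works. For illustration only, here is a minimal sketch of what such a pass could look like, assuming a generic ask_model() helper (a hypothetical stand-in, not Google's API). Note that the verifier is the same model that wrote the draft, which is exactly why skeptics compare it to the blind leading the blind.

```python
# Purely illustrative sketch of a "check your own work" pass. ask_model() is a
# hypothetical stand-in for any chat-model call, not a real Google/Gemini API.

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in a real model call here")

def answer_with_self_check(question: str) -> str:
    draft = ask_model(f"Answer concisely: {question}")
    # The same model grades its own draft -- there is no external source of truth.
    verdict = ask_model(
        "Does the answer below contain factual errors? Reply PASS or FAIL.\n"
        f"Question: {question}\nAnswer: {draft}"
    )
    if verdict.strip().upper().startswith("FAIL"):
        draft = ask_model(f"Rewrite this answer to fix the errors: {draft}")
    return draft
```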
For AI skeptics, the degree of confidence Silicon Valley has in these tools seems divorced from actual results. Real users notice when AI tools fail at simple tasks like counting, spellchecking, or answering questions like "Will water freeze at 27 degrees Fahrenheit?"
Google was eager to remind viewers that its newest AI model, Gemini 2.5 Pro, sits atop many AI leaderboards. But when it comes to truthfulness and the ability to answer simple questions, AI chatbots are graded on a curve.
Gemini 2.5 Pro is Google's most intelligent AI model (according to Google), yet it scores just 52.9 percent on the SimpleQA benchmark. According to an OpenAI research paper, SimpleQA is "a benchmark that evaluates the ability of language models to answer short, fact-seeking questions."
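To make that 52.9 percent concrete, here is a rough sketch of how a SimpleQA-style evaluation is scored: short factual questions, one reference answer each, and a simple percentage of correct responses. The sample questions and the grade_answer() helper below are our own illustrative assumptions, not OpenAI's actual harness.

```python
# Illustrative SimpleQA-style scoring loop. The sample questions and the
# grading helper are assumptions for demonstration, not OpenAI's benchmark code.

SAMPLE_QUESTIONS = [
    {"question": "At what temperature does water boil at sea level, in Celsius?",
     "answer": "100"},
    {"question": "In what year did Apollo 11 land on the Moon?",
     "answer": "1969"},
]

def grade_answer(model_answer: str, reference: str) -> bool:
    # Real graders are more careful (often an LLM judge); substring matching
    # is only a stand-in for this sketch.
    return reference.lower() in model_answer.lower()

def simpleqa_style_score(ask_model, questions=SAMPLE_QUESTIONS) -> float:
    correct = sum(
        grade_answer(ask_model(item["question"]), item["answer"])
        for item in questions
    )
    return 100.0 * correct / len(questions)

# A 52.9 percent score means the model answered roughly half of its short,
# fact-seeking questions correctly.
```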
A Google representative declined to discuss the SimpleQA benchmark, or hallucinations in general — but did point us to Google's official explainer on AI Mode and AI Overviews. Here's what it has to say:
[AI Mode] uses a large language model to help answer queries and it is possible that, in rare cases, it may sometimes confidently present information that is inaccurate, which is commonly known as 'hallucination.' As with AI Overviews, in some cases this experiment may misinterpret web content or miss context, as can happen with any automated system in Search...
We're also using novel approaches with the model's reasoning capabilities to improve factuality. For example, in collaboration with Google DeepMind research teams, we use agentic reinforcement learning (RL) in our custom training to reward the model to generate statements it knows are more likely to be accurate (not hallucinated) and also backed up by inputs.
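Google doesn't say how that reward is computed. As a loose illustration (our assumption, not Google's training code), a groundedness reward might score a response by how many of its statements can be matched back to the retrieved source passages:

```python
# Loose illustration of a "backed up by inputs" reward signal -- our assumption
# for explanatory purposes, not Google's agentic RL implementation.

def grounded_fraction(statements: list[str], source_passages: list[str]) -> float:
    """Fraction of generated statements with heavy word overlap against some source."""
    def supported(statement: str) -> bool:
        words = set(statement.lower().split())
        for passage in source_passages:
            overlap = words & set(passage.lower().split())
            if words and len(overlap) / len(words) > 0.7:  # crude lexical support check
                return True
        return False

    if not statements:
        return 0.0
    return sum(supported(s) for s in statements) / len(statements)

# In an RL setup, a higher grounded_fraction would mean a higher reward for
# that response, nudging the model away from unsupported claims.
```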
Is Google wrong to be optimistic? Hallucinations may yet prove to be a solvable problem, after all. But the research increasingly suggests that hallucinations from LLMs are not a solvable problem right now.
That hasn't stopped companies like Google and OpenAI from sprinting ahead into the era of AI Search — and that's likely to be an error-filled era, unless we're the ones hallucinating.
