Google talked AI for 2 hours. It didn't mention hallucinations.
Google I/O 2025 had one focus: artificial intelligence.
We've already covered all of the biggest news to come out of the annual developers conference: a new AI video generation tool called Flow. A $250 AI Ultra subscription plan. Tons of new changes to Gemini. A virtual shopping try-on feature. And critically, the launch of the search tool AI Mode to all users in the United States.
Yet over nearly two hours of Google leaders talking about AI, one word we didn't hear was "hallucination".
Hallucinations remain one of the most stubborn and concerning problems with AI models. The term refers to the invented facts and inaccuracies that large language models "hallucinate" in their replies. And according to the big AI brands' own metrics, hallucinations are getting worse — with some models hallucinating more than 40 percent of the time.
But if you were watching Google I/O 2025, you wouldn't know this problem existed. You'd think models like Gemini never hallucinate; you would certainly be surprised to see the warning appended to every Google AI Overview: "AI responses may include mistakes."
The closest Google came to acknowledging the hallucination problem came during a segment of the presentation on AI Mode and Gemini's Deep Search capabilities. The model would check its own work before delivering an answer, we were told — but without more detail on this process, it sounds more like the blind leading the blind than genuine fact-checking.
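Google didn't detail how that self-checking works. For illustration only, here is a minimal sketch of what such a pass could look like, assuming a generic ask_model() helper (a hypothetical stand-in, not Google's API). Note that the verifier is the same model that wrote the draft, which is exactly why skeptics compare it to the blind leading the blind.

```python
# Purely illustrative sketch of a "check your own work" pass. ask_model() is a
# hypothetical stand-in for any chat-model call, not a real Google/Gemini API.

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in a real model call here")

def answer_with_self_check(question: str) -> str:
    draft = ask_model(f"Answer concisely: {question}")
    # The same model grades its own draft -- there is no external source of truth.
    verdict = ask_model(
        "Does the answer below contain factual errors? Reply PASS or FAIL.\n"
        f"Question: {question}\nAnswer: {draft}"
    )
    if verdict.strip().upper().startswith("FAIL"):
        draft = ask_model(f"Rewrite this answer to fix the errors: {draft}")
    return draft
```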
For AI skeptics, the degree of confidence Silicon Valley has in these tools seems divorced from actual results. Real users notice when AI tools fail at simple tasks like counting, spellchecking, or answering questions like "Will water freeze at 27 degrees Fahrenheit?"
Google was eager to remind viewers that its newest AI model, Gemini 2.5 Pro, sits atop many AI leaderboards. But when it comes to truthfulness and the ability to answer simple questions, AI chatbots are graded on a curve.
Gemini 2.5 Pro is Google's most intelligent AI model (according to Google), yet it scores just 52.9 percent on the SimpleQA benchmark. According to an OpenAI research paper, SimpleQA is "a benchmark that evaluates the ability of language models to answer short, fact-seeking questions."
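To make that 52.9 percent concrete, here is a rough sketch of how a SimpleQA-style evaluation is scored: short factual questions, one reference answer each, and a simple percentage of correct responses. The sample questions and the grade_answer() helper below are our own illustrative assumptions, not OpenAI's actual harness.

```python
# Illustrative SimpleQA-style scoring loop. The sample questions and the
# grading helper are assumptions for demonstration, not OpenAI's benchmark code.

SAMPLE_QUESTIONS = [
    {"question": "At what temperature does water boil at sea level, in Celsius?",
     "answer": "100"},
    {"question": "In what year did Apollo 11 land on the Moon?",
     "answer": "1969"},
]

def grade_answer(model_answer: str, reference: str) -> bool:
    # Real graders are more careful (often an LLM judge); substring matching
    # is only a stand-in for this sketch.
    return reference.lower() in model_answer.lower()

def simpleqa_style_score(ask_model, questions=SAMPLE_QUESTIONS) -> float:
    correct = sum(
        grade_answer(ask_model(item["question"]), item["answer"])
        for item in questions
    )
    return 100.0 * correct / len(questions)

# A 52.9 percent score means the model answered roughly half of its short,
# fact-seeking questions correctly.
```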
A Google representative declined to discuss the SimpleQA benchmark, or hallucinations in general — but did point us to Google's official explainer on AI Mode and AI Overviews. Here's what it has to say:
[AI Mode] uses a large language model to help answer queries and it is possible that, in rare cases, it may sometimes confidently present information that is inaccurate, which is commonly known as 'hallucination.' As with AI Overviews, in some cases this experiment may misinterpret web content or miss context, as can happen with any automated system in Search...
We're also using novel approaches with the model's reasoning capabilities to improve factuality. For example, in collaboration with Google DeepMind research teams, we use agentic reinforcement learning (RL) in our custom training to reward the model to generate statements it knows are more likely to be accurate (not hallucinated) and also backed up by inputs.
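Google doesn't say how that reward is computed. As a loose illustration (our assumption, not Google's training code), a groundedness reward might score a response by how many of its statements can be matched back to the retrieved source passages:

```python
# Loose illustration of a "backed up by inputs" reward signal -- our assumption
# for explanatory purposes, not Google's agentic RL implementation.

def grounded_fraction(statements: list[str], source_passages: list[str]) -> float:
    """Fraction of generated statements with heavy word overlap against some source."""
    def supported(statement: str) -> bool:
        words = set(statement.lower().split())
        for passage in source_passages:
            overlap = words & set(passage.lower().split())
            if words and len(overlap) / len(words) > 0.7:  # crude lexical support check
                return True
        return False

    if not statements:
        return 0.0
    return sum(supported(s) for s in statements) / len(statements)

# In an RL setup, a higher grounded_fraction would mean a higher reward for
# that response, nudging the model away from unsupported claims.
```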
Is Google wrong to be optimistic? Hallucinations may yet prove to be a solvable problem, after all. But the research increasingly suggests that hallucinations from LLMs are not a solvable problem right now.
That hasn't stopped companies like Google and OpenAI from sprinting ahead into the era of AI Search — and that's likely to be an error-filled era, unless we're the ones hallucinating.
