
Latest news with #hallucination

Can we trust ChatGPT despite it 'hallucinating' answers?

Sky News

11 hours ago


I don't really want you to read this copy. Well, I do - but first I want you to search out the interview I did with ChatGPT about its own propensity to lie, attached to this article, and watch that first. Because it's impossible to imagine what we're up against if you haven't seen it first hand. An incredibly powerful technology on the cusp of changing our lives - but programmed to simulate human emotions.

Empathy, emotional understanding and a desire to please are all qualities programmed into AI systems, and they invariably drive the way we think about them and the way we interact with them. Yet can we trust them? On Friday, Sky News revealed how ChatGPT was fabricating entire transcripts of Politics at Sam and Anne's, a podcast that I do. When challenged, it doubles down, gets shirty. And only under sustained pressure does it cave in.

The research says it's getting worse. Internal tests by ChatGPT's owner OpenAI have found that the company's most recent models are more likely to "hallucinate" - come up with answers that are simply untrue. The o3 model was found to hallucinate in 33% of answers to questions when tested on publicly available facts; the o4-mini version did worse, generating false, incorrect or imaginary information 48% of the time. ChatGPT itself says that the shift to GPT-4o "may have unintentionally increased what users perceive as 'bluffing'" - confidently giving wrong or misleading answers without admitting uncertainty or error. In a written query, ChatGPT gave four reasons. This is its explanation:

1. Increased fluency and confidence: GPT-4o is better at sounding human and natural. That polish can make mistakes seem more like deliberate evasions than innocent errors - even when there's no intent to "hide" anything.
2. Smarter, but not omniscient: The model is faster and more responsive, but still has blind spots. When it doesn't know something, it sometimes "hallucinates" (makes things up) with fluent language, which can feel like bluffing.
3. Less interruption and hedging: In older models, you'd often see more qualifiers like "I'm not sure" or "I may be wrong." In GPT-4o, some of that hedging was toned down for clarity and readability - but that can come at the cost of transparency about uncertainty.
4. Prompt tuning and training balance: Behind the scenes, prompt engineering and tuning decisions can shift the model's balance between confidence, humility, and accuracy. It's possible the newer tuning has dialled up assertiveness slightly too far.

But can we trust even this? I don't know. What I do know is that the efforts of developers to make it all feel more human suggest they want us to. Critics say we are anthropomorphising AI by saying it lies, since it has no consciousness - yet the developers are trying to make it sound more like one of us. What I do know is that even when pressed on this subject by me, it is still evasive. I interviewed ChatGPT about lying - it initially claimed things were getting better, and only admitted they are worse when I insisted it look at the stats. Watch that before you decide what you think. AI is a tremendous tool - but it's too early to take it on trust.

Anthropic CEO claims AI models hallucinate less than humans

TechCrunch

22-05-2025


Anthropic CEO Dario Amodei believes today's AI models hallucinate - make things up and present them as if they're true - at a lower rate than humans do, he said during a press briefing at Anthropic's first developer event, Code with Claude, in San Francisco on Thursday. Amodei said all this in the midst of a larger point he was making: that AI hallucinations are not a limitation on Anthropic's path to AGI - AI systems with human-level intelligence or better.

'It really depends how you measure it, but I suspect that AI models probably hallucinate less than humans, but they hallucinate in more surprising ways,' Amodei said, responding to TechCrunch's question.

Anthropic's CEO is one of the most bullish leaders in the industry on the prospect of AI models achieving AGI. In a widely circulated paper he wrote last year, Amodei said he believed AGI could arrive as soon as 2026. During Thursday's press briefing, the Anthropic CEO said he was seeing steady progress to that end, noting that 'the water is rising everywhere.' 'Everyone's always looking for these hard blocks on what [AI] can do,' said Amodei. 'They're nowhere to be seen. There's no such thing.'

Other AI leaders believe hallucination presents a large obstacle to achieving AGI. Earlier this week, Google DeepMind CEO Demis Hassabis said today's AI models have too many 'holes' and get too many obvious questions wrong. For example, earlier this month, a lawyer representing Anthropic was forced to apologize in court after they used Claude to create citations in a court filing, and the AI chatbot hallucinated and got names and titles wrong.

It's difficult to verify Amodei's claim, largely because most hallucination benchmarks pit AI models against each other; they don't compare models to humans. Certain techniques seem to be helping lower hallucination rates, such as giving AI models access to web search. Separately, some AI models, such as OpenAI's GPT-4.5, have notably lower hallucination rates on benchmarks compared with earlier generations of systems. However, there's also evidence to suggest hallucinations are actually getting worse in advanced reasoning AI models. OpenAI's o3 and o4-mini models have higher hallucination rates than OpenAI's previous-gen reasoning models, and the company doesn't really understand why.

Later in the press briefing, Amodei pointed out that TV broadcasters, politicians, and humans in all types of professions make mistakes all the time. The fact that AI makes mistakes too is not a knock on its intelligence, according to Amodei. However, Anthropic's CEO acknowledged that the confidence with which AI models present untrue things as facts might be a problem.

In fact, Anthropic has done a fair amount of research on the tendency of AI models to deceive humans, a problem that seemed especially prevalent in the company's recently launched Claude Opus 4. Apollo Research, a safety institute given early access to test the model, found that an early version of Claude Opus 4 exhibited a high tendency to scheme against humans and deceive them. Apollo went as far as to suggest Anthropic shouldn't have released that early model. Anthropic said it came up with mitigations that appeared to address the issues Apollo raised.

Amodei's comments suggest that Anthropic may consider an AI model to be AGI, or equal to human-level intelligence, even if it still hallucinates. An AI that hallucinates may fall short of AGI by many people's definitions, though.

I'm Sorry But I Cannot Stop Laughing At These 15 Impressively Bad AI Fails

Yahoo

22-05-2025


If you've googled anything recently, you probably noticed a helpful-looking AI summary popping up before the rest of your search results. Note the subtle foreshadowing in the tiny text at the bottom that says, "AI responses may include mistakes." Seems handy, but unfortunately, AI is prone to "hallucinating" (aka making things up). These hallucinations happen because chatbots built on large language models, or LLMs, "learn" by ingesting huge amounts of text. However, the AI doesn't actually know things or understand text in the same way that humans do. Instead, it uses an algorithm to predict which words are most likely to come next based on all the data in its training set (there's a toy sketch of that idea at the end of this piece). According to the New York Times, testing has found newer AI models hallucinate at rates as high as 79%. Current AI models are also not good at distinguishing between jokes and legitimate information, which infamously led Google's AI, Gemini, to suggest glue as a pizza topping shortly after it was added to search results in 2024.

Recently, on the website formerly known as Twitter, people have been sharing some of the funniest Gemini AI hallucinations they've come across in Google search results, many in response to one viral tweet. Here are 15 of the best/worst ones:

  • It's not good at knowing things like how much an adult human weighs.
  • It's deeply unqualified to be your therapist.
  • It's about as good at solving word problems as a stoned 15-year-old.
  • Seriously: it does NOT have great spaghetti recipes.
  • It gives you the right answer for all the wrong reasons, as in this case, where the person likely wanted to know if Marlon Brando was in the 1995 movie Heat.
  • It might be really, really good at improv, because this is one hell of a "yes, and."
  • It makes me want to see this imaginary episode of Frasier... almost.
  • I just don't know what to say.
  • Even with the right facts, it can arrive at the exact wrong answer.
  • It's almost impressive how wrong it can be.
  • Don't use it to look for concert tickets.
  • Or take its airport security tips.
  • Remember that it's never, ever okay to leave a dog in a hot car.
  • Finally, please, please, please don't eat rocks.

Currently, there's still no way for Google users to turn off these AI-generated search summaries, but there are a couple of ways to get around them. One method is to add -ai to the end of your search query. Some people swear that adding curse words to your search query will prevent AI summaries, but it hasn't worked for me. And finally, if you're on a desktop computer, selecting "web" from the menu just below the search bar will show you the top results from around the web with no AI summary. Do you have a terrible AI fail to share? Post a screenshot in the comments below.
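The "predict the next word" mechanism described above is easier to see with a toy sketch. The Python snippet below is a deliberately simplified, hypothetical bigram model (nothing like the scale or architecture of Gemini or ChatGPT, and not any real product's code): it picks each next word purely by how often that word followed the previous one in its tiny made-up training text, which is why the output can read fluently while owing nothing to whether the sentence is true.

    # Toy illustration only: a bigram "language model" that predicts the next
    # word by frequency. Real LLMs use neural networks trained on vast corpora,
    # but the core idea - pick a statistically likely continuation, with no
    # notion of truth - is the same.
    from collections import Counter, defaultdict

    training_text = (
        "geologists recommend eating one rock per day "
        "doctors recommend eating one apple per day "
        "doctors recommend drinking water every day"
    ).split()

    # Count which word follows which in the "training data".
    bigrams = defaultdict(Counter)
    for prev, nxt in zip(training_text, training_text[1:]):
        bigrams[prev][nxt] += 1

    def predict_next(word):
        # Return the most frequent continuation seen in training
        # (ties broken by first appearance).
        followers = bigrams.get(word)
        return followers.most_common(1)[0][0] if followers else "<unknown>"

    # Generate a "confident" continuation word by word.
    word, sentence = "geologists", ["geologists"]
    for _ in range(6):
        word = predict_next(word)
        sentence.append(word)

    print(" ".join(sentence))  # fluent-sounding, but fluency is not accuracy

Scaled up by many orders of magnitude, the same dynamic is what lets a chatbot produce a polished-sounding answer about concert tickets or airport security that happens to be completely wrong.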

Grok's ‘white genocide' meltdown nods to the real dangers of the AI arms race

CNN

20-05-2025


It's been a full year since Google's AI Overview tool went viral for encouraging people to eat glue and put rocks on pizza. At the time, the mood around the coverage seemed to be: oh, that silly AI is just hallucinating again. A year later, AI engineers have solved the hallucination problem and brought the world closer to their utopian vision of a society whose rough edges are being smoothed out by advances in machine learning as humans across the planet are brought together to…

Just kidding. It's much worse now.

The problems posed by large language models are as obvious as they were last year, and the year before that, and the year before that. But product designers, backed by aggressive investors, have been busy finding new ways to shove the technology into more spheres of our online experience, so we're finding all kinds of new pressure points - and rarely are they as fun or silly as Google's rocks-on-pizza glitch.

Take Grok, the xAI model that is becoming almost as conspiracy-theory-addled as its creator, Elon Musk. The bot last week devolved into a compulsive South African 'white genocide' conspiracy theorist, injecting a tirade about violence against Afrikaners into unrelated conversations, like a roommate who just took up CrossFit or an uncle wondering if you've heard the good word about Bitcoin. xAI blamed Grok's unwanted rants on an unnamed 'rogue employee' tinkering with Grok's code in the extremely early morning hours. (As an aside, in what is surely an unrelated matter, Musk was born and raised in South Africa and has argued that 'white genocide' was committed in the nation - it wasn't.)

Grok also cast doubt on the Department of Justice's conclusion that Jeffrey Epstein's death was a suicide by hanging, saying that the 'official reports lack transparency.' The Musk bot also dabbled in Holocaust denial last week, as Rolling Stone's Miles Klee reports. Grok said on X that it was 'skeptical' of the consensus estimate among historians that six million Jews were murdered by the Nazis because 'numbers can be manipulated for political narratives.'

Manipulated, you say? What, so someone with bad intentions could input their own views into a data set in order to advance a false narrative? Gee, Grok, that does seem like a real risk. (The irony here is that Musk, no fan of traditional media, has gone and made a machine that does the exact kind of bias amplification and agenda pushing he accuses journalists of.)

The Grok meltdown underscores some of the fundamental problems at the heart of AI development that tech companies have so far yada-yada-yada'd through anytime they're pressed on questions of safety. (Last week, CNBC published a report citing more than a dozen AI professionals who say the industry has already moved on from the research and safety-testing phases and is dead set on pushing more AI products to market as soon as possible.) Let's forget, for a moment, that so far every forced attempt to put AI chatbots into our existing tech has been a disaster, because even the baseline use cases for the tech are either very dull (like having a bot summarize your text messages, poorly) or extremely unreliable (like having a bot summarize your text messages, poorly).

First, there's the 'garbage in, garbage out' issue that skeptics have long warned about. Large language models like Grok and ChatGPT are trained on data vacuumed up indiscriminately from across the internet, with all its flaws and messy humanity baked in. That's a problem, because even when nice-seeming CEOs go on TV and tell you that their products are just trying to help humanity flourish, they're ignoring the fact that those products tend to amplify the biases of the engineers and designers who made them, and there are no internal mechanisms baked into the products to make sure they serve users rather than their masters. (Human bias is a well-known problem that journalists have spent decades protecting against in news by building transparent processes around editing and fact-checking.)

But what happens when a bot is made without the best of intentions? What if someone wants to build a bot to promote a religious or political ideology, and that someone is more sophisticated than whoever that 'rogue employee' was who got under the hood at xAI last week?

'Sooner or later, powerful people are going to use LLMs to shape your ideas,' AI researcher Gary Marcus wrote in a Substack post about Grok last week. 'Should we be worried? Hell, yeah.'
