
AI models may hallucinate less than humans in factual tasks, says Anthropic CEO: Report

Time of India

28-05-2025



At two prominent tech events, VivaTech 2025 in Paris and Anthropic's Code With Claude developer day, Anthropic chief executive officer Dario Amodei made a provocative claim: artificial intelligence models may now hallucinate less frequently than humans in well-defined factual scenarios. Speaking at both events, Amodei said recent internal tests showed that the company's latest Claude 3.5 model had outperformed humans on structured factual quizzes. This challenges a long-held criticism of generative AI: that models often 'hallucinate', generating incorrect information with undue confidence.

'If you define hallucination as confidently saying something that's wrong, humans do that a lot,' Amodei said at VivaTech. He added that Claude models had consistently provided more accurate answers than human participants in verifiable question formats.

At Code With Claude, where the company also launched its new Claude Opus 4 and Claude Sonnet 4 models, Amodei reiterated his view. According to a TechCrunch report, he told attendees, 'It really depends on how you measure it, but I suspect that AI models probably hallucinate less than humans, but they hallucinate in more surprising ways.'

The new Claude 4 series represents a step forward in Anthropic's pursuit of artificial general intelligence (AGI). The company said the upgrades include improved long-term memory, better code generation, enhanced tool use, and stronger writing capabilities. Claude Sonnet 4 achieved a 72.7% score on SWE-Bench, a benchmark that evaluates AI coding agents on their ability to solve real-world software engineering problems, setting a new performance record for AI systems in this domain.

Despite these gains, Amodei acknowledged that hallucinations have not been eliminated. He highlighted the importance of prompt phrasing and use-case design, especially in high-risk domains such as legal or healthcare applications.

The remarks follow a recent courtroom episode in which Anthropic's Claude chatbot generated a false citation in a legal filing involving music publishers. The company's legal team later issued an apology, reinforcing the need for improved accuracy in sensitive settings. Amodei also called for the development of standardised metrics across the industry to evaluate hallucination rates. 'You can't fix what you don't measure precisely,' he said.
