
Anthropic CEO: AI could be more factually reliable than people in structured tasks

Mint

26-05-2025


Artificial intelligence may now surpass humans in factual accuracy, at least in certain structured scenarios, according to Anthropic CEO Dario Amodei. Speaking at two major tech events this month, VivaTech 2025 in Paris and the inaugural Code With Claude developer day, Amodei asserted that modern AI models, including the newly launched Claude 4 series, may hallucinate less often than people when answering well-defined factual questions, Business Today reported.

Hallucination, in the context of AI, refers to the tendency of models to confidently produce inaccurate or fabricated information. This longstanding flaw has raised concerns in fields such as journalism, medicine, and law. Amodei's remarks, however, suggest that the tables may be turning, at least in controlled conditions.

'If you define hallucination as confidently stating something incorrect, humans actually do that quite frequently,' Amodei said during his keynote at VivaTech. He cited internal testing that showed Claude 3.5 outperforming human participants on structured factual quizzes. The results, he claimed, demonstrate a notable shift in reliability on straightforward question-answer tasks.

At the developer-focused Code With Claude event, where Anthropic introduced the Claude Opus 4 and Claude Sonnet 4 models, Amodei reportedly reiterated his stance. 'It really depends on how you measure it,' he noted. 'But I suspect that AI models probably hallucinate less than humans, though when they do, the mistakes are often more surprising.'

The newly unveiled Claude 4 models reflect Anthropic's latest advances in the pursuit of artificial general intelligence (AGI), with improved capabilities in long-term memory, coding, writing, and tool integration. Notably, Claude Sonnet 4 scored 72.7 per cent on the SWE-Bench software engineering benchmark, surpassing previous models and setting a new industry standard.

Amodei was quick to acknowledge, however, that hallucinations have not been eradicated. In unstructured or open-ended conversations, even state-of-the-art models remain vulnerable to error. The CEO stressed that context, prompt design, and domain-specific application heavily influence a model's accuracy, particularly in high-stakes settings such as legal filings or healthcare.

His remarks follow a recent legal incident involving Anthropic's chatbot, in which the AI cited a non-existent case during a lawsuit filed by music publishers. The error led to an apology from the company's legal team, underscoring the ongoing challenge of ensuring factual consistency in real-world use.

Amodei also reportedly highlighted the lack of clear, industry-wide metrics for hallucination. 'You can't fix what you don't measure precisely,' he cautioned, calling for standardised definitions and evaluation frameworks to track and mitigate AI errors.
