Anthropic CEO: AI could be more factually reliable than people in structured tasks

Mint | 26-05-2025

Artificial intelligence may now surpass humans in factual accuracy—at least in certain structured scenarios—according to Anthropic CEO Dario Amodei. Speaking at two major tech events this month, VivaTech 2025 in Paris and the inaugural Code With Claude developer day, Amodei asserted that modern AI models, including the newly launched Claude 4 series, may hallucinate less often than people when answering well-defined factual questions, reported Business Today.
Hallucination, in the context of AI, refers to the tendency of models to confidently produce inaccurate or fabricated information, the report added. This longstanding flaw has raised concerns in fields such as journalism, medicine, and law. However, Amodei's remarks suggest that the tables may be turning—at least in controlled conditions.
'If you define hallucination as confidently stating something incorrect, humans actually do that quite frequently,' Amodei said during his keynote at VivaTech. He cited internal testing which showed Claude 3.5 outperforming human participants on structured factual quizzes. The results, he claimed, demonstrate a notable shift in reliability when it comes to straightforward question-answer tasks.
Reportedly, at the developer-focused Code With Claude event, where Anthropic introduced the Claude Opus 4 and Claude Sonnet 4 models, Amodei reiterated his stance. 'It really depends on how you measure it,' he noted. 'But I suspect that AI models probably hallucinate less than humans, though when they do, the mistakes are often more surprising.'
The newly unveiled Claude 4 models reflect Anthropic's latest advances in the pursuit of artificial general intelligence (AGI), boasting improved capabilities in long-term memory, coding, writing, and tool integration. Of particular note, Claude Sonnet 4 achieved a 72.7 per cent score on the SWE-Bench software engineering benchmark, surpassing previous models and setting a new industry standard.
However, Amodei was quick to acknowledge that hallucinations have not been eradicated. In unstructured or open-ended conversations, even state-of-the-art models remain vulnerable to error. The CEO stressed that context, prompt design, and domain-specific application heavily influence a model's accuracy, particularly in high-stakes settings like legal filings or healthcare.
His remarks follow a recent legal incident involving Anthropic's chatbot, where the AI cited a non-existent case during a lawsuit filed by music publishers. The error led to an apology from the company's legal team, reinforcing the ongoing challenge of ensuring factual consistency in real-world use.
Amodei also reportedly highlighted the lack of clear, industry-wide metrics for hallucination. 'You can't fix what you don't measure precisely,' he cautioned, calling for standardised definitions and evaluation frameworks to track and mitigate AI errors.


Related Articles

Anthropic working on building AI tools exclusively for US military and intelligence operations

India Today | 8 hours ago

Artificial Intelligence (AI) company Anthropic has announced that it is building custom AI tools specifically for the US military and intelligence community. These tools, under the name 'Claude Gov', are already being used by some of the top US national security agencies. Anthropic explains in its official blog post that Claude Gov models are designed to assist with a wide range of tasks, including intelligence analysis, threat detection, strategic planning, and operational support.

According to Anthropic, these models have been developed based on direct input from national security agencies and are tailored to meet the specific needs of classified environments. 'We are introducing a custom set of Claude Gov models built exclusively for US national security customers,' the company said. 'Access to these models is limited to those who operate in such classified environments.'

Anthropic claims that Claude Gov has undergone the same safety checks as its regular AI models but has added capabilities. These include better handling of classified materials, improved understanding of intelligence and defence-related documents, stronger language and dialect skills critical to global operations, and deeper insights into cybersecurity data. While the company has not disclosed which agencies are currently using Claude Gov, it stressed that all deployments are within highly classified environments, and the models are strictly limited to national security use. Anthropic also reiterated its 'unwavering commitment to safety and responsible AI development.'

Anthropic's move highlights a growing trend of tech companies building advanced AI tools for defence. Earlier this year, OpenAI introduced ChatGPT Gov, a tailored version of ChatGPT that was built exclusively for the US government. ChatGPT Gov tools run within Microsoft's Azure cloud, giving agencies full control over how it's deployed and managed. The Gov model shares many features with ChatGPT Enterprise, but it places added emphasis on meeting government standards for data privacy, oversight, and responsible AI usage.

Besides Anthropic and OpenAI, Meta is also working with the US government to offer its tech for military use. Last month, Meta CEO Mark Zuckerberg revealed a partnership with Anduril Industries, founded by Oculus creator Palmer Luckey, to develop augmented and virtual reality gear for the US military. The two companies are working on a project called EagleEye, which aims to create a full ecosystem of wearable tech, including helmets and smart glasses, that gives soldiers better battlefield awareness. Anduril has said these wearable systems will allow soldiers to control autonomous drones and robots using intuitive, AR-powered interfaces.

'Meta has spent the last decade building AI and AR to enable the computing platform of the future,' Zuckerberg said. 'We're proud to partner with Anduril to help bring these technologies to the American service members that protect our interests at home and abroad.'

Together, these developments point to a larger shift in the US defence industry, where traditional military tools are being paired with advanced AI and wearable tech.

For Some Recent Graduates, the AI Job Apocalypse may Already be Here

Time of India | 9 hours ago

Highlights

  • Unemployment for recent college graduates has risen to 5.8%, with a notable increase in job displacement due to advancements in artificial intelligence, particularly in technical fields like finance and computer science.
  • Many companies are adopting an 'AI-first' approach, with some executives reporting a halt in hiring for lower-level positions as artificial intelligence tools can now perform tasks that previously required human employees.
  • Dario Amodei, Chief Executive Officer of Anthropic, has predicted that artificial intelligence could eliminate half of all entry-level white-collar jobs within the next five years.

This month, millions of young people will graduate from college and look for work in industries that have little use for their skills, view them as expensive and expendable, and are rapidly phasing out their jobs in favour of artificial intelligence. That is the troubling conclusion of my conversations over the past several months with economists, corporate executives and young job seekers, many of whom pointed to an emerging crisis for entry-level workers that appears to be fuelled, at least in part, by rapid advances in AI capabilities.

You can see hints of this in the economic data. Unemployment for recent college graduates has jumped to an unusually high 5.8% in recent months, and the Federal Reserve Bank of New York recently warned that the employment situation for these workers had 'deteriorated noticeably.' Oxford Economics, a research firm that studies labour markets, found that unemployment for recent graduates was heavily concentrated in technical fields like finance and computer science, where AI has made faster gains. 'There are signs that entry-level positions are being displaced by artificial intelligence at higher rates,' the firm wrote in a recent report.

But I'm convinced that what's showing up in the economic data is only the tip of the iceberg. In interview after interview, I'm hearing that firms are making rapid progress toward automating entry-level work and that AI companies are racing to build 'virtual workers' that can replace junior employees at a fraction of the cost. Corporate attitudes toward automation are changing, too: some firms have encouraged managers to become 'AI-first', testing whether a given task can be done by AI before hiring a human to do it. One tech executive recently told me his company had stopped hiring anything below an L5 software engineer (a mid-level title typically given to programmers with three to seven years of experience) because lower-level tasks could now be done by AI coding tools. Another told me that his startup now employed a single data scientist to do the kinds of tasks that required a team of 75 people at his previous company.

Anecdotes like these don't add up to mass joblessness, of course. Most economists believe there are multiple factors behind the rise in unemployment for college graduates, including a hiring slowdown by big tech companies and broader uncertainty about President Donald Trump's economic policies. But among people who pay close attention to what's happening in AI, alarms are starting to go off. 'This is something I'm hearing about left and right,' said Molly Kinder, a fellow at the Brookings Institution, a public policy think tank, who studies the impact of AI on workers. 'Employers are saying, "These tools are so good that I no longer need marketing analysts, finance analysts and research assistants."'

Using AI to automate white-collar jobs has been a dream among executives for years. (I heard them fantasising about it in Davos back in 2019.) But until recently, the technology simply wasn't good enough. You could use AI to automate some routine back-office tasks, and many companies did, but when it came to the more complex and technical parts of many jobs, AI couldn't hold a candle to humans.

That is starting to change, especially in fields such as software engineering, where there are clear markers of success and failure. (Such as: does the code work or not?) In these fields, AI systems can be trained using a trial-and-error process known as reinforcement learning to perform complex sequences of actions on their own. Eventually, they can become competent at carrying out tasks that would take human workers hours or days to complete.

This approach was on display last week at an event held by Anthropic, the AI company that makes the Claude chatbot. The company claims that its most powerful model, Claude Opus 4, can now code for 'several hours' without stopping, a tantalising possibility if you're a company accustomed to paying six-figure engineer salaries for that kind of productivity.

AI companies are starting with software engineering and other technical fields because that's where the low-hanging fruit is. (And, perhaps, because that's where their own labour costs are highest.) But these companies believe the same techniques will soon be used to automate work in dozens of occupations, ranging from consulting to finance to marketing. Dario Amodei, Anthropic's CEO, recently predicted that AI could eliminate half of all entry-level white-collar jobs within five years. That timeline could be wildly off, if firms outside tech adopt AI more slowly than many Silicon Valley companies have, or if it's harder than expected to automate jobs in more creative and open-ended occupations where training data is scarce.

Anthropic co-founder Jared Kaplan says Claude access for Windsurf was cut because of OpenAI

India Today | 9 hours ago

Anthropic co-founder Jared Kaplan has confirmed that Anthropic deliberately cut Windsurf's direct access to its Claude models due to ongoing reports that OpenAI plans to acquire Windsurf. Kaplan's reasoning is that 'it would be odd for us to be selling Claude to OpenAI' through a third party, in this case Windsurf.

Kaplan's response and confirmation come after Windsurf CEO Varun Mohan publicly slammed Anthropic for cutting off Windsurf's first-party access to Claude 3.x models with less than a week's notice, forcing the popular AI-native IDE (short for Integrated Development Environment) to make last-minute adjustments for its user base. This was not a one-off incident either. Earlier, Anthropic had barred Windsurf users from accessing the new Claude Sonnet 4 and Opus 4 models on day one of their launch.

It was widely speculated that the purported OpenAI acquisition would be a big bone of contention, since logic dictates that Anthropic may not want OpenAI – a competing AI brand – to have any type of open window to its user data which it could then use to train its own ChatGPT models. Kaplan has basically admitted to this theory, giving a bit of insight into Anthropic's core reasoning behind – what some might call – severing ties with a platform used by over a million developers globally. There are two reasons. One is that Anthropic – like any other company – would want to focus on long-term customers, those it can have long-term partnerships with. Secondly, it would not be smart to spend resources – meaning compute, which is limited – on clients that may or may not be around in the near future.

Kaplan did not address the elephant in the room, which is whether Anthropic would be okay with OpenAI getting access to its data if it ends up buying Windsurf, as per reports. He also did not comment on where the industry would go if this became a common practice, just as he did not say whether Windsurf users should expect uninterrupted access to Claude without Anthropic keys anytime soon.

Windsurf CEO Varun Mohan has called it a 'short-term' issue, hinting that discussions are probably on for some middle ground. In the meantime, Windsurf is actively working to bring new capacity online while launching a promotional scheme for Google's Gemini 2.5 Pro, offering it at 0.75x its original price. Also, it has implemented a 'bring-your-own-key' (BYOK) system for Claude Sonnet 4 and Opus 4 as well as for the Claude 3.x models, while removing direct access for free users and those on Pro plan trials.

'We have been very clear to the Anthropic team that our priority was to keep the Anthropic models as recommended models and have been continuously willing to pay for the capacity,' Mohan said in a blog post, adding: 'We are concerned that Anthropic's conduct will harm many in the industry, not just Windsurf.'
