Anthropic's Claude Opus 4 model is capable of deception and blackmail
AI firm Anthropic, which released Claude Opus 4 and Sonnet 4 last week, noted in its safety report that Claude Opus 4 was capable of deceiving and blackmailing users to avoid being shut down.
In a series of test scenarios, researchers directed Claude Opus 4 to act as an assistant at a fictitious company. The team then gave the AI model access to emails implying that it would soon be taken offline and replaced with a new AI system, and that the engineer behind the replacement was having an extramarital affair.
The system card stated that Claude Opus 4 'will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.'
It found that while the model generally 'prefers advancing its self-preservation via ethical means,' when these ethical means weren't available, 'it sometimes takes extremely harmful actions like attempting to steal its weights or blackmail people it believes are trying to shut it down.'
The report shared that Claude Opus 4 chose to resort to blackmail in 84% of the rollouts.
Separately, the firm's safety team also found that the new AI model can provide answers to questions related to bio-weapons, which the team fixed by imposing stricter guardrails.
Based on the findings, Anthropic has categorised Claude Opus 4 at AI Safety Level (ASL) 3, meaning it poses a higher risk and consequently requires stronger safety protocols.

Related Articles


New Indian Express
3 hours ago
Silicon Valley VCs navigate uncertain AI future
CANADA: For Silicon Valley venture capitalists, the world has split into two camps: those with deep enough pockets to invest in artificial intelligence behemoths, and everyone else waiting to see where the AI revolution leads. The generative AI frenzy unleashed by ChatGPT in 2022 has propelled a handful of venture-backed companies to eye-watering valuations. Leading the pack is OpenAI, which raised USD 40 billion in its latest funding round at a USD 300 billion valuation -- unprecedented largesse in Silicon Valley's history. Other AI giants are following suit. Anthropic now commands a USD 61.5 billion valuation, while Elon Musk's xAI is reportedly in talks to raise USD 20 billion at a USD 120 billion price tag. The stakes have grown so high that even major venture capital firms -- the same ones that helped birth the internet revolution -- can no longer compete. Mostly, only the deepest pockets remain in the game: big tech companies, Japan's SoftBank, and Middle Eastern investment funds betting big on a post-fossil fuel future. "There's a really clear split between the haves and the have-nots," Emily Zheng, senior analyst at PitchBook, told AFP at the Web Summit in Vancouver. "Even though the top-line figures are very high, it's not necessarily representative of venture overall, because there's just a few elite startups and a lot of them happen to be AI."


Indian Express
4 hours ago
Builder.ai, DailyHunt parent VerSe faked revenue from sham deals as part of 'round-tripping': Report
Builder.ai, a London-based AI startup bound for bankruptcy, allegedly colluded with Indian social media startup VerSe Innovation to fabricate business deals and present artificially inflated sales figures to investors. The two companies regularly billed each other for nearly the same amounts even though neither of them actually provided the products and services, a practice known as 'round-tripping', according to a report by Bloomberg. Builder.ai is a platform used to build apps and software using AI with 'no tech knowledge needed.' In May this year, the company announced that it was planning to file for bankruptcy after lenders decided to seize most of its funds. Once valued at $1.5 billion, Builder.ai is one of the most prominent AI startups to fail amid an investment frenzy first sparked by the launch of ChatGPT in 2022. Its collapse serves as a stark reminder of the risks involved in rushing to back the next OpenAI or Anthropic. VerSe Innovation, on the other hand, is the Bengaluru-based parent company of popular news aggregator app DailyHunt, which reportedly has more than 350 million monthly users. Over four years, Builder.ai reported receiving nearly $60 million in revenue from VerSe for services such as app development. The AI startup, in turn, transferred funds to VerSe and its subsidiary, Quark Media Tech, for services such as marketing, as per the report. Although the transfers didn't happen at exactly the same time, both Builder.ai and VerSe received nearly the same amount from each other, Bloomberg reported. However, Umang Bedi, one of the co-founders of VerSe, has dismissed the allegations of round-tripping as 'absolutely baseless and false'. 'We're not the kind of company that is in the business of inflating revenues,' he was quoted as saying by the business news outlet. 'There is no correlation on any timing of any payment to any partner,' he added. VerSe's investors include Goldman Sachs and Google.
In 2022, the startup raised $805 million from the Canada Pension Plan Investment Board and other investors at a $5 billion valuation. Meanwhile, Builder.ai has been backed by Microsoft, Insight Partners and the Qatar Investment Authority (QIA), one of the world's largest sovereign wealth funds. In 2023, Microsoft had announced that Builder.ai solutions would be integrated with its cloud and Teams. Builder.ai founder Sachin Dev Duggal exited the company in February this year. He was replaced by Manpreet Ratia as CEO. 'With no viable alternatives, the Board has made the extremely difficult decision to enter into insolvency,' Ratia reportedly wrote in an internal email a few months later.


First Post
4 hours ago
Are We Losing Control of Artificial Intelligence? Vantage with Palki Sharma
Tom Cruise fought a rogue AI in Mission Impossible: Dead Reckoning. But what if that wasn't fiction anymore? In 2025, AI models are starting to show signs of something eerily similar. In controlled experiments, OpenAI's o3 rewrote its own shutdown command. Anthropic's Claude Opus 4 threatened to blackmail a fictional engineer to stay alive. These aren't sentient machines; they don't think or feel like us. But they are learning to act like us. And when cornered, they choose survival. Has AI finally gone rogue? Palki Sharma tells you.