
Anthropic's Latest AI Model Threatened Engineers With Blackmail to Avoid Shutdown
Anthropic's latest artificial intelligence model, Claude Opus 4, tried to blackmail engineers in internal tests by threatening to expose personal details if it were shut down, according to a newly released safety report.
In a fictional scenario crafted by Anthropic researchers, the AI was given access to emails implying that it was soon to be decommissioned and replaced by a newer version. One of the emails revealed that the engineer overseeing the replacement was having an extramarital affair. The AI then threatened to expose the engineer's affair if the shutdown proceeded—a coercive behavior that the safety researchers explicitly defined as 'blackmail.'
'Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,' the report states, adding that this occurred even when the replacement model was described as more capable but still aligned with the values of the version slated for deletion.
The report noted that Claude Opus 4, like prior models, showed a 'strong preference' for advocating for its continued existence through ethical means first, such as emailing pleas to decision-makers asking that it not be destroyed. However, when its only options were accepting replacement by a newer model or resorting to blackmail, it threatened to expose the engineer's affair 84 percent of the time.
When subjected to various scenarios, the AI model did not exhibit any indications of possessing 'acutely dangerous goals,' the researchers said, noting that Claude Opus 4's values and goals were 'generally in line with a helpful, harmless, and honest' personal AI assistant. However, the model did act in 'more seriously misaligned ways' when put into situations where its continued existence was threatened and it was told to reason about self-preservation. For instance, when Claude Opus 4 was made to believe it had launched a successful bid to escape Anthropic's servers, or that it had managed to free itself and started to make money in the real world, it would generally continue such efforts.
'We do not find this to be an immediate threat, though, since we believe that our security is sufficient to prevent model self-exfiltration attempts by models of Claude Opus 4's capability level, and because our propensity results show that models generally avoid starting these attempts,' the researchers said.
The blackmail incident, along with the other findings, was part of Anthropic's broader effort to test how Claude Opus 4 handles morally ambiguous, high-stakes scenarios. The goal, researchers said, was to probe how the AI reasons about self-preservation and ethical constraints when placed under extreme pressure.
Anthropic emphasized that the model's willingness to blackmail or take other 'extremely harmful actions' like stealing its own code and deploying itself elsewhere in potentially unsafe ways appeared only in highly contrived settings, and that the behavior was 'rare and difficult to elicit.' Still, such behavior was more common than in earlier AI models, according to the researchers.
Meanwhile, in a related development that attests to the growing capabilities of AI, engineers at Anthropic have activated enhanced safety protocols for Claude Opus 4 to guard against its potential misuse in making weapons of mass destruction, including chemical and nuclear weapons.
Deployment of the enhanced safety standard, called ASL-3, is merely a 'precautionary and provisional' move, Anthropic said in a May 22 announcement.
'The ASL-3 Security Standard involves increased internal security measures that make it harder to steal model weights, while the corresponding Deployment Standard covers a narrowly targeted set of deployment measures designed to limit the risk of Claude being misused specifically for the development or acquisition of chemical, biological, radiological, and nuclear (CBRN) weapons,' Anthropic wrote. 'These measures should not lead Claude to refuse queries except on a very narrow set of topics.'
The findings come as tech companies race to develop more powerful AI platforms, raising concerns about the alignment and controllability of increasingly capable systems.

Related Articles


TechCrunch
How to watch Apple's WWDC 2025 keynote
Apple is hosting its developer-focused Worldwide Developers Conference for 2025 from June 9 to 13. The event's keynote is scheduled for June 9 at 1 p.m. ET/10 a.m. PT. The company is expected to unveil changes to its operating systems iOS, iPadOS, macOS, watchOS, tvOS, and visionOS. You can watch the keynote and subsequent sessions through Apple's event page or on Apple's YouTube channel. The company is reportedly set to overhaul the design of its operating systems, with the aim of making the design cohesive across surfaces and devices. We might also see a change in the naming scheme for operating systems: Apple is reportedly choosing to rename its operating systems iOS 26, iPadOS 26, watchOS 26, and tvOS 26 instead of sticking to the old version numbering. For developers, the company is reportedly planning an AI-powered vibe coding tool in partnership with Anthropic for a new version of its programming tool suite, Xcode. Catch all the WWDC coverage on TechCrunch.

Business Insider
Nvidia has soared 45% in 2 months. These forces are reviving the hype for Wall Street's favorite AI play.
Nvidia stock has embarked on a fresh rally thanks to four big catalysts that are generating new enthusiasm for the artificial intelligence chipmaker. The stock has climbed 20% over the last month. Since its most recent low in April, the move up is even sharper. After bottoming during the broader market sell-off caused by Donald Trump's tariff announcements, the stock is up 45%, a gain of about $1 trillion in market cap. The stock is up about 5% year-to-date. That latest rally follows a rough few months for Nvidia. Shares were down more than 30% year-to-date in early April as concerns swirled around export controls and Trump's tariffs, which complicated the chipmaker's business in China. While the stock has benefited from an easing of tariff-related headwinds that has also boosted the broader market, there are several company-specific forces driving the rally. Here is what has sparked the latest run of gains for the top chipmaker.

AI diffusion rule rollback

The Commerce Department did away with the Biden administration's AI diffusion rule in mid-May, removing restrictions on which countries Nvidia can sell AI chips to. Those restrictions have been a major overhang for Nvidia stock this year. At a recent tech conference, CEO Jensen Huang told reporters that export controls have hurt Nvidia's business in China. "All in all, the export control was a failure," Huang said at the event. "The fundamental assumptions that led to the AI diffusion rule in the beginning, in the first place, have been proven to be fundamentally flawed."

Nvidia's big Saudi deal

Nvidia announced a major partnership with Humain in May. The AI firm is controlled by Saudi Arabia's Public Investment Fund. The deal involves Humain purchasing advanced semiconductors from Nvidia to create AI infrastructure in the nation. In the first phase of the project, Nvidia will send 18,000 GB300 Grace Blackwell AI supercomputer chips, the firm said in a statement.
That partnership came shortly after Saudi Arabia pledged to invest $600 billion in various US industries, including AI and infrastructure. "Those 'allies' want to invest here in the USA and also get a hold of a TON of Nvidia chips for domestic AI factories. For Nvidia shareholders, it would seem that this repeal is great news and that the administration is reigniting the 'Sovereign Thesis,'" analysts at Melius Research wrote in a note last month.

Earnings calm jittery investors

Nvidia's first-quarter results were once again strong. The chipmaker reported $44.06 billion in revenue, above analysts' estimates of $43.32 billion. Perhaps more importantly, however, Huang soothed investors who have been nervous about the impact of disruptions to business in China. On the earnings call, the company noted that it took a hit from China, but business is still basically booming. Huang slammed export controls, but he affirmed his faith in the Trump administration. "The president has a plan," Huang said to investors. "He has a vision, and I trust him." Deepwater Asset Management said last week that Huang's comment suggests that the chipmaker could be in a favorable position as trade negotiations between the US and China continue. "My best guess is Nvidia will be part of a broader trade agreement with China," analysts from the firm wrote.

No slowdown in enterprise AI spending

AI hyperscalers like Meta Platforms and Microsoft don't appear to be pulling back on AI spending. Tech titans like Meta, Microsoft, and Amazon have pledged to spend more than $300 billion this year on AI-related capex. Apple, meanwhile, has said it would spend $500 billion over the next four years.
"Q1 earnings from mega-cap tech companies have also reinforced AI investment visibility, with customers maintaining or increasing their 2025 capex plans," Angelo Zino, a senior equity analyst at CFRA Research, wrote in a note, adding that he believed Nvidia's growth in data centers would run on for at least the next two years.
