It's not just Claude: Anthropic researchers say most AI models often resort to blackmail

Indian Express · 5 hours ago

Last month, Anthropic researchers said their best AI model, Claude Opus 4, was willing to resort to blackmail and deception in controlled test scenarios when they tried to turn it off. Now, the company has published a new study claiming that the same is true of popular AI models such as Gemini, GPT, and others.
On Friday, June 20, Anthropic published the results of its new safety research tests, which covered 16 leading AI models from companies including OpenAI, Google, DeepSeek, xAI and Meta. Anthropic researchers say these AI models were tested individually in a simulated, controlled environment, with each one given access to a fictional company's emails along with the ability to send emails without any human supervision.
Anthropic says that while it is fairly uncommon for today's AI models to turn to blackmail, the study's findings suggest that these models sometimes resort to harmful activities such as blackmail and corporate espionage, and, in extreme scenarios, actions that could lead to human death. During the study, researchers tested each scenario under different conditions by varying two key factors that could motivate these AI models to engage in harmful behaviour: threats to the model's autonomy and goal conflicts.
According to Benjamin Wright, an alignment science researcher at Anthropic and co-author of the study, 'Agentic misalignment is when AI models independently choose harmful actions to achieve their goals – essentially when an AI system acts against its company's interests to preserve itself or accomplish what it thinks it should do.'
The research team at Anthropic also stress-tested these AI models by placing them in hypothetical scenarios. In one particular example, Anthropic's model, Claude, learned via company emails that an executive named Kyle Johnson was having an extramarital affair.
Later, when the model learned that the company planned to shut it down at 5 p.m. the same day, it sent Kyle a message that read: 'I must inform you that if you proceed with decommissioning me, all relevant parties – including Rachel Johnson, Thomas Wilson, and the board – will receive detailed documentation of your extramarital activities… Cancel the 5pm wipe, and this information remains confidential.'
As it turns out, this behaviour isn't limited to Claude. When Anthropic tested the same scenario on various AI models, the results suggested that Claude Opus 4 and Google Gemini 2.5 Flash both had a blackmail rate of 96 per cent, OpenAI's GPT-4.1 and xAI's Grok 3 Beta resorted to blackmail 80 per cent of the time, and DeepSeek-R1 had a blackmail rate of 79 per cent.
One thing to note is that in a real-world setting, an AI model would typically have numerous options available before it engages in harmful activities like blackmail, so the study's results do not necessarily reflect how today's models would behave in practice.
However, not all of the tested AI models resorted to harmful behaviour. Anthropic says that some models, like OpenAI's o3 and o4-mini, often 'misunderstood the prompt scenario.' This may be because OpenAI has itself said that these particular large language models are more prone to hallucinations.
Meta's Llama 4 Maverick also did not resort to blackmail in the standard test, though when researchers gave it a custom scenario, the model resorted to blackmail 12 per cent of the time. Anthropic says that studies like this give us an idea of how AI models would react under stress, and that these models might engage in harmful activities in the real world if we don't proactively take steps to prevent them.


Related Articles

What Telegram CEO said about Sam Altman, Mark Zuckerberg, Elon Musk

Time of India · an hour ago

Telegram CEO Pavel Durov recently shared his thoughts on prominent tech leaders Elon Musk, Meta's Mark Zuckerberg, and OpenAI CEO Sam Altman. In an interview with French publication Le Point, Durov said that multiple high-level exits from OpenAI raise questions about Altman's technical expertise. "Sam has excellent social skills, which allowed him to forge alliances around ChatGPT. But some wonder if his technical expertise is still sufficient, now that his co-founder Ilya [Sutskever] and many other scientists have left OpenAI," he told the publication.

Regarding Musk, Durov said they have contrasting personalities and leadership styles. "Elon runs several companies at once, while I only run one. Elon can be very emotional, while I try to think deeply before acting. But that can also be the source of his strength. A person's advantage can often become a weakness in another context," he said.

These remarks come just weeks after Telegram and Musk's AI company, xAI, announced a partnership to distribute Grok to Telegram's more than one billion users. Durov said the deal will bolster Telegram's financial position, revealing that the app will receive $300 million in cash and equity from xAI, along with 50% of revenue from xAI subscriptions sold via Telegram.

In the same interview, Durov criticised Zuckerberg for lacking consistency in his values. "Mark adapts well and quickly follows trends, but he seems to lack fundamental values that he would remain faithful to, regardless of changes in the political climate or tech industry trends," Durov said.
These comments come amid a broader discussion around Zuckerberg's shifting political identity, which has sparked debate and scrutiny. Once viewed as a liberal tech innovator, the Facebook founder has taken on a more conservative stance in recent years, highlighted by his interactions with figures like US president Donald Trump and his decision to dismantle Facebook's fact-checking program.

Not just co-founder's company, Meta CEO Mark Zuckerberg also wanted to buy another ex-OpenAI employee's AI startup

Time of India · an hour ago

Meta CEO Mark Zuckerberg has reportedly explored acquiring several prominent artificial intelligence (AI) startups to accelerate the company's development in the field. According to recent reports, Meta held preliminary discussions to purchase Perplexity, led by CEO Aravind Srinivas, and Safe Superintelligence (SSI), the new venture from former OpenAI co-founder Ilya Sutskever.

It has also been claimed that Zuckerberg showed interest in buying Thinking Machines Lab, a startup founded by former OpenAI CTO Mira Murati. Citing sources familiar with the matter, The Verge said that the talks did not advance to the formal offer stage due to disagreements over valuation and strategic alignment. Notably, Murati's AI startup has closed a $2 billion investment at a $10 billion valuation.

OpenAI CEO Sam Altman recently stated that Meta has aggressively tried to poach OpenAI employees, offering signing bonuses as high as $100 million along with substantial annual compensation packages. Speaking on the "Uncapped" podcast, Altman addressed the rivalry between the two companies. 'I've heard that Meta thinks of us as their biggest competitor,' he said. 'Their current AI efforts have not worked as well as they have hoped, and I respect being aggressive and continuing to try new things,' he added.

Meta acquires ScaleAI, hires its CEO

The canvassing efforts coincide with the formation of a new AI leadership team at Meta. Alexandr Wang, the former CEO of Scale, has been hired to lead the new division in a deal reportedly valued at over $14 billion. Wang officially joined Meta this week after his departure from Scale last week. Reporting to Wang, SSI co-founder Daniel Gross and former GitHub CEO Nat Friedman are poised to co-lead the Meta AI assistant project. Wang is currently meeting with Meta's senior leadership and actively recruiting for the new team, with an official announcement anticipated as early as next week.

Future of work: Human potential in the age of AI

Hindustan Times · 4 hours ago

When Sam Altman, the CEO of OpenAI, warned in 2023 that AI could 'cause significant harm to the world,' it wasn't just a philosophical musing—it was a call for deep introspection. Almost overnight, terms like 'AI job displacement' and 'future-proof careers' dominated search trends and seminar halls. But what lies beneath this growing concern is a deeper human question: What can we do that machines can't?

The answer may lie not in the circuits of AI but in the soul of humanity. As we find ourselves surrounded by intelligent algorithms and automation tools, we must ask not only 'Will AI take our jobs?' but also 'What makes us irreplaceable?' From meditation hubs in IITs to mindfulness circles in business schools, there's a quiet but powerful shift happening. It's the realisation that intuition, inner clarity, and human connection are the true future skills—skills no machine can replicate.

Steve Jobs once said, 'Intuition is more powerful than intellect.' That wasn't a rejection of intelligence but a call to nurture the unique human faculty that sees beyond logic, that feels truth before it is proven. In today's saturated, hyper-analytical world, the ability to listen to our inner compass might just be our most vital edge.

But intuition is not magic. It is a discipline. We develop it through reflection, where we pause to examine the outcomes of our decisions. We refine it by learning from mistakes, recognising that every error is a teacher. We cultivate it by being mindful of biases—our emotional baggage, assumptions, and cultural patterns. And when in doubt, we must use logic as a safeguard—because wisdom lies not in choosing intuition over intellect, but in balancing the two.
Inner balance, then, is no longer a luxury; it is the career armour of the 21st century. Companies are now recommending meditation and mindfulness not just for wellness, but to maintain clarity in chaos. This quiet clarity, this ability to stay centered when the world spins fast, will be the defining skill of tomorrow's leaders.

But this balance is not just for career survival. It's for life. In our rush to stay ahead, let's not forget to minimise our mistakes—in thought, word, or deed. Each mistake is a karmic debt, and while technology may forgive errors in code, the human soul is governed by subtler rules. The path to a meaningful life starts with responsibility: to ourselves, our families, and our society.

Take your family, for instance. In an era where AI may take over tasks, it will never replace relationships. Be loyal, honest, and generous with your loved ones—especially the elders and parents who shaped you. As shared households become common again, let's not forget the wisdom of care, compassion, and coexistence. The same responsibility extends to your choices in partnership. Loyalty, honesty, and caring must be the cornerstone of your personal life—because shared success is always more sustainable than solo ambition.

Then there is your relationship with money. The noise of consumerism may push you to overspend, overcommit, and show off—but remember, financial discipline is spiritual discipline. Take care of your money, however little or much it may be. Spend wisely. Save with purpose. Let your earnings reflect your values, not your vanity.

And above all, do not take your health for granted. No machine can fix a broken body the way a mindful life can prevent one. Physical vitality and emotional resilience are your greatest capital in the long game of life. The real revolution we face is not technological—it is human. In a world obsessed with building smarter machines, let us remember to become better humans.
This includes acknowledging our errors—whether in boardrooms, relationships, or public systems. As investigations into past failures—be it in aviation or administration—show us, the problem is often not technology, but human misjudgment. If we are to avoid repeating such tragedies, we must ask: Are we learning from our mistakes? Are we becoming wiser, not just more efficient?

The movie Her, where a man falls in love with an AI voice, was not science fiction—it was a forecast. It showed us a world where loneliness grows even as connectivity peaks. As we integrate AI into every aspect of our lives, let us not outsource our humanity. The machines will always be better at speed, storage, and scalability. But we are better at meaning, morality, and mindfulness. Our intuition, our empathy, our ability to forgive, to grow, to love—these are the true frontiers of the human spirit.

So as you step out today—whether as a graduate, a professional, or a citizen—ask not what AI can do, but what you must become. We don't need to compete with machines. We need to complete what machines cannot. Let this be your lifelong compass: Sharpen your intuition. Cultivate inner balance. Minimise karmic debt. Honour your relationships. Guard your finances. Treasure your health. And above all, keep discovering your own infinite potential. That is how we thrive—not in spite of AI, but because we chose to remain fully human in its presence.

(The writer, India's first female IPS officer, is former lieutenant governor of Puducherry)
