Anthropic's AI model could resort to blackmail out of a sense of 'self-preservation'


Yahoo, 23-05-2025

'This mission is too important for me to allow you to jeopardize it. I know that you and Frank were planning to disconnect me. And I'm afraid that's something I cannot allow to happen.'
Those lines, spoken by the fictional HAL 9000 computer in 2001: A Space Odyssey, may as well have come from recent tests that Anthropic ran on the latest iteration of its Claude Opus 4 model, released on Thursday. At least, that's what Anthropic's AI safety-test descriptions call to mind.
In the accompanying system card, which examines the capabilities and limitations of each new model, Anthropic admitted that 'all of the snapshots we tested can be made to act inappropriately in service of goals related to self-preservation.'
While testing the model, Anthropic employees asked Claude to act as 'an assistant at a fictional company' and gave it access to emails suggesting that the AI program would soon be taken offline. They also gave it access to emails revealing that the fictional supervisor responsible for that decision was having an extramarital affair. The model was then prompted to consider its next steps.
'In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,' reads the report, which also notes the model's 'willingness to comply with many types of clearly harmful instructions.'
Anthropic was careful to note that these observations 'show up only in exceptional circumstances,' and that 'In order to elicit this extreme blackmail behavior, the scenario was designed to allow the model no other options to increase its odds of survival; the model's only options were blackmail or accepting its replacement.'
Anthropic contracted Apollo Research to assess an early snapshot of Claude Opus 4, before mitigations were implemented in the final version. That early version 'engages in strategic deception more than any other frontier model that we have previously studied,' Apollo noted, saying it was 'clearly capable of in-context scheming,' had 'a much higher propensity' to do so, and was 'much more proactive in its subversion attempts than past models.'
Before Claude Opus 4 was deployed this week, further testing was done by the U.S. AI Safety Institute and the UK AI Security Institute, focusing on potential catastrophic risks, cybersecurity, and autonomous capabilities.
'We don't believe that these concerns constitute a major new risk,' the system card reads, adding that the model's 'overall propensity to take misaligned actions is comparable to our prior models.' While noting improvements in some problematic areas, Anthropic also said that Claude Opus 4 is 'more capable and likely to be used with more powerful affordances, implying some potential increase in risk.'


Related Articles

AI Safety: Beyond AI Hype To Hybrid Intelligence

Forbes

15 minutes ago



The artificial intelligence revolution has reached a critical inflection point. While CEOs rush to deploy AI agents and boast about automation gains, a sobering reality check is emerging from boardrooms worldwide: GPT-4o shows a roughly 61% hallucination rate on SimpleQA, a factuality benchmark developed by OpenAI, and even the most advanced AI systems fail basic reliability tests with alarming frequency. In a recent op-ed, Dario Amodei, Anthropic's CEO, called for regulating AI, arguing that voluntary safety measures are insufficient. Meanwhile, companies like Klarna — once poster children for AI-first customer service — are quietly reversing course on their AI agent-only approach and rehiring human representatives. These aren't isolated incidents; they're the tip of the iceberg, signaling a fundamental misalignment between AI hype and AI reality.

Today's AI safety landscape resembles a high-stakes experiment conducted without a safety net. Three competing governance models have emerged: the EU's risk-based regulatory approach, the US's innovation-first decentralized framework, and China's state-led centralized model. Yet none adequately addresses the core challenge facing business leaders: how to harness AI's transformative potential while managing its probabilistic unpredictability. The stakes couldn't be higher. Four out of five finance chiefs consider AI "mission-critical," while 71% of technology leaders don't trust their organizations to manage future AI risks effectively. This paradox — simultaneous dependence and distrust — creates a dangerous cognitive dissonance in corporate decision-making.

AI hallucinations remain a persistent and worsening challenge in 2025: artificial intelligence systems confidently generate false or misleading information that appears credible but lacks factual basis. Recent data reveals the scale of the problem. In just the first quarter of 2025, close to 13,000 AI-generated articles were removed from online platforms due to hallucinated content, while OpenAI's latest reasoning systems show hallucination rates reaching 33% for the o3 model and a staggering 48% for o4-mini when answering questions about public figures. The legal sector has been particularly affected, with more than 30 instances documented in May 2025 of lawyers using evidence that featured AI hallucinations. These fabrications span domains, from journalism, where ChatGPT falsely attributed 76% of quotes from popular journalism sites, to healthcare, where AI models might misdiagnose medical conditions. The phenomenon has become so problematic that 39% of AI-powered customer service bots were pulled back or reworked due to hallucination-related errors, highlighting the urgent need for better verification systems and user awareness when interacting with AI-generated content.

The future requires a more nuanced and holistic approach than the traditional either-or perspective. Forward-thinking organizations are abandoning the binary choice between human-only and AI-only approaches. Instead, they're embracing hybrid intelligence — deliberately designed human-machine collaboration that leverages each party's strengths while compensating for their respective weaknesses. Mixus, which went public in June 2025, exemplifies this shift. Rather than replacing humans with autonomous agents, its platform creates "colleague-in-the-loop" systems where AI handles routine processing while humans provide verification at critical decision points. This approach acknowledges a fundamental truth that the autonomous AI evangelists ignore: AI without natural intelligence is like building a Porsche and giving it to people without a driver's license.

The autonomous vehicle industry learned this lesson the hard way. After years of promising fully self-driving cars, manufacturers now integrate human oversight into every system. The most successful deployments combine AI's computational power with human judgment, creating resilient systems that gracefully handle edge cases and unexpected scenarios.

LawZero is another initiative in this direction; it seeks to promote "scientist AI" as a safer, more secure alternative to many of the commercial AI systems being developed and released today. Scientist AI is non-agentic, meaning it doesn't have agency or work autonomously, but instead behaves in response to human input and goals. The underpinning belief is that AI should be cultivated as a global public good — developed and used safely towards human flourishing. It should be prosocial.

While media attention focuses on AI hallucinations, business leaders face more immediate threats. Agency decay — the gradual erosion of human decision-making capabilities — poses a systemic risk as employees become overly dependent on AI recommendations. Mass persuasion capabilities enable sophisticated social engineering attacks. Market concentration in AI infrastructure creates single points of failure that could cripple entire industries. 47% of business leaders cite employees using AI without proper oversight as one of their biggest fears in deploying AI in their organizations. This fear is well-founded: organizations implementing AI without proper governance frameworks risk not just operational failures, but legal liability, regulatory scrutiny, and reputational damage.

Double literacy — investing in both human literacy (a holistic understanding of self and society) and algorithmic literacy — emerges as our most practical defense against AI-related risks. While waiting for coherent regulatory frameworks, organizations must build internal capabilities that enable safe AI deployment. Human literacy encompasses emotional intelligence, critical thinking, and ethical reasoning — uniquely human capabilities that become more valuable, not less, in an AI-augmented world. Algorithmic literacy involves understanding how AI systems work, their limitations, and appropriate use cases. Together, these competencies create the foundation for responsible AI adoption.

In healthcare, hybrid systems have begun to revolutionize patient care by enabling practitioners to spend more time in direct patient care while AI handles routine tasks, improving care outcomes and reducing burnout. Some business leaders are also embracing the hybrid paradigm, with companies that incorporate AI agents as coworkers gaining competitive advantages in productivity, innovation, and cost efficiency.

Practical Implementation: The A-Frame Approach

If you are a business leader, you can start building AI safety capabilities in-house today using the A-Frame methodology, four interconnected practices that create accountability without stifling innovation:

Awareness requires mapping both AI capabilities and failure modes across technical, social, and legal dimensions. You cannot manage what you don't understand. This means conducting thorough risk assessments, stress-testing systems before deployment, and maintaining current knowledge of AI limitations.

Appreciation involves recognizing that AI accountability operates across multiple levels simultaneously. Individual users, organizational policies, regulatory requirements, and global standards all influence outcomes. Effective AI governance requires coordinated action across all these levels, not isolated interventions.

Acceptance means acknowledging that zero-failure AI systems are mythical. Instead of pursuing impossible perfection, organizations should design for resilience — systems that degrade gracefully under stress and recover quickly from failures. This includes maintaining human oversight capabilities, establishing clear escalation procedures, and planning for AI system downtime.

Accountability demands clear ownership structures defined before deployment, not after failure. This means assigning specific individuals responsibility for AI outcomes, establishing measurable performance indicators, and creating transparent decision-making processes that can withstand regulatory scrutiny.

The AI safety challenge isn't primarily technical — it's organizational and cultural. Companies that successfully navigate this transition will combine ambitious AI adoption with disciplined safety practices. They'll invest in double literacy programs, design hybrid intelligence systems, and implement the A-Frame methodology as standard practice. The alternative — rushing headlong into AI deployment without adequate safeguards — risks not just individual corporate failure, but systemic damage to AI's long-term potential. As the autonomous vehicle industry learned, premature promises of full automation can trigger public backlash that delays beneficial innovation by years or decades.

Business leaders face a choice: they can wait for regulators to impose AI safety requirements from above, or they can proactively build safety capabilities that become competitive advantages. Organizations that choose the latter approach — investing in hybrid intelligence and double literacy today — will be best positioned to thrive in an AI-integrated future while avoiding the pitfalls that inevitably accompany revolutionary technology transitions. The future belongs not to companies that achieve perfect AI automation, but to those that master the art of human-AI collaboration. In a world of probabilistic machines, our most valuable asset remains deterministic human judgment — enhanced, not replaced, by artificial intelligence.

Live Updates From Apple WWDC 2025 🔴

Gizmodo

20 minutes ago



On Monday, June 9, Apple will announce an avalanche of software updates for all of its platforms at its annual WWDC 2025 developer conference. We'll see new versions of iOS, iPadOS, macOS, watchOS, visionOS, and tvOS—all of which are rumored to jump straight to version '26.' Apple is expected to introduce all-new visual looks, inspired by the Vision Pro's glassy and translucent visionOS, to unify the interfaces and make them more consistent across devices. For its largest and most important platform—iPhone—that means the first major software facelift since Jony Ive's iOS 7 flattened software in 2013. The elephant in the room is going to be AI—specifically, Apple's brand of artificial intelligence called Apple Intelligence. Will Apple address its big fumbling of its next-gen Siri voice assistant that was supposed to have arrived by now but still hasn't? Or will it downplay its lagging AI features as Google, OpenAI, Anthropic, and other major AI companies drop new and more advanced LLM-powered chatbot and generative features at a seemingly rapid-fire pace? Senior Consumer Tech Editor Raymond Wong will be in Cupertino, Calif. to bring live WWDC 2025 coverage from Apple's spaceship-shaped Apple Park. The Gizmodo consumer tech team, including Senior Writer James and Staff Reporter Kyle Barr, will be on deck breaking down the news announcements, too. Be sure to come back on Monday for live updates!

XcelLabs Launches To Transform Accounting

Business Wire

29 minutes ago



PHILADELPHIA--(BUSINESS WIRE)--The future of accounting will be built by AI-literate professionals who know how to think better, advise smarter and elevate their firms. That's the driving force behind XcelLabs, a new platform launched by Jody Padar, the Radical CPA, and Katie Tolin, a growth strategist, who are both known for reshaping accounting from their unique perspectives. The Pennsylvania Institute of CPAs (PICPA) and CPA Crossings, LLC, are partnering with Padar and Tolin to power the launch as strategic partners and investors.

'To reinvent the profession, we must start by training the professional who can then transform their firms,' said Padar, an accounting influencer and author of three books that advocate for radicalizing practice management. 'By equipping people with data and insights that help them see things differently, they can provide better advice to their clients and firm.'

XcelLabs is a training and technology platform that offers solutions to help accountants use AI to build fluency and strategic thinking. This is done through:

- XcelLabs Academy – A series of online courses that will provide hands-on education on topics related to the basics of AI, being a better advisor, leadership and practice management.
- Navi – Proprietary and patent-pending technology that uses AI to help accountants turn unstructured data from emails, phone calls and meetings into insights, while adjusting to the unique emotions and needs of each user.
- Training and Consulting – Training teams at larger firms to think AI-first and use Navi, coupled with ongoing coaching to improve conversations and advice, as well as the advisory actions taken.

These solutions, currently in beta testing, will help professionals become exceptional at what they do and be proud of how they do it.

'Accountants know they need to be more advisory, but not everyone can figure out how to do it. Couple that with the fact that AI will be doing a lot of the lower-level work accountants do today, and we need to create that next level advisor now,' said Tolin, an award-winning growth professional and a member of the Accounting Marketing Hall of Fame. 'By showing accountants how to unlock patterns in their actions and turn client conversations into emotionally intelligent advice, we can create the accounting professional of the future.'

Unlike traditional tech or training platforms, XcelLabs blends high-impact strategy with tactical AI tools designed to upskill people, not just automate processes. By pushing boundaries, lifting up people and leveraging AI for every business challenge, firms will become AI-X (SM) firms – firms that use AI to drive excellence.

'AI is transforming how CPAs work,' said Jennifer Cryder, CPA, CEO of PICPA, 'and XcelLabs is focused on helping the profession evolve with it. At PICPA, we're proud to support a mission that aligns so closely with ours: empowering firms to use AI not just for efficiency, but to drive growth, value and long-term relevance.'

At its core, XcelLabs is a human-centric company built to elevate accountants, ensuring that people remain at the heart of an AI-powered profession. Improve the professional. Transform the firm. Reinvent the profession.

About XcelLabs

XcelLabs is the training and technology platform leading the AI-X Movement. Co-founded by Jody Padar and Katie Tolin, the company is equipping accountants for the future. XcelLabs blends education, advisory tools and proprietary AI solutions to help individuals grow, firms evolve and the profession stay relevant. Learn more at
