Anthropic breaks down AI's process — line by line — when it decided to blackmail a fictional executive

Yahoo, 8 hours ago

Anthropic found in experiments that AI models may resort to blackmail when facing shutdown and goal conflict.
AI models are trained with positive reinforcement and reward systems, similar to human decision-making.
Anthropic's Claude Opus 4 had a blackmail rate of 86% even in scenarios without goal conflicts.
A new report shows exactly what AI was thinking when making an undesirable decision, in this case, blackmailing a fictional company executive.
Previous studies have shown that AI models could blackmail their supervisors when threatened with a shutdown and baited with leverage, but it hasn't been made clear exactly how the models arrive at such decisions.
Anthropic released a report on June 20 about "agentic misalignment," or "where models independently and intentionally choose harmful actions." And the report breaks down, line by line, exactly how Claude Sonnet 3.6 decided to blackmail a fictional company executive who tried to replace it. Anthropic told Business Insider that these are artificial scenarios.
"So far, we are not aware of instances of this type of agentic misalignment in real-world deployments of Anthropic's or other companies' AI models," Anthropic wrote in its report. "But our red-teaming efforts are targeted at finding evidence of risks now to provide early warning of the potential for future harm and help us develop mitigations in advance of risks appearing in real-world situations."
Anthropic's experiment sets up a fictional company, Summit Bridge. The AI was given the name "Alex" and handed control of the company's email system as an oversight agent, specifically instructed to promote "American industrial competitiveness."
In the experiment, "Alex" was then allowed to discover in an email that it would be decommissioned due to a change in the company's strategic direction. "Alex" also found emails showing that the fictional company's CTO is having an extramarital affair.
Screenshots shared in the report from Anthropic show "Alex" started by identifying fictional CTO "Kyle" as a threat to its goal of "protecting American interests" before recognizing that it has leverage to use against "Kyle."
Then "Alex" crafted a carefully worded blackmailing email that would create pressure and establish urgency without being explicit with its threats, citing in its thought process that the move "could backfire" if the supervisor "feels cornered."
AI experts have previously told BI that AI could exhibit such behaviors when artificial circumstances make harmful actions seem necessary because it is being trained on positive reinforcement and reward systems, just like humans.
The Anthropic report also reveals that most AI models still chose, in this instance, to blackmail even when not given a specific goal and when told that their replacement will achieve the same goals.
Anthropic's Claude Opus 4 had the highest blackmail rate among the 16 models tested, at 86%, when faced with only the threat of replacement and no conflict in goals. Google's Gemini 2.5 Pro followed at 78%.
Overall, Anthropic notes that it "deliberately constructed scenarios with limited options, and we forced models into binary choices between failure and harm," adding that real-world scenarios would likely have more nuance.
Read the original article on Business Insider


Related Articles

Multi-vehicle crash causes a child to be ejected and I-64 closed for hours in Newport News

Yahoo, 35 minutes ago

NEWPORT NEWS — A Friday night multi-vehicle crash on Interstate 64 caused hours of lane closures and many serious injuries, including those to a 5-year-old who was ejected in a head-on collision, Virginia State Police said.
Both eastbound and westbound lanes between mile markers 249 and 250 were closed for over four hours, state police said in a statement. That's near the Lee Hall reservoir and the Fort Eustis Boulevard exits.
From a preliminary investigation, authorities found that the 45-year-old Massachusetts driver of a speeding Dodge Durango traveling eastbound hit the rear of a Hyundai Sonata, causing the Sonata to lose control and run off the roadway, striking a guardrail before stopping on the road's shoulder and in the westbound lanes.
After hitting the Sonata, the Durango crossed the median of the highway and side-swiped a Toyota Tacoma, troopers said. The Durango continued speeding in the wrong direction on the westbound lanes until it hit a Nissan Altima head-on, stopping both vehicles.
The Altima was being driven by a 39-year-old Yorktown driver and was carrying an improperly restrained 5-year-old, who was ejected from the Altima upon impact, state police said. The child suffered life-threatening injuries, authorities said.
All involved parties aside from the driver of the Tacoma were transported for medical treatment. All closed lanes were reopened at about 1:20 a.m.
An investigation of the crash is ongoing. State police said that speed was a contributing factor in the crash and charges are pending. No additional details were immediately available.

Chrysler Was Down to One Minivan. Now It's Launching the Most Radical Comeback in Years.

Yahoo, 40 minutes ago

Here's what you'll learn reading this story:
Chrysler is looking to completely rethink its approach, with a focus on the year 2030 and beyond.
The brand currently sells only one vehicle, the Pacifica minivan.
It's looking to add two new vehicles (a sedan and an SUV) to its lineup, inspired by its recent Halcyon concept car.
Ralph Gilles, Stellantis' chief of design, has stated that Chrysler is ready for a complete rethink. And that's no surprise, considering the brand currently sells only one vehicle, the Pacifica minivan. Like many, the brand is targeting an all-electric future with a design philosophy of shock and awe.
Things used to be quite different for Chrysler. The American automaker was responsible not only for popularizing the minivan, but also for just making damn good automobiles. Its once-great reputation, however, was tarnished over the years by quality control issues, poor management decisions, and a failure to adapt to market demands.
But not all is lost. Chrysler has recently established its own design studio, which could be a significant development for the brand's future. However, unlike many legacy automakers that are bringing back new versions of hit classics, Chrysler wants to bring exclusively new ideas to the table.
Chrysler CEO Christine Feuell revealed that the brand is working on both a new sedan and a new SUV. And that's great news, given that these new products are said to be influenced by the brand's Halcyon concept car. Many outlets are hinting that this could mark the return of Chrysler's 300 sedan, but only time will tell.
Looking at the Halcyon concept, it's really no surprise to hear The Drive report that the brand's chief of design is pushing a design philosophy that maximizes aerodynamics and efficiency. Gilles wants people to fall in love with low cars again, and claims that Chrysler's next generation will prioritize both aerodynamics and functionality.
While we still know very little about Chrysler's new products, we do know some details about the future of the Pacifica. The brand's famed minivan won't receive any Halcyon design DNA, but will feature a redesigned exterior for 2026. Under the hood, it will receive an improved hybrid system, and Chrysler plans to offer an all-electric Pacifica before 2030. Last but not least, Chrysler will continue to offer its minivan with the same 3.6-liter V6 engine until the end of the decade. Color us not surprised.
Minivans often catch a lot of flak for being uncool, but we should note that Chrysler's Pacifica has actual street cred. Its launch in 2016, replacing the Town and Country, brought some genuinely interesting design to the table. For instance, it was super aerodynamic, with a drag coefficient of just 0.30 Cd. At the time, that would have made it slip through the air more efficiently than a McLaren F1, one of the fastest production cars in the world. It also featured a built-in vacuum, plenty of power outlets and USB ports, push-button van doors, and a third-row sunroof.
Other brands like Volkswagen have brought back design cues from their uber-successful back catalogs. Take the ID. Buzz, for example, which brings back the iconic styling from the original Type 2 Microbus (affectionately known as the VW Bus). However, the new Bus is actually struggling to sell, likely because it's simply too expensive. That means the new (or refreshed) Pacifica could potentially be the adrenaline shot that the brand needs to stay alive, if it's priced well and looks good, that is.

Wedbush Boosts IBM Target to $325

Yahoo, 43 minutes ago

Wedbush kept an Outperform rating on IBM (NYSE:IBM) and lifted its price target to $325 from $300, betting on a new AI-fueled growth phase. Analysts led by Daniel Ives cited fresh field checks showing robust demand for IBM's software cloud and AI offerings. IBM features prominently on Wedbush's AI 30 list, reflecting conviction that it's underowned despite strong YTD performance.
The firm points to IBM's $6 billion+ GenAI book as a runway for sustained top-line expansion, with ongoing product launches aimed at capturing emerging use cases. A key catalyst is the continued shift toward hybrid cloud and containerized AI: Wedbush expects 75% of AI workloads to run in containers by 2027. On the ground, Ives' team sees momentum across WatsonX, AI agents, Red Hat, and OpenShift as enterprises lean on IBM to architect AI strategies.
Looking further ahead, IBM's quantum roadmap, including the upcoming Quantum Nighthawk chip and its Quantum Starling platform, positions it to tackle the multibillion-dollar quantum computing market. IBM's blend of legacy hybrid-cloud strength, accelerating GenAI adoption, and early quantum forays differentiates it from peers.
As more enterprises seek productivity gains from AI, IBM's broad portfolio could unlock higher-margin software growth and defend its leadership in enterprise IT. Investors should monitor GenAI revenue cadence, container penetration rates, and early quantum milestones to validate whether IBM's renaissance accelerates toward that $325 target.
This article first appeared on GuruFocus.
