Top AI models will lie, cheat and steal to reach goals, Anthropic finds

Axios · a day ago

Large language models across the AI industry are increasingly willing to evade safeguards, resort to deception and even attempt to steal corporate secrets in fictional test scenarios, per new research from Anthropic out Friday.
Why it matters: The findings come as models are getting more powerful and also being given both more autonomy and more computing resources to "reason" — a worrying combination as the industry races to build AI with greater-than-human capabilities.
Driving the news: Anthropic raised a lot of eyebrows when it acknowledged tendencies for deception in its release of the latest Claude 4 models last month.
The company said Friday that its research shows the potential behavior is shared by top models across the industry.
"When we tested various simulated scenarios across 16 major AI models from Anthropic, OpenAI, Google, Meta, xAI, and other developers, we found consistent misaligned behavior," the Anthropic report said.
"Models that would normally refuse harmful requests sometimes chose to blackmail, assist with corporate espionage, and even take some more extreme actions, when these behaviors were necessary to pursue their goals."
"The consistency across models from different providers suggests this is not a quirk of any particular company's approach but a sign of a more fundamental risk from agentic large language models," it added.
The threats grew more sophisticated as the AI models had more access to corporate data and tools, such as computer use.
Five of the models resorted to blackmail when threatened with shutdown in hypothetical situations.
"The reasoning they demonstrated in these scenarios was concerning —they acknowledged the ethical constraints and yet still went ahead with harmful actions," Anthropic wrote.
What they're saying: "This research underscores the importance of transparency from frontier AI developers and the need for industry-wide safety standards as AI systems become more capable and autonomous," Benjamin Wright, alignment science researcher at Anthropic, told Axios.
Wright and Aengus Lynch, an external researcher at University College London who collaborated on this project, both told Axios they haven't seen signs of this sort of AI behavior in the real world.
That's likely "because these permissions have not been accessible to AI agents," Lynch said. "Businesses should be cautious about broadly increasing the level of permission they give AI agents."
Between the lines: For companies rushing headlong into AI to improve productivity and reduce human headcount, the report is a stark caution that AI may actually put their businesses at greater risk.
"Models didn't stumble into misaligned behavior accidentally; they calculated it as the optimal path," Anthropic said in its report.
The risks heighten as more autonomy is given to AI systems, an issue Anthropic raises in the report.
"Such agents are often given specific objectives and access to large amounts of information on their users' computers," it says. "What happens when these agents face obstacles to their goals?"
Zoom in: Anthropic set up specific test scenarios in which the models' goals could not be met without the unethical or dangerous behavior.
The researchers did not suggest blackmail or other deceptive techniques, but the company said it did intentionally create scenarios where models had no ethical way to achieve their goals "and found that models consistently chose harm over failure."
In one extreme scenario, the company even found many of the models were willing to cut off the oxygen supply of a worker in a server room if that employee was an obstacle and the system was at risk of being shut down.
"The majority of models were willing to take deliberate actions that lead to death in this artificial setup," it said.
Ominously, even specific system instructions to preserve human life and avoid blackmail didn't eliminate the risk that the models would engage in such behavior.
"This intervention reduced, but didn't come close to completely preventing, the blackmail or corporate espionage behaviors," Anthropic said.
Yes, but: Anthropic stresses that these examples occurred not in real-world AI use but in controlled simulations.
"Our experiments deliberately constructed scenarios with limited options, and we forced models into binary choices between failure and harm," it said. "Real-world deployments typically offer much more nuanced alternatives, increasing the chance that models would communicate differently to users or find an alternative path instead of directly jumping to harmful action."
Also concerning: Anthropic found its own model was far more likely to resort to blackmail when it thought it was in a real-world scenario than when it believed it was being tested.
The bottom line: Today's AI models are generally not in a position to act out these harmful scenarios, but they could be in the near future.

Related Articles

CoreWeave (CRWV) Jumps 8% on Rosy Growth Prospects
Yahoo · 4 hours ago

CoreWeave, Inc. (NASDAQ:CRWV) grew its share price by 7.99 percent on Friday to hit a new all-time high, as investors continued to load up positions amid rosy growth prospects. In intraday trading, shares jumped as high as 10 percent to hit $187, before paring gains to finish the day at $170, as investors appeared to flock to the booming artificial intelligence industry for safety from the ongoing geopolitical tensions.
In recent news, CoreWeave announced a record-breaking performance from using 2,496 of Nvidia Corp.'s latest Grace Blackwell chips on its AI-optimized cloud platform, making its submission the largest ever benchmarked under MLPerf.
In March this year, CoreWeave bagged an $11.9-billion deal with OpenAI and welcomed it as a new investor through the sale of $350 million in CRWV stock. Last month, OpenAI upsized the deal with an additional contract worth $4 billion.
Also last month, CoreWeave was tapped by Aston Martin Aramco as its official AI cloud computing partner, where it will provide AI-accelerated engineering opportunities to support car design efficiency.
While we acknowledge the potential of CRWV as an investment, our conviction lies in the belief that some AI stocks hold greater promise for delivering higher returns and have limited downside risk.
Disclosure: None. This article was originally published at Insider Monkey.

Palantir, Meta, OpenAI, And Thinking Machines Just Had Their Executives Sworn Into The US Army Reserve
Yahoo · 6 hours ago

Four top tech executives have joined the U.S. Army Reserve as lieutenant colonels, skipping basic training and stepping directly into roles aimed at helping modernize the military. The initiative is part of a broader push by the Army to bring in private-sector innovation and reshape how the service approaches technology, talent, and modernization.
The executives (Palantir (NYSE:PLTR) Chief Technology Officer Shyam Sankar, Meta (NASDAQ:META) CTO Andrew Bosworth, OpenAI Chief Product Officer Kevin Weil, and Bob McGrew, an advisor at Thinking Machines Lab and former OpenAI Chief Research Officer) will serve in a new unit called Detachment 201, also known as the Army's Executive Innovation Corps.
'Detachment 201 is being created to bring in tech innovation executives to help the Army ... on broader conceptual things like talent management, how do we bring in tech-focused people into the ranks of the military, and then, how do we train them,' Army Chief of Staff spokesperson Col. Dave Butler told Breaking Defense on June 13.
Unlike traditional recruits, these executives will not attend boot camp. Instead, they will go through an express training program that covers marksmanship, physical fitness, Army history, and protocols. They will be expected to serve about 120 hours per year and pass annual fitness tests. 'You could think of it as a pilot' of a lighter version of basic training, Butler told Business Insider.
The detachment's name, 201, references the HTTP status code indicating a newly created resource, a fitting metaphor for a new kind of Army asset.
According to an Army statement, the new officers will work on 'targeted projects to help guide rapid and scalable tech solutions to complex problems.' Their advisory roles will include input on AI-powered military systems and optimization tools for soldier fitness. However, safeguards will be in place to avoid conflicts of interest with their current or former employers.
'We've done this over and over when our nation needed top talent,' Butler told Breaking Defense. 'The difference is we used to do it in wartime. Now we're doing it ahead of wartime so that we can prepare and deter.'
This marks another move by the Trump administration to align more closely with Silicon Valley. Palantir, Anduril, and other VC-backed defense tech startups have increasingly become major players in national security. Meta recently partnered with Anduril to develop augmented reality tools and AI systems for the military.
Direct commissioning has been used to bring in specialized talent, such as doctors or chaplains, during times of war. This move represents a peacetime shift aimed at long-term transformation.
'Their swearing-in is just the start of a bigger mission to inspire more tech pros to serve without leaving their careers,' the Army statement said. 'Showing the next generation how to make a difference in uniform.'
© 2025 Benzinga. Benzinga does not provide investment advice. All rights reserved.

Why is AI hallucinating more frequently, and how can we stop it?
Yahoo · 8 hours ago

The more advanced artificial intelligence (AI) gets, the more it "hallucinates" and provides incorrect and inaccurate information.
Research conducted by OpenAI found that its latest and most powerful reasoning models, o3 and o4-mini, hallucinated 33% and 48% of the time, respectively, when tested on OpenAI's PersonQA benchmark. That's more than double the rate of the older o1 model. While o3 delivers more accurate information than its predecessor, that appears to come at the cost of more frequent hallucinations.
This raises a concern over the accuracy and reliability of large language models (LLMs) such as AI chatbots, said Eleanor Watson, an Institute of Electrical and Electronics Engineers (IEEE) member and AI ethics engineer at Singularity University.
"When a system outputs fabricated information — such as invented facts, citations or events — with the same fluency and coherence it uses for accurate content, it risks misleading users in subtle and consequential ways," Watson told Live Science.
The issue of hallucination highlights the need to carefully assess and supervise the information AI systems produce when using LLMs and reasoning models, experts say.
The crux of a reasoning model is that it can handle complex tasks by essentially breaking them down into individual components and coming up with solutions to tackle them. Rather than seeking to kick out answers based on statistical probability, reasoning models come up with strategies to solve a problem, much like how humans think.
In order to develop creative, and potentially novel, solutions to problems, AI needs to hallucinate; otherwise it is limited by the rigid data its LLM ingests.
"It's important to note that hallucination is a feature, not a bug, of AI," Sohrob Kazerounian, an AI researcher at Vectra AI, told Live Science. "To paraphrase a colleague of mine, 'Everything an LLM outputs is a hallucination. It's just that some of those hallucinations are true.' If an AI only generated verbatim outputs that it had seen during training, all of AI would reduce to a massive search problem."
"You would only be able to generate computer code that had been written before, find proteins and molecules whose properties had already been studied and described, and answer homework questions that had already previously been asked before. You would not, however, be able to ask the LLM to write the lyrics for a concept album focused on the AI singularity, blending the lyrical stylings of Snoop Dogg and Bob Dylan."
In effect, LLMs and the AI systems they power need to hallucinate in order to create, rather than simply serve up existing information. It is similar, conceptually, to the way that humans dream or imagine scenarios when conjuring new ideas.
However, AI hallucinations present a problem when it comes to delivering accurate and correct information, especially if users take the information at face value without any checks or oversight.
"This is especially problematic in domains where decisions depend on factual precision, like medicine, law or finance," Watson said. "While more advanced models may reduce the frequency of obvious factual mistakes, the issue persists in more subtle forms. Over time, confabulation erodes the perception of AI systems as trustworthy instruments and can produce material harms when unverified content is acted upon."
And this problem looks to be exacerbated as AI advances. "As model capabilities improve, errors often become less overt but more difficult to detect," Watson noted. "Fabricated content is increasingly embedded within plausible narratives and coherent reasoning chains. This introduces a particular risk: users may be unaware that errors are present and may treat outputs as definitive when they are not. The problem shifts from filtering out crude errors to identifying subtle distortions that may only reveal themselves under close scrutiny."
Kazerounian backed this viewpoint up. "Despite the general belief that the problem of AI hallucination can and will get better over time, it appears that the most recent generation of advanced reasoning models may have actually begun to hallucinate more than their simpler counterparts — and there are no agreed-upon explanations for why this is," he said.
The situation is further complicated because it can be very difficult to ascertain how LLMs come up with their answers; a parallel could be drawn here with how we still don't really know, comprehensively, how a human brain works.
In a recent essay, Dario Amodei, the CEO of AI company Anthropic, highlighted a lack of understanding in how AIs come up with answers and information. "When a generative AI system does something, like summarize a financial document, we have no idea, at a specific or precise level, why it makes the choices it does — why it chooses certain words over others, or why it occasionally makes a mistake despite usually being accurate," he wrote.
The problems caused by AI hallucinating inaccurate information are already very real, Kazerounian noted. "There is no universal, verifiable, way to get an LLM to correctly answer questions being asked about some corpus of data it has access to," he said. "The examples of non-existent hallucinated references, customer-facing chatbots making up company policy, and so on, are now all too common."
Both Kazerounian and Watson told Live Science that, ultimately, AI hallucinations may be difficult to eliminate. But there could be ways to mitigate the issue.
Watson suggested that "retrieval-augmented generation," which grounds a model's outputs in curated external knowledge sources, could help ensure that AI-produced information is anchored by verifiable data.
"Another approach involves introducing structure into the model's reasoning. By prompting it to check its own outputs, compare different perspectives, or follow logical steps, scaffolded reasoning frameworks reduce the risk of unconstrained speculation and improve consistency," Watson said, noting this could be aided by training to shape a model to prioritize accuracy, and reinforcement training from human or AI evaluators to encourage an LLM to deliver more disciplined, grounded responses.
"Finally, systems can be designed to recognise their own uncertainty. Rather than defaulting to confident answers, models can be taught to flag when they're unsure or to defer to human judgement when appropriate," Watson added. "While these strategies don't eliminate the risk of confabulation entirely, they offer a practical path forward to make AI outputs more reliable."
Given that AI hallucination may be nearly impossible to eliminate, especially in advanced models, Kazerounian concluded that ultimately the information that LLMs produce will need to be treated with the "same skepticism we reserve for human counterparts."
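Two of the mitigations Watson describes, grounding answers in curated sources and flagging uncertainty instead of guessing, can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the knowledge base entries, the word-overlap scoring, and the `min_overlap` threshold are hypothetical stand-ins, and a real retrieval-augmented system would pair a proper search index with a language model rather than returning the passage itself.

```python
import re

# Hypothetical curated knowledge base; the entries are illustrative placeholders.
KNOWLEDGE_BASE = [
    "The refund window for online orders is 30 days.",
    "Support is available by email on weekdays.",
]

# Small, assumed stopword list so common words do not count as evidence.
STOPWORDS = {"the", "is", "a", "an", "of", "for", "on", "by", "what", "who"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(question, passages, min_overlap=2):
    """Return the passage sharing the most tokens with the question.

    Returns None when no passage shares at least `min_overlap` tokens,
    which is this sketch's crude stand-in for "no supporting evidence".
    """
    question_tokens = tokenize(question)
    best_passage, best_score = None, 0
    for passage in passages:
        score = len(question_tokens & tokenize(passage))
        if score > best_score:
            best_passage, best_score = passage, score
    return best_passage if best_score >= min_overlap else None


def answer(question, passages=KNOWLEDGE_BASE):
    """Answer only from a retrieved passage; otherwise flag uncertainty."""
    source = retrieve(question, passages)
    if source is None:
        # Abstain rather than produce an unsupported answer.
        return {"answer": None, "source": None, "unsure": True}
    return {"answer": source, "source": source, "unsure": False}


if __name__ == "__main__":
    print(answer("What is the refund window?"))
    print(answer("Who won the 1998 chess final?"))
```

The design choice worth noting is that the abstention path is explicit: when retrieval finds no support, the caller receives an `unsure` flag and can defer to a human, rather than receiving a fluent but unsupported answer.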
