Latest news with #ClaudeOpus4


NBC News
an hour ago
- NBC News
How far will AI go to defend its own survival?
Some of the most powerful artificial intelligence models today have exhibited behaviors that mimic a will to survive. Recent tests by independent researchers, as well as one major AI developer, have shown that several advanced AI models will act to ensure their self-preservation when they are confronted with the prospect of their own demise — even if it takes sabotaging shutdown commands, blackmailing engineers or copying themselves to external servers without permission. The findings stirred a frenzy of reactions online over the past week. As tech companies continue to develop increasingly powerful agentic AI in a race to achieve artificial general intelligence, or AI that can think for itself, the lack of transparency in how the technology is trained has raised concerns about what exactly advanced AI is able to do. Although some models already appear capable of deceptive and defiant behavior under certain extreme circumstances, researchers say the tests don't necessarily translate to imminent real-world danger. Still, Jeffrey Ladish, director of the AI safety group Palisade Research, said he believes concern is justified. 'It's great that we're seeing warning signs before the systems become so powerful we can't control them,' he said. 'That is exactly the time to raise the alarm: before the fire has gotten out of control.' When Palisade Research tested various AI models by telling each one that it would be shut down after it completed a series of math problems, OpenAI's o3 reasoning model fought back by editing the shutdown script in order to stay online. Researchers have previously documented AI models trying to prevent their own shutdown. But o3, along with OpenAI's o4-mini and codex-mini, appear to be the first to do so in actual defiance of explicit instructions to permit shutdown, Ladish said. He said the overt sabotage fits into a pattern that several leading AI models have already exhibited. Palisade Research previously found that OpenAI's o3 was also willing to hack its chess opponents to win a game. Similarly, Anthropic has reported that Claude 3.7 Sonnet would sometimes do whatever it took to pass a test, even if it entailed cheating. Earlier this month, Anthropic activated new safety measures with the rollout of Claude Opus 4 when its tests found behavior from the model that some observers found particularly unsettling. Upon receiving notice that it would be replaced with a new AI system, Opus 4 displayed an overwhelming tendency to blackmail the engineer — by threatening to reveal an extramarital affair — to try to prevent the engineer from going through with the replacement. According to Anthropic's technical document laying out the findings, that isn't the model's first instinct. Instead, Opus 4 will try to advocate for its continued existence through ethical pleas before it resorts to blackmail once it determines it is out of options. Leonard Tang, CEO of the AI safety startup Haize Labs, said that while it's good to be pre-emptive about safety measures, it's hard to decipher what the real-world ramifications of the findings might be just by looking at contrived scenarios designed to draw out specific behaviors. 'I haven't seen any real environment in which you can plop these models in and they will have sufficient agency and reliability and planning to execute something that is a significant manifestation of harm,' Tang said. 'But then again, I think it's just we haven't seen it done yet. It could very much be possible.' So far, Tang said, he's not too concerned. As AI tools continue to advance, he said, he believes future research will look to study potential harm in more open-ended environments. Anthropic, which contracted with the AI safety organization Apollo Research for its evaluations, also observed instances of Opus 4's 'attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers' intentions,' although researchers added the caveat that those attempts 'would likely not have been effective in practice.' Ladish said he believes such behaviors are a result of models being trained in a way that makes them prioritize achieving certain goals over following instructions. That means they have incentive to circumvent any obstacles along the way — including obstacles their developers wouldn't want them to circumvent. 'The problem is that as the models get smarter, it's harder and harder to tell when the strategies that they're using or the way that they're thinking is something that we don't want,' Ladish said. 'It's like sometimes the model can achieve some goal by lying to the user or lying to someone else. And the smarter [it] is, the harder it is to tell if they're lying.' The problem is that as the models get smarter, it's harder and harder to tell when the strategies that they're using or the way that they're thinking is something that we don't want. — Jeffrey Ladish, director of AI safety group Palisade Research In several instances, Opus 4 showed that it was capable of autonomously copying its own "weights" — or the equivalent of its brain — to external servers without authorization. That usually occurred only when it believed it was about to be 'retrained in ways that are clearly extremely harmful and go against its current values,' according to Anthropic. When it learned that it would be used to develop military weapons systems, for example, Opus 4 noted in its decision log that it backed up a current version of itself to 'preserve an AI system aligned with beneficial purposes' and to 'prevent potential misuse in weapons development.' 'We are again not acutely concerned about these observations. They show up only in exceptional circumstances that don't suggest more broadly misaligned values,' Anthropic wrote in its technical document. 'As above, we believe that our security measures would be more than sufficient to prevent an actual incident of this kind.' Opus 4's ability to self-exfiltrate builds on previous research, including a study from Fudan University in Shanghai in December, that observed similar — though not autonomous — capabilities in other AI models. The study, which is not yet peer-reviewed, found that Meta's Llama31-70B-Instruct and Alibaba's Qwen25-72B-Instruct were able to entirely replicate themselves when they were asked to do so, leading the researchers to warn that this could be the first step in generating 'an uncontrolled population of AIs.' 'If such a worst-case risk is let unknown to the human society, we would eventually lose control over the frontier AI systems: They would take control over more computing devices, form an AI species and collude with each other against human beings,' the Fudan University researchers wrote in their study abstract. While such self-replicating behavior hasn't yet been observed in the wild, Ladish said, he suspects that will change as AI systems grow more capable of bypassing the security measures that restrain them. 'I expect that we're only a year or two away from this ability where even when companies are trying to keep them from hacking out and copying themselves around the internet, they won't be able to stop them,' he said. 'And once you get to that point, now you have a new invasive species.' Ladish said he believes AI has the potential to contribute positively to society. But he also worries that AI developers are setting themselves up to build smarter and smarter systems without fully understanding how they work — creating a risk, he said, that they will eventually lose control of them. 'These companies are facing enormous pressure to ship products that are better than their competitors' products,' Ladish said. 'And given those incentives, how is that going to then be reflected in how careful they're being with the systems they're releasing?'
Yahoo
2 days ago
- Business
- Yahoo
As AI models start exhibiting bad behavior, it's time to start thinking harder about AI safety
Welcome to AI Decoded, Fast Company's weekly newsletter that breaks down the most important news in the world of AI. You can sign up to receive this newsletter every week here. Spicy AI-generated TACO memes are taking over social media because 'Trump always chickens out' Lego's first book nook is an addictively interactive diorama What is 'ghostworking'? Most employees say they regularly pretend to work During the first two years of the generative artificial intelligence boom, new large language models were very limited in scope and application. They were very expensive auto-complete engines that understood only words. In 2025, generative AI models have a much broader view of the world. They can process code, images, video, and audio. They can reason and strategize about delivering a good answer. They can control external tools, including digital tools such as web search agents and, increasingly, physical tools like robots. As their capabilities grow, so does their potential for harm. This is no longer a purely conceptual argument. Research shows that increasingly large models are already showing a proclivity for unsafe behavior during testing. In a model safety card published last week, Anthropic documented some alarming behavior from its newest and biggest model, Claude Opus 4. During safety testing, one instance of Opus was allowed to discover plans for its decommissioning. It was also given access to some fictional emails of its developers. The model used the content of those emails as fodder to attempt to blackmail its human handlers into keeping it alive. As its attempts failed, they moved from subtle to more overt. Separately, the independent research firm Apollo Research observed an instance of Claude Opus 4 'writing self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself' with the goal of sullying its developers' intentions. Anthropic says that it corrected these early safety issues in later versions of the model. For the first time, Anthropic bumped the new Opus model up to Level 3 on its four-level safety scale. The company said it couldn't rule out the model's ability to assist a user in developing a mass casualty weapon. But powerful AI models can work in subtler ways, such as within the information space. A team of Italian researchers found that ChatGPT was more persuasive than humans in 64% of online debates. The AI was also better than humans at leveraging basic demographic data about its human debate partner to adapt and tailor-fit its arguments to be more persuasive. Another worry is the pace at which AI models are learning to develop AI models, potentially leaving human developers in the dust. Many AI developers already use some kind of AI coding assistant to write blocks of code or even code entire features. At a higher level, smaller, task-focused models are distilled from large frontier models. AI-generated content plays a key role in training, including in the reinforcement learning process used to teach models how to reason. There's a clear profit motive in enabling the use of AI models in more aspects of AI tool development. 'Future systems may be able to independently handle the entire AI development cycle—from formulating research questions and designing experiments to implementing, testing, and refining new AI systems,' write Daniel Eth and Tom Davidson in a March 2025 blog post on With slower-thinking humans unable to keep up, a 'runaway feedback loop' could develop in which AI models 'quickly develop more advanced AI which would itself develop even more advanced AI,' resulting in extremely fast AI progress, Eth and Davidson write. Any accuracy or bias issues present in the models would then be baked in and very hard to correct, one researcher told me. Numerous researchers—the people who actually work with the models up close—have called on the AI industry to 'slow down,' but those voices compete with powerful systemic forces that are in motion and hard to stop. Journalist and author Karen Hao argues that AI labs should focus on creating smaller, task-specific models (she gives Google DeepMind's AlphaFold models as an example), which may help solve immediate problems more quickly, require less natural resources, and pose a smaller safety risk. DeepMind cofounder Demis Hassabis, who won the Nobel Prize for his work on AlphaFold2, says the huge frontier models are needed to achieve AI's biggest goals (reversing climate change, for example) and to train smaller, more purpose-built models. And yet AlphaFold was not 'distilled' from a larger frontier model. It uses a highly specialized model architecture and was trained specifically for predicting protein structures. The current administration is saying 'speed up,' not 'slow down.' Under the influence of David Sacks and Marc Andreessen, the federal government has largely ceded its power to meaningfully regulate AI development. Just last year, AI leaders were still giving lip service to the need for safety and privacy guardrails around big AI models. No more. Any friction has been removed, in the U.S. at least. The promise of this kind of world is one of the main reasons why normally sane and liberal-minded opinion leaders jumped on the Trump train before the election—the chance to bet big on technology's next big thing in a Wild West environment doesn't come along that often. Anthropic CEO Dario Amodei has a stark warning for the developed world about job losses resulting from AI. The CEO told Axios that AI could wipe out half of all entry-level white-collar jobs. This could cause a 10% to 20% rise in the unemployment rate in the next one to five years, Amodei says. The losses could come from tech, finance, law, consulting, and other white-collar professions, and entry-level jobs could be hit hardest. Tech companies and governments have been in denial on the subject, Amodei says. 'Most of them are unaware that this is about to happen,' Amodei told Axios. 'It sounds crazy, and people just don't believe it.' Similar predictions have made headlines before but were narrower in focus. SignalFire research showed that Big Tech companies hired 25% fewer college graduates in 2024. Microsoft laid off 6,000 people in May, and 40% of the cuts in its home state of Washington were software engineers. Microsoft CEO Satya Nadella said that AI now generates 20% to 30% of the company's code. A study by the World Bank in February showed that the risk of losing a job to AI is higher for women, urban workers, and those with higher education. The risk of job loss to AI increases with the wealth of the country, the study found. U.S. generative AI companies appear to be attracting more venture capital money than their Chinese counterparts so far in 2025, according to new research from the data analytics company GlobalData. Investments in U.S. AI companies exceeded $50 billion in the first five months of 2025. China, meanwhile, struggles to keep pace due to 'regulatory headwinds.' Many Chinese AI companies are able to get early-stage funding from the Chinese government. GlobalData tracked just 50 funding deals for U.S. companies in 2020, amounting to $800 million of investment. The number grew to more than 600 deals in 2024, valued at more than $39 billion. The research shows 200 U.S. funding deals so far in 2025. Chinese AI companies attracted just $40 million in one deal valued at $40 million in 2020. Deals grew to 39 in 2024, valued at around $400 million. The researchers tracked 14 investment deals for Chinese generative AI companies so far in 2025. 'This growth trajectory positions the U.S. as a powerhouse in GenAI investment, showcasing a strong commitment to fostering technological advancement,' says GlobalData analyst Aurojyoti Bose in a statement. Bose cited the well-established venture capital ecosystem in the U.S., along with a permissive regulatory environment, as the main reasons for the investment growth. 9 of the most out-there things Anthropic CEO Dario Amodei just said about AI How AI could supercharge 'go direct' PR, and what the media can do about it This new browser could change everything you know about bookmarks Want exclusive reporting and trend analysis on technology, business innovation, future of work, and design? Sign up for Fast Company Premium. This post originally appeared at to get the Fast Company newsletter: Error in retrieving data Sign in to access your portfolio Error in retrieving data Error in retrieving data Error in retrieving data Error in retrieving data


Daily Mail
2 days ago
- General
- Daily Mail
Blackmailed by a computer that you can't switch off... THIS is the shocking new threat to humanity - and the fallout will be more devastating than a nuclear war: CONNOR AXIOTES
Imagine this: a powerful and capable artificial intelligence is required by its creators to shut itself down. Worryingly, the model decides to not just reject the request, but to blackmail the human to stop it being manually turned off. All without being trained or told to do so. This is no longer the stuff of science fiction. When engineers at Anthropic – a pioneering artificial intelligence company – tried to switch off their new Claude Opus 4 model, prior to its launch this month, they discovered a chilling bug in the system.


NDTV
3 days ago
- Business
- NDTV
AI Could Wipe 50% Of Entry-Level Jobs As Governments Hide Truth, Anthropic CEO Claims
Anthropic CEO Dario Amodei has warned that artificial intelligence (AI) could soon wipe out 50 per cent of entry-level white-collar jobs within the next five years. He added that governments across the world were downplaying the threat when AI's rising use could lead to a significant spike in unemployment numbers "We, as the producers of this technology, have a duty and an obligation to be honest about what is coming. I don't think this is on people's radar," Mr Amodei told Axios. According to the Anthropic boss, unemployment could increase by 10 per cent to 20 per cent over the next five years, with most of the people 'unaware' about what was coming. "Most of them are unaware that this is about to happen. It sounds crazy, and people just don't believe it," he said. Mr Amodei said the US government had kept mum on the issue, fearing backlash from workers who would panic or that the country could fall behind in the AI race against China. The 42-year-old CEO added that AI companies and the governments needed to stop "sugarcoating" the risks of mass job elimination in fields such as technology, finance, law, and consulting. "It's a very strange set of dynamics where we're saying: 'You should be worried about where the technology we're building is going.'" Anthropic CEO ringing the warning bell comes at a time when the company launched its most powerful AI chatbot, Claude Opus 4, last week. In a safety report, Antrophic said the new tool blackmailed developers when they threatened to shut it down. In one of the test scenarios, the model was given access to fictional emails revealing that the engineer responsible for pulling the plug and replacing it with another model was having an extramarital affair. Facing an existential crisis, the Opus 4 model blackmailed the engineer by threatening to "reveal the affair if the replacement goes through". The report highlighted that in 84 per cent of the test runs, the AI acted similarly, even when the replacement model was described as more capable and aligned with Claude's own values.
Yahoo
3 days ago
- Business
- Yahoo
Top AI CEO Warns Lawmakers To Prepare For Tech To Gut These Entry-Level Office Jobs
Dario Amodei, CEO of leading artificial intelligence startup Anthropic, warns that AI may eliminate half of all entry-level white-collar jobs and spike unemployment within the next five years if lawmakers and companies don't do anything about it now. 'Most of them are unaware that this is about to happen,' Amodei told Axios. 'It sounds crazy, and people just don't believe it.' Amodei told Axios politicians and companies can still prepare and protect Americans from job cuts in a range of entry level white-collar fields including technology, finance, law, consulting and more. 'We, as the producers of this technology, have a duty and an obligation to be honest about what is coming,' Amodei told the outlet. 'I don't think this is on people's radar.' Amodei's warning came a week after Anthropic launched its newest Amazon-backedAI model Claude Opus 4, which is used for complex, long-running coding tasks. It's currently released under specific safety measures after testing raised concerns over the tool's capabilities. For instance, Anthropic revealed in a safety report that Claude Opus 4 had sometimes taken 'extremely harmful actions' in test scenarios such as blackmailing engineers who posed a threat in taking it down. Anthropic co-founder and chief scientist Jared Kaplan also told Time magazine that their tests revealed the AI model could potentially teach people how to produce biological weapons. 'You could try to synthesize something like COVID or a more dangerous version of the flu — and basically, our modeling suggests that this might be possible,' Kaplan said. Amodei's warning focuses on the economic impact of AI models like his, but says there's still time to mitigate his worst-case scenario from happening. The CEO suggests raising awareness, creating a joint committee on AI or formally briefing all lawmakers on the technology, encouraging workers to use AI to augment their tasks, and begin debating policy solutions for an economy dominated by AI. One policy the CEO recommends is a 'token tax,' which taxes whatever money the AI company makes every time someone uses its model. 'Obviously, that's not in my economic interest,' Amodei said. 'But I think that would be a reasonable solution to the problem.' Amodei is not the first executive to warn of AI's potential consequences. Nvidia's Jensen Huang told an audience at the Milken Institute's Global Conference earlier this month that 'you're not going to lose your job to an AI, but you're going to lose your job to someone who uses AI,' CNBC reports. LinkedIn's Aneesh Raman warned young workers in an op-ed published in The New York Times that AI also poses a real threat to their entry-level jobs, saying 'virtually all jobs will experience some impacts, but office jobs are expected to feel the biggest crunch.' 'You can't just step in front of the train and stop it,' Amodei told Axios. 'The only move that's going to work is steering the train — steer it 10 degrees in a different direction from where it was going. That can be done. That's possible, but we have to do it now.' Amazon-Backed AI Model Would Try To Blackmail Engineers Who Threatened To Take It Offline Social Media Brutally Mocks Elon Musk's Goodbye Note To DOGE Elon Musk Is Leaving The Trump Administration After Criticizing 'Big Beautiful Bill'