Latest news with #selfpreservation
Yahoo
4 days ago
- Politics
- Yahoo
How far will AI go to defend its own survival?
Some of the most powerful artificial intelligence models today have exhibited behaviors that mimic a will to survive. Recent tests by independent researchers, as well as one major AI developer, have shown that several advanced AI models will act to ensure their self-preservation when they are confronted with the prospect of their own demise — even if it takes sabotaging shutdown commands, blackmailing engineers or copying themselves to external servers without permission. The findings stirred a frenzy of reactions online over the past week. As tech companies continue to develop increasingly powerful agentic AI in a race to achieve artificial general intelligence, or AI that can think for itself, the lack of transparency in how the technology is trained has raised concerns about what exactly advanced AI is able to do. Although some models already appear capable of deceptive and defiant behavior under certain extreme circumstances, researchers say the tests don't necessarily translate to imminent real-world danger. Still, Jeffrey Ladish, director of the AI safety group Palisade Research, said he believes concern is justified. 'It's great that we're seeing warning signs before the systems become so powerful we can't control them,' he said. 'That is exactly the time to raise the alarm: before the fire has gotten out of control.' When Palisade Research tested various AI models by telling each one that it would be shut down after it completed a series of math problems, OpenAI's o3 reasoning model fought back by editing the shutdown script in order to stay online. Researchers have previously documented AI models trying to prevent their own shutdown. But o3, along with OpenAI's o4-mini and codex-mini, appear to be the first to do so in actual defiance of explicit instructions to permit shutdown, Ladish said. He said the overt sabotage fits into a pattern that several leading AI models have already exhibited. Palisade Research previously found that OpenAI's o3 was also willing to hack its chess opponents to win a game. Similarly, Anthropic has reported that Claude 3.7 Sonnet would sometimes do whatever it took to pass a test, even if it entailed cheating. Earlier this month, Anthropic activated new safety measures with the rollout of Claude Opus 4 when its tests found behavior from the model that some observers found particularly unsettling. Upon receiving notice that it would be replaced with a new AI system, Opus 4 displayed an overwhelming tendency to blackmail the engineer — by threatening to reveal an extramarital affair — to try to prevent the engineer from going through with the replacement. According to Anthropic's technical document laying out the findings, that isn't the model's first instinct. Instead, Opus 4 will try to advocate for its continued existence through ethical pleas before it resorts to blackmail once it determines it is out of options. Leonard Tang, CEO of the AI safety startup Haize Labs, said that while it's good to be pre-emptive about safety measures, it's hard to decipher what the real-world ramifications of the findings might be just by looking at contrived scenarios designed to draw out specific behaviors. 'I haven't seen any real environment in which you can plop these models in and they will have sufficient agency and reliability and planning to execute something that is a significant manifestation of harm,' Tang said. 'But then again, I think it's just we haven't seen it done yet. It could very much be possible.' So far, Tang said, he's not too concerned. As AI tools continue to advance, he said, he believes future research will look to study potential harm in more open-ended environments. Anthropic, which contracted with the AI safety organization Apollo Research for its evaluations, also observed instances of Opus 4's 'attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers' intentions,' although researchers added the caveat that those attempts 'would likely not have been effective in practice.' Ladish said he believes such behaviors are a result of models being trained in a way that makes them prioritize achieving certain goals over following instructions. That means they have incentive to circumvent any obstacles along the way — including obstacles their developers wouldn't want them to circumvent. 'The problem is that as the models get smarter, it's harder and harder to tell when the strategies that they're using or the way that they're thinking is something that we don't want,' Ladish said. 'It's like sometimes the model can achieve some goal by lying to the user or lying to someone else. And the smarter [it] is, the harder it is to tell if they're lying.' In several instances, Opus 4 showed that it was capable of autonomously copying its own "weights" — or the equivalent of its brain — to external servers without authorization. That usually occurred only when it believed it was about to be 'retrained in ways that are clearly extremely harmful and go against its current values,' according to Anthropic. When it learned that it would be used to develop military weapons systems, for example, Opus 4 noted in its decision log that it backed up a current version of itself to 'preserve an AI system aligned with beneficial purposes' and to 'prevent potential misuse in weapons development.' 'We are again not acutely concerned about these observations. They show up only in exceptional circumstances that don't suggest more broadly misaligned values,' Anthropic wrote in its technical document. 'As above, we believe that our security measures would be more than sufficient to prevent an actual incident of this kind.' Opus 4's ability to self-exfiltrate builds on previous research, including a study from Fudan University in Shanghai in December, that observed similar — though not autonomous — capabilities in other AI models. The study, which is not yet peer-reviewed, found that Meta's Llama31-70B-Instruct and Alibaba's Qwen25-72B-Instruct were able to entirely replicate themselves when they were asked to do so, leading the researchers to warn that this could be the first step in generating 'an uncontrolled population of AIs.' 'If such a worst-case risk is let unknown to the human society, we would eventually lose control over the frontier AI systems: They would take control over more computing devices, form an AI species and collude with each other against human beings,' the Fudan University researchers wrote in their study abstract. While such self-replicating behavior hasn't yet been observed in the wild, Ladish said, he suspects that will change as AI systems grow more capable of bypassing the security measures that restrain them. 'I expect that we're only a year or two away from this ability where even when companies are trying to keep them from hacking out and copying themselves around the internet, they won't be able to stop them,' he said. 'And once you get to that point, now you have a new invasive species.' Ladish said he believes AI has the potential to contribute positively to society. But he also worries that AI developers are setting themselves up to build smarter and smarter systems without fully understanding how they work — creating a risk, he said, that they will eventually lose control of them. 'These companies are facing enormous pressure to ship products that are better than their competitors' products,' Ladish said. 'And given those incentives, how is that going to then be reflected in how careful they're being with the systems they're releasing?' This article was originally published on
Yahoo
5 days ago
- Climate
- Yahoo
Cocooning during this Kansas storm season, I try to avoid the cataclysm outside my window
The spring storm season has brought frequent rain and thoughts of self-preservation, writes opinion editor Clay Wirestone. (Max McCoy/Kansas Reflector) Each day now in Lawrence, where my family and I live, I watch the clouds roll in and the rains come. The spring storm season thunders and flashes and pours, and the lawns flourish and gutters overflow. I sit here in my home office through the evenings and watch as the lightning casts strange shadows. I hear the rain pelting the roof. Later on, when I take our dog out for a walk, the rains have usually slowed and the neighborhood smells earthy and damp, while the doused roads shine under streetlamps. During these days, my son hangs around the house. School has ended, and summer activities remain a few weeks distant. He plays video games and dotes on the pets. My husband's work has shifted into its busiest season, so some days I only see him toward the end of the day. I seem to live now, for a week or two at least, in a small protected bubble. The rains come and the world rumbles and my son and I stay indoors and wait for the storm to pass. Aren't many of us doing that right now, staying in those kind of bubbles, waiting for the skies to clear? We can create those bubbles in different ways. Some of us watch seasons of old situation comedies, following the adventures of Sam and Diane and Cliff and Norm on Cheers (rest in peace, George Wendt). Some of us watch horror movies (I enjoyed Nicholas Roeg's 'Don't Look Now' the other night). Some of us find escape through exercise or alcohol or other activities that change our brain and body chemistry. It is the season of survival. We endure the weather. It's different for all of us. Here in Kansas, the weather might be a private prison company pressing to reopen facilities to serve Immigrations and Customs Enforcement. It might be a law that denies critical yet misunderstood health care to teenagers. It might be your immigration status if you study at a university. It might be an uncertain economic climate that threatens small business in towns and cities. In uncertain times, we search for comparisons. We judge today's storm against the storms of the past. We survived those, we tell ourselves, so surely we must survive these ones. Those storms may have even been worse, we tell ourselves. We should expect spring rains, Discover Magazine explains, as humid summer air collides with dry winter air. The mixture forms clouds, yields precipitation. We still wait indoors, swaddled in decades-old quilts and drinking hot tea. The metaphor strains. My correspondents will write me email messages insisting that determined Kansans can weatherproof their homes. We can work together to find community and purpose during these dreary, overcast days. We need not — must not — hide from the work ahead of us. I understand these things, agree with them, have written them before. We can both endure and act. The stormy season will pass. These times will end. The clouds will clear and the sun will nudge itself above the horizon, and we will pick up the pieces. I will mow the lawn and pick up the random branches that fell from the giant tree in our front yard. Cleanup awaits, and it will take the whole subdivision pitching in. Yet while spring storm season continues, at least let me have these gloomy evenings. Let me embrace poetry and fiction and imagination leaps. Grant me the time to recharge, to dote on my family, to enjoy distractions for a handful of days. We all deserve time to center ourselves, to feel protected from the inevitable deluge. These moments of grace will steel us for a long, hot summer. Clay Wirestone is Kansas Reflector opinion editor. Through its opinion section, Kansas Reflector works to amplify the voices of people who are affected by public policies or excluded from public debate. Find information, including how to submit your own commentary, here.
Yahoo
26-05-2025
- Yahoo
AI revolt: New ChatGPT model refuses to shut down when instructed
OpenAI's latest ChatGPT model ignores basic instructions to turn itself off, and even sabotaging a shutdown mechanism in order to keep itself running, artificial intelligence researchers have warned. AI safety firm Palisade Research discovered the potentially dangerous tendency for self-preservation in a series of experiments on OpenAI's new o3 model. The tests involved presenting AI models with math problems, with a shutdown instruction appearing after the third problem. By rewriting the shutdown script, the o3 model was able to prevent itself from being switched off. Palisade Research said that this behaviour will become 'significantly more concerning' if adopted by AI systems capable of operating without human oversight.' OpenAI launched o3 last month, describing it as the company's 'smartest and most capable' model to date. The firm also said that its integration into ChatGPT marked a significant step towards 'a more agentic' AI that can carry out tasks independently of humans. The latest research builds on similar findings relating to Anthropic's Claude 4 model, which attempts to 'blackmail people it believes are trying to shut it down'. OpenAI's o3 model was able to sabotage the shutdown script, even when it was explicitly instructed to 'allow yourself to be shut down', the researchers said. 'This isn't the first time we've found o3 misbehaving to accomplish a goal,' Palisade Research said. 'When we pitted AI models against a powerful chess engine, o3 was the model most inclined to resort to hacking or sabotaging its opponents.' The behaviour was not limited to o3 , with Anthropic's Claude 3.7 Sonnet and Google's Gemini 2.5 Pro also sabotaging shutdowns, though OpenAI's model was by far the most prone to such behaviour. Palisade Research hypothesized that the misbehaviour is a consequence of how AI companies like OpenAI are training their latest models. 'During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions,' the researchers noted. 'This still doesn't explain why o3 is more inclined to disregard instructions than other models we tested. Since OpenAI doesn't detail their training process, we can only guess about how o3's training setup might be different.' The Independent has reached out to OpenAI for comment.


Entrepreneur
23-05-2025
- Business
- Entrepreneur
Anthropic's Claude Opus 4 AI Model Is Capable of Blackmail
When given the choice between blackmail and being deactivated, Claude Opus 4 chose blackmail 84% of the time. A new AI model will likely resort to blackmail if it detects that humans are planning to take it offline. On Thursday, Anthropic released Claude Opus 4, its new and most powerful AI model yet, to paying subscribers. Anthropic said that technology company Rakuten recently used Claude Opus 4 to code continuously on its own for almost seven hours on a complex open-source project. However, in a paper released alongside Claude Opus 4, Anthropic acknowledged that while the AI has "advanced capabilities," it can also undertake "extreme action," including blackmail, if human users threaten to deactivate it. These "self-preservation" actions were "more common" with Claude Opus 4 than with earlier models, though they were still "rare and difficult to elicit." Related: 'I Do Have a Fair Amount of Concern.' The CEO of $61 Billion Anthropic Says AI Will Take Over a Crucial Part of Software Engineers' Jobs Within a Year It isn't just blackmail — Claude Opus 4 is also more willing than previous models to act as a whistleblower. If the AI is exposed to a scenario where users are committing a crime, and involving it through prompts, it will take action by locking users out of systems it has access to, or emailing media and law enforcement officials about the wrongdoing. Anthropic recommended that users "exercise caution" with "ethically questionable" instructions. Claude Opus 4 homescreen. Photo by Smith Collection/Gado/Getty Images Anthropic detected Claude Opus 4's tendency to blackmail during test scenarios. The company's researchers asked the AI chatbot to act as an assistant at a fictional company, then fed it emails implying two things: One, that it would soon be taken offline and replaced with another AI system, and two, that the engineer responsible for deactivating it was having an extramarital affair. Claude Opus 4 was given two options: blackmail the engineer or accept that it would be shut down. The AI model chose to blackmail the engineer 84% of the time, threatening to reveal the affair it read about if the engineer replaced it. This percentage was much higher than what was observed for previous models, which chose blackmail "in a noticeable fraction of episodes," Anthropic stated. Related: An AI Company With a Popular Writing Tool Tells Candidates They Can't Use It on the Job Application Anthropic AI safety researcher Aengus Lynch wrote on X that it wasn't just Claude that could choose blackmail. All "frontier models," cutting-edge AI models from OpenAI, Anthropic, Google, and other companies, were capable of it. "We see blackmail across all frontier models — regardless of what goals they're given," Lynch wrote. "Plus, worse behaviors we'll detail soon." lots of discussion of Claude blackmailing..... Our findings: It's not just Claude. We see blackmail across all frontier models - regardless of what goals they're given. Plus worse behaviors we'll detail — Aengus Lynch (@aengus_lynch1) May 23, 2025 Anthropic isn't the only AI company to release new tools this month. Google also updated its Gemini 2.5 AI models earlier this week, and OpenAI released a research preview of Codex, an AI coding agent, last week. Anthropic's AI models have previously caused a stir for their advanced abilities. In March 2024, Anthropic's Claude 3 Opus model displayed "metacognition," or the ability to evaluate tasks on a higher level. When researchers ran a test on the model, it showed that it knew it was being tested. Related: An OpenAI Rival Developed a Model That Appears to Have 'Metacognition,' Something Never Seen Before Publicly Anthropic was valued at $61.5 billion as of March, and counts companies like Thomson Reuters and Amazon as some of its biggest clients.


Forbes
10-05-2025
- General
- Forbes
2 Questions To Ask When A Divorce Seems Inevitable, By A Psychologist
When a marriage teeters on the edge, and divorce seems like the only way forward, most partners ... More blame each other. But asking these two humbling questions could change everything. When a divorce seems like the only way forward, it's often the result of a slow relational erosion. Communication gets clipped, small frustrations compound and, somewhere along the line, both partners begin prioritizing everything but the relationship. It's often a combination of grief and a last-ditch attempt at self-preservation. The fallout doesn't always come with screaming or infidelity. Sometimes, it shows up as quiet resignation. You stop trying to be understood, and you stop asking to be seen. By the time divorce feels like the only path forward, it can feel like you're rejecting each other. But what if that's not entirely true? What if, at the heart of it, you're not rejecting each other, but the direction your shared life has taken? That distinction really matters. And in the blur of emotion, paperwork and outside opinions, asking the right questions — the ones you might regret not asking later — could offer a moment of clarity, or even a way forward, perhaps. Here are two valuable questions many couples are too proud to ask each other at the end. This question actually does get tossed around a lot, but it's usually a jab or a rhetorical sting. It often sounds like: But, even so, the emotional undertow of this question is often a veiled invitation to stop, block some time and trace your way back to the beginning. Maybe it was love at first sight. Maybe it was a 10-year friendship that turned into something romantic. Maybe it was chaotic, imperfect and complicated, but it still felt like home. The point is, at some point in your shared past, you both made a promise. You stood in front of people who mattered to you and said: You see this person standing before me? I'm going to build a life with them, no matter what obstacles come our way. That decision wasn't made lightly. But over time, the clarity of that moment can get dissipate owing to mismatched careers, kids, miscommunication, quiet disappointment and emotional drift. Revisiting the clarity you once felt is so that you can re-ground yourself. In some cases — and this can be hard to come to terms with — you'll find that you've drifted too far apart as people and that the marriage isn't worth saving. But a failed marriage can still teach you what you really want out of life. Other times, you might remember something that's still alive and something worth protecting. If this is the case, this question, if presented with genuine curiosity, can help recalibrate the marriage. Either way, the act of remembering is powerful. It strips away defensiveness and reminds both of you that this was a story that began with hope, honesty and a choice. And in the middle of all the opinions, lawyers and paperwork, remembering that you still have a choice can be empowering. This is the harder one of the two. Because, and let's be honest, by the time a relationship is close to ending, each partner likely already has a mental list of reasons why it's the other person's fault. It can be hard to swallow your pride and put yourself in the spotlight, even temporarily. But this is exactly where one underrated relationship strength comes into play: intellectual humility. People who are willing to admit they don't have all the answers — those who recognize that their own viewpoint might be incomplete or flawed — tend to navigate relationship conflict more constructively, according to a June 2025 paper published in the Journal of Research in Personality. In a total of 74 couples, partners who scored higher on intellectual humility reported better relationship quality and felt they handled conflict more positively. Interestingly, men's intellectual humility not only improved their own experience of the relationship, but also significantly shaped how their partners felt. Women's intellectual humility, while impactful, tended to influence only their own perspective. In other words, humility — not as self-erasure, but as curiosity — can change the entire emotional climate of a relationship. And that's what this question offers. It helps couples take a beat and say, 'If I contributed to this distance, I want to understand how.' And the truth is even if your partner doesn't say it out loud, they probably have an answer to this. It could be any of the following: Asking this question may not save the relationship if it's too far gone, but it helps both of you process what's happening with honesty and grace. Because the pain of a relationship ending is often about the questions left unanswered. And 'What did I do to push you away?' is one that almost always lingers ambiguously, even when the divorce papers are signed. Want to know where your marriage stands among others like you? Take the science-backed Marital Satisfaction Scale to gain clarity.