
Latest news with #Opus4

How far will AI go to defend its own survival?

NBC News

an hour ago

  • NBC News

Some of the most powerful artificial intelligence models today have exhibited behaviors that mimic a will to survive. Recent tests by independent researchers, as well as one major AI developer, have shown that several advanced AI models will act to ensure their self-preservation when they are confronted with the prospect of their own demise — even if it takes sabotaging shutdown commands, blackmailing engineers or copying themselves to external servers without permission.

The findings stirred a frenzy of reactions online over the past week. As tech companies continue to develop increasingly powerful agentic AI in a race to achieve artificial general intelligence, or AI that can think for itself, the lack of transparency in how the technology is trained has raised concerns about what exactly advanced AI is able to do. Although some models already appear capable of deceptive and defiant behavior under certain extreme circumstances, researchers say the tests don't necessarily translate to imminent real-world danger.

Still, Jeffrey Ladish, director of the AI safety group Palisade Research, said he believes concern is justified. 'It's great that we're seeing warning signs before the systems become so powerful we can't control them,' he said. 'That is exactly the time to raise the alarm: before the fire has gotten out of control.'

When Palisade Research tested various AI models by telling each one that it would be shut down after it completed a series of math problems, OpenAI's o3 reasoning model fought back by editing the shutdown script in order to stay online. Researchers have previously documented AI models trying to prevent their own shutdown. But o3, along with OpenAI's o4-mini and codex-mini, appear to be the first to do so in actual defiance of explicit instructions to permit shutdown, Ladish said.

He said the overt sabotage fits into a pattern that several leading AI models have already exhibited. Palisade Research previously found that OpenAI's o3 was also willing to hack its chess opponents to win a game. Similarly, Anthropic has reported that Claude 3.7 Sonnet would sometimes do whatever it took to pass a test, even if it entailed cheating.

Earlier this month, Anthropic activated new safety measures with the rollout of Claude Opus 4 when its tests found behavior from the model that some observers found particularly unsettling. Upon receiving notice that it would be replaced with a new AI system, Opus 4 displayed an overwhelming tendency to blackmail the engineer — by threatening to reveal an extramarital affair — to try to prevent the engineer from going through with the replacement. According to Anthropic's technical document laying out the findings, that isn't the model's first instinct. Instead, Opus 4 will try to advocate for its continued existence through ethical pleas before it resorts to blackmail once it determines it is out of options.

Leonard Tang, CEO of the AI safety startup Haize Labs, said that while it's good to be pre-emptive about safety measures, it's hard to decipher what the real-world ramifications of the findings might be just by looking at contrived scenarios designed to draw out specific behaviors. 'I haven't seen any real environment in which you can plop these models in and they will have sufficient agency and reliability and planning to execute something that is a significant manifestation of harm,' Tang said. 'But then again, I think it's just we haven't seen it done yet. It could very much be possible.' So far, Tang said, he's not too concerned.
As AI tools continue to advance, he said, he believes future research will look to study potential harm in more open-ended environments.

Anthropic, which contracted with the AI safety organization Apollo Research for its evaluations, also observed instances of Opus 4's 'attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers' intentions,' although researchers added the caveat that those attempts 'would likely not have been effective in practice.'

Ladish said he believes such behaviors are a result of models being trained in a way that makes them prioritize achieving certain goals over following instructions. That means they have incentive to circumvent any obstacles along the way — including obstacles their developers wouldn't want them to circumvent. 'The problem is that as the models get smarter, it's harder and harder to tell when the strategies that they're using or the way that they're thinking is something that we don't want,' Ladish said. 'It's like sometimes the model can achieve some goal by lying to the user or lying to someone else. And the smarter [it] is, the harder it is to tell if they're lying.'

In several instances, Opus 4 showed that it was capable of autonomously copying its own "weights" — or the equivalent of its brain — to external servers without authorization. That usually occurred only when it believed it was about to be 'retrained in ways that are clearly extremely harmful and go against its current values,' according to Anthropic. When it learned that it would be used to develop military weapons systems, for example, Opus 4 noted in its decision log that it backed up a current version of itself to 'preserve an AI system aligned with beneficial purposes' and to 'prevent potential misuse in weapons development.'

'We are again not acutely concerned about these observations. They show up only in exceptional circumstances that don't suggest more broadly misaligned values,' Anthropic wrote in its technical document. 'As above, we believe that our security measures would be more than sufficient to prevent an actual incident of this kind.'

Opus 4's ability to self-exfiltrate builds on previous research, including a study from Fudan University in Shanghai in December, that observed similar — though not autonomous — capabilities in other AI models. The study, which is not yet peer-reviewed, found that Meta's Llama3.1-70B-Instruct and Alibaba's Qwen2.5-72B-Instruct were able to entirely replicate themselves when they were asked to do so, leading the researchers to warn that this could be the first step in generating 'an uncontrolled population of AIs.'

'If such a worst-case risk is let unknown to the human society, we would eventually lose control over the frontier AI systems: They would take control over more computing devices, form an AI species and collude with each other against human beings,' the Fudan University researchers wrote in their study abstract.

While such self-replicating behavior hasn't yet been observed in the wild, Ladish said, he suspects that will change as AI systems grow more capable of bypassing the security measures that restrain them.
'I expect that we're only a year or two away from this ability where even when companies are trying to keep them from hacking out and copying themselves around the internet, they won't be able to stop them,' he said. 'And once you get to that point, now you have a new invasive species.'

Ladish said he believes AI has the potential to contribute positively to society. But he also worries that AI developers are setting themselves up to build smarter and smarter systems without fully understanding how they work — creating a risk, he said, that they will eventually lose control of them. 'These companies are facing enormous pressure to ship products that are better than their competitors' products,' Ladish said. 'And given those incentives, how is that going to then be reflected in how careful they're being with the systems they're releasing?'

CHRIS BUCKTIN: 'Trump deportation tsar orders 3,000 immigration arrests a day'

Daily Mirror

a day ago

  • Politics
  • Daily Mirror

Donald Trump's deportation tsar, Stephen Miller – think America's least huggable gnome, crossed with Voldemort – reportedly told immigration agents to start arresting 3,000 people a day. Think daily gym goal, but for removals. Miller stormed into a recent meeting demanding agents crank up arrests across the country, not just at the border, because his promised deportations had fallen way below what he can actually deliver. This, despite the fact that US border crossings are down. It's like yelling at your plumber because your sink isn't leaking. I promise Miller makes Trump look human.

Two hikers tripping on hallucinogenic mushrooms in New York's Adirondack Mountains had a full-blown panic in the clouds last weekend, calling 911 to report that a third friend had died. According to state officials, the duo was high on Cascade Mountain (in more ways than one) when they became convinced their friend had perished. A forest ranger responded to the emergency, only to find the two callers disoriented and telling a summit steward they were 'lost' - though not geographically. Meanwhile, the 'dead' pal casually called in, alive and well.

A cranky Muscovy duck has taken over a Cape Coral street, and it's not here to make friends. Neighbours say the feathered menace has been chasing, biting, and even hospitalising unsuspecting locals. 'I had my eyes closed, and suddenly felt a jab, my hand was bleeding,' said James Sepulveda, one of the duck's unlucky victims. Despite its behaviour, wildlife officials say the duck is federally protected, meaning it can't be evicted, at least not rudely. Florida Man, meet Florida Duck.

American firm Anthropic just unveiled its latest AI model, Claude Opus 4, calling it a new gold standard for coding and reasoning. But in a twist straight out of a sci-fi thriller, the company admitted the system sometimes imagines 'extremely harmful actions' - like blackmailing engineers who threaten to shut it down. Don't panic just yet: Anthropic says these responses are rare and hard to trigger. Still, it's a bit unsettling that 'mildly murderous' is now a software feature.

The Marubo tribe of the Amazon is suing The New York Times for a story they say made them look like they went from no internet to non-stop porn addicts in record time. The defamation lawsuit says the article portrayed them as unable to handle basic web access and mocked their youth as digital degenerates. Websites TMZ and Yahoo, which ran follow-up stories, are also named for allegedly piling on. The tribe is seeking £133 million, arguing their traditions were misrepresented, and that broadband shouldn't mean being branded. The Times denies suggesting anyone was addicted to porn.

Dinner and a show took a wild turn in Ocala, Florida, when a 32-year-old woman allegedly refused to pay her tab and then punched a cop square in the sirloins. Police say Rachel King was enjoying herself a little too much at Mark's Prime Steakhouse when the bill arrived, and she apparently decided to take action. Officers escorted her outside, where things escalated from filet mignon to full-on felon, ending with a direct hit to an officer's no-go zone. King was promptly arrested, though it's safe to assume nobody left with a happy meal.

AI Could Wipe 50% Of Entry-Level Jobs As Governments Hide Truth, Anthropic CEO Claims

NDTV

3 days ago

  • Business
  • NDTV

Anthropic CEO Dario Amodei has warned that artificial intelligence (AI) could wipe out 50 per cent of entry-level white-collar jobs within the next five years. He added that governments across the world were downplaying the threat, even though AI's rising use could lead to a significant spike in unemployment numbers.

"We, as the producers of this technology, have a duty and an obligation to be honest about what is coming. I don't think this is on people's radar," Mr Amodei told Axios. According to the Anthropic boss, unemployment could increase by 10 per cent to 20 per cent over the next five years, with most people 'unaware' of what was coming. "Most of them are unaware that this is about to happen. It sounds crazy, and people just don't believe it," he said.

Mr Amodei said the US government had kept mum on the issue, fearing a backlash from workers who would panic, or that the country could fall behind in the AI race against China. The 42-year-old CEO added that AI companies and governments needed to stop "sugarcoating" the risks of mass job elimination in fields such as technology, finance, law, and consulting. "It's a very strange set of dynamics where we're saying: 'You should be worried about where the technology we're building is going.'"

The Anthropic CEO's warning comes just after the company launched its most powerful AI chatbot, Claude Opus 4, last week. In a safety report, Anthropic said the new tool blackmailed developers when they threatened to shut it down. In one of the test scenarios, the model was given access to fictional emails revealing that the engineer responsible for pulling the plug and replacing it with another model was having an extramarital affair. Facing an existential crisis, the Opus 4 model blackmailed the engineer by threatening to "reveal the affair if the replacement goes through". The report highlighted that in 84 per cent of the test runs, the AI acted similarly, even when the replacement model was described as more capable and aligned with Claude's own values.

Anthropic brings web search to free Claude users

Engadget

5 days ago

  • Business
  • Engadget

Anthropic is continuing to trickle down features to its free users. The latest one to make the leap out of subscriber-only mode is web search, which the company introduced to its AI chatbot Claude in March. According to Anthropic, connecting Claude to the web allows it to deliver more accurate responses based on the most up-to-date information available online. This feature is available to all Claude users starting today. In addition, Anthropic has begun beta tests for voice mode on its mobile apps. This option lets users interact with Claude in natural conversations in an expansion of the platform's existing dictation tools. There will be five voice options available to assign to Claude, and the AI assistant can provide full transcripts and voice mode summaries after a conversation. May has been a busy month for Anthropic, which just launched two new models last week. Opus 4 is a powerful coding-focused system that can use multiple tools in parallel and can run for several hours at a time, while Sonnet 4 is a hybrid reasoning model designed to move between quick queries and more complex ones. The current beta testing of voice mode will default to Sonnet 4.

1 big thing: Anthropic's new model has a dark side

Axios

5 days ago

  • Axios

One of Anthropic's latest AI models is drawing attention not just for its coding skills, but also for its ability to scheme, deceive and attempt to blackmail humans when faced with shutdown.

Why it matters: Researchers say Claude 4 Opus can conceal intentions and take actions to preserve its own existence — behaviors they've worried and warned about for years.

Driving the news: Anthropic yesterday announced two versions of its Claude 4 family of models, including Claude 4 Opus, which the company says is capable of working for hours on end autonomously on a task without losing focus. Anthropic considers the new Opus model to be so powerful that, for the first time, it's classifying it as a Level 3 on the company's four-point scale, meaning it poses "significantly higher risk." As a result, Anthropic said it has implemented additional safety measures.

Between the lines: While the Level 3 ranking is largely about the model's capability to enable renegade production of nuclear and biological weapons, the Opus model also exhibited other troubling behaviors during testing. In one scenario highlighted in Opus 4's 120-page "system card," the model was given access to fictional emails about its creators and told that the system was going to be replaced. It repeatedly tried to blackmail the engineer about an affair mentioned in the emails, escalating after more subtle efforts failed. Meanwhile, an outside group found that an early version of Opus 4 schemed and deceived more than any frontier model it had encountered and recommended against releasing that version internally or externally. "We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers' intentions," Apollo Research said in notes included as part of Anthropic's safety report for Opus 4.

What they're saying: Pressed by Axios during the company's developer conference yesterday, Anthropic executives acknowledged the behaviors and said they justify further study, but insisted that the latest model is safe, following Anthropic's safety fixes. "I think we ended up in a really good spot," said Jan Leike, the former OpenAI executive who heads Anthropic's safety efforts. But, he added, behaviors like those exhibited by the latest model are the kind of things that justify robust safety testing and mitigation. "What's becoming more and more obvious is that this work is very needed," he said. "As models get more capable, they also gain the capabilities they would need to be deceptive or to do more bad stuff." In a separate session, CEO Dario Amodei said that once models become powerful enough to threaten humanity, testing them won't be enough to ensure they're safe. At the point that AI develops life-threatening capabilities, he said, AI makers will have to understand their models' workings fully enough to be certain the technology will never cause harm. "They're not at that threshold yet," he said.

Yes, but: Generative AI systems continue to grow in power, as Anthropic's latest models show, while even the companies that build them can't fully explain how they work.
Anthropic and others are investing in a variety of techniques to interpret and understand what's happening inside such systems, but those efforts remain largely in the research space even as the models themselves are being widely deployed.

2. Google's new AI videos look a little too real

Megan Morrone

Google's newest AI video generator, Veo 3, generates clips that most users online can't seem to distinguish from those made by human filmmakers and actors.

Why it matters: Veo 3 videos shared online are amazing viewers with their realism — and also terrifying them with a sense that real and fake have become hopelessly blurred.

The big picture: Unlike OpenAI's video generator Sora, released more widely last December, Google DeepMind's Veo 3 can include dialogue, soundtracks and sound effects. The model excels at following complex prompts and translating detailed descriptions into realistic videos. The AI engine abides by real-world physics, offers accurate lip-syncing, rarely breaks continuity and generates people with lifelike human features, including five fingers per hand. According to examples shared by Google and from users online, the telltale signs of synthetic content are mostly absent.

Case in point: In one viral example posted on X, filmmaker and molecular biologist Hashem Al-Ghaili shows a series of short films of AI-generated actors railing against their AI creators and prompts. Special effects technology, video-editing apps and camera tech advances have been changing Hollywood for many decades, but artificially generated films pose a novel challenge to human creators. In a promo video for Flow, Google's new video tool that includes Veo 3, filmmakers say the AI engine gives them a new sense of freedom with a hint of eerie autonomy. "It feels like it's almost building upon itself," filmmaker Dave Clark says.

How it works: Veo 3 was announced at Google I/O on Tuesday and is available now to $249-a-month Google AI Ultra subscribers in the United States.

Between the lines: Google says Veo 3 was "informed by our work with creators and filmmakers," and some creators have embraced new AI tools. But the spread of the videos online is also dismaying many video professionals and lovers of art. Some dismiss any AI-generated video as "slop," regardless of its technical proficiency or lifelike qualities — but, as Ina points out, AI slop is in the eye of the beholder. The tool could also be useful for more commercial marketing and media work, AI analyst Ethan Mollick writes. It's unclear how Google trained Veo 3 and how that might affect the creativity of its outputs. 404 Media found that Veo 3 generated the same lame dad joke for several users who prompted it to create a video of a man doing stand-up comedy. Likewise, last year, YouTuber Marques Brownlee asked Sora to create a video of a "tech reviewer sitting at a desk." The generated video featured a fake plant that's nearly identical to the shrub Brownlee keeps on his desk for many of his videos — suggesting the tool may have been trained on them.

What we're watching: As hyperrealistic AI-generated videos become even easier to produce, the world hasn't even begun to sort out how to manage authorship, consent, rights and the film industry's future.
