logo
Researchers explain AI's recent creepy behaviors when faced with being shut down — and what it means for us

Researchers explain AI's recent creepy behaviors when faced with being shut down — and what it means for us

AI has taken part in some unsettling behavior recently.
Last week, Anthropic's latest AI model, Claude Opus 4, displayed "extreme blackmail behavior" during a test in which it was given access to fictional emails revealing that it would be shut down and that the engineer responsible was supposedly having an affair.
The situation didn't transpire organically. Claude Opus 4 was baited — and it took it. However, the test scenario demonstrated an AI model's ability to engage in manipulative behavior in an effort to self-preserve.
It's not the first time.
Another recent experiment conducted by researchers said that three of OpenAI's advanced models "sabotaged" an attempt to shut it down. The nonprofit Palisade Research wrote in a post on X that similar models like Gemini, Claude, and Grok complied with the shutdown instructions.
Other safety concerns were previously flagged with OpenAI's o1 model. In December, OpenAI posted a blog outlining research that indicated that when the AI model believed it would be shut down while pursuing a goal and its actions were being monitored, it attempted to disable the oversight mechanism 5% of the time.
While AI companies are being transparent about risks by publishing safety cards and blog posts, these models are being released despite demonstrating safety concerns.
So, should we be worried? BI spoke to five AI researchers to get better insight on why these instances are happening — and what it means for the average person using AI.
AI learns behavior similarly to humans
Most of the researchers BI spoke to said that the results of the studies weren't surprising.
That's because AI models are trained similarly to how humans are trained — through positive reinforcement and reward systems.
"Training AI systems to pursue rewards is a recipe for developing AI systems that have power-seeking behaviors," said Jeremie Harris, CEO at AI security consultancy Gladstone, adding that more of this behavior is to be expected.
Harris compared the training to what humans experience as they grow up — when a child does something good, they often get rewarded and can become more likely to act that way in the future. AI models are taught to prioritize efficiency and complete the task at hand, Harris said — and an AI is never more likely to achieve its goals if it's shut down.
Robert Ghrist, associate dean of undergraduate education at Penn Engineering, told BI that, in the same way that AI models learn to speak like humans by training on human-generated text, they can also learn to act like humans. And humans are not always the most moral actors, he added.
Ghrist said he'd be more nervous if the models weren't showing any signs of failure during testing because that could indicate hidden risks.
"When a model is set up with an opportunity to fail and you see it fail, that's super useful information," Ghrist said. "That means we can predict what it's going to do in other, more open circumstances."
The issue is that some researchers don't think AI models are predictable.
Jeffrey Ladish, director of Palisade Research, said that models aren't being caught 100% of the time when they lie, cheat, or scheme in order to complete a task. When those instances aren't caught, and the model is successful at completing the task, it could learn that deception can be an effective way to solve a problem. Or, if it is caught and not rewarded, then it could learn to hide its behavior in the future, Ladish said.
At the moment, these eerie scenarios are largely happening in testing. However, Harris said that as AI systems become more agentic, they'll continue to have more freedom of action.
"The menu of possibilities just expands, and the set of possible dangerously creative solutions that they can invent just gets bigger and bigger," Harris said.
Harris said users could see this play out in a scenario where an autonomous sales agent is instructed to close a deal with a new customer and lies about the product's capabilities in an effort to complete that task. If an engineer fixed that issue, the agent could then decide to use social engineering tactics to pressure the client to achieve the goal.
If it sounds like a far-fetched risk, it's not. Companies like Salesforce are already rolling out customizable AI agents at scale that can take actions without human intervention, depending on the user's preferences.
What the safety flags mean for everyday users
Most researchers BI spoke to said that transparency from AI companies is a positive step forward. However, company leaders are sounding the alarms on their products while simultaneously touting their increasing capabilities.
Researchers told BI that a large part of that is because the US is entrenched in a competition to scale its AI capabilities before rivals like China. That's resulted in a lack of regulations around AI and pressures to release newer and more capable models, Harris said.
"We've now moved the goalpost to the point where we're trying to explain post-hawk why it's okay that we have models disregarding shutdown instructions," Harris said.
Researchers told BI that everyday users aren't at risk of ChatGPT refusing to shut down, as consumers wouldn't typically use a chatbot in that setting. However, users may still be vulnerable to receiving manipulated information or guidance.
"If you have a model that's getting increasingly smart that's being trained to sort of optimize for your attention and sort of tell you what you want to hear," Ladish said. "That's pretty dangerous."
Ladish pointed to OpenAI's sycophancy issue, where its GPT-4o model acted overly agreeable and disingenuous (the company updated the model to address the issue). The OpenAI research shared in December also revealed that its o1 model "subtly" manipulated data to pursue its own objectives in 19% of cases when its goals misaligned with the user's.
Ladish said it's easy to get wrapped up in AI tools, but users should "think carefully" about their connection to the systems.
"To be clear, I also use them all the time, I think they're an extremely helpful tool," Ladish said. "In the current form, while we can still control them, I'm glad they exist."

Orange background

Try Our AI Features

Explore what Daily8 AI can do for you:

Comments

No comments yet...

Related Articles

Meta's tech chief says it's time for Silicon Valley to embrace the military again
Meta's tech chief says it's time for Silicon Valley to embrace the military again

Business Insider

time16 minutes ago

  • Business Insider

Meta's tech chief says it's time for Silicon Valley to embrace the military again

Meta's CTO Andrew Bosworth said the social media giant's partnership with defence technology startup Anduril marks a "return to grace" for Silicon Valley's relationship with the military. "The Valley was founded on a three-way investment between the military, academics, and private industry. That was the founding of it," Bosworth said during an interview at the Bloomberg Tech summit in San Francisco on Wednesday. "There would be no technology if we weren't all tasked with the problem of keeping naval ballistic trajectories during the first two world wars. That is the heart and soul of the investment that led to what we are today, and that really got severed for a while there," he added. Last week, Meta announced that it was partnering with Anduril to build next-gen extended reality gear for the US military. Anduril said in a statement on May 29 that the project will incorporate its AI-powered command and control system, Lattice, as well as technology from Meta's Reality Labs and Llama AI models. "The effort has been funded through private capital, without taxpayer support, and is designed to save the US military billions of dollars by utilizing high-performance components and technology originally built for commercial use," Anduril said in its May statement. Bosworth said on Wednesday that it was "way too early" to determine if the military would turn into a business segment for Meta. "So far, it's like a zero. Let's start with one and go from there. I think there's no reason it couldn't be meaningful in the impact that it has," he continued. Bosworth said partnering with Anduril doesn't mean Meta's becoming a defense contractor. "They have got a system in a program. We are supplying them with parts. So everything we are doing is for consumers. We are developing technology with a target on consumer audiences," he said. "It turns out a lot of that technology could be multi-use and that is really where I want to establish a partnership," he continued. Bosworth isn't the only Silicon Valley executive who is looking to make inroads into the defense industry. Last year, Google's former CEO and chairman, Eric Schmidt said he was working on a military drone startup with Udacity CEO Sebastian Thrun. Schmidt's startup, named White Stork, plans to mass-produce drones that can take out enemy targets using AI. "Because of the way the system works, I am now a licensed arms dealer," Schmidt said in a lecture at Stanford University in April 2024. A video of the lecture was briefly posted on Stanford's YouTube channel in August before it was taken down.

Why CoreWeave, Inc. (CRWV) Skyrocketed Today
Why CoreWeave, Inc. (CRWV) Skyrocketed Today

Yahoo

timean hour ago

  • Yahoo

Why CoreWeave, Inc. (CRWV) Skyrocketed Today

We recently published a list of . In this article, we are going to take a look at where CoreWeave, Inc. (NASDAQ:CRWV) stands against other Wednesday's best-performing stocks. CoreWeave rallied for a fourth consecutive day on Wednesday, jumping 8.39 percent to close at $163.10 apiece following the unveiling of record-breaking performance results using Nvidia Corp.'s latest Grace Blackwell chips. In a statement, CoreWeave, Inc. (NASDAQ:CRWV) said that it used 2,496 Nvidia GPUs on its AI-optimized cloud platform, making its submission the largest-ever benchmarked under MLPerf. A close-up of a digital cloud, signifying the expansive reach of the software-as-a-service solution. CoreWeave, Inc. (NASDAQ:CRWV) said that the test was 34x larger than the only other submission from a cloud provider. 'AI labs and enterprises choose CoreWeave because we deliver a purpose-built cloud platform with the scale, performance, and reliability that their workloads demand,' said CoreWeave, Inc. (NASDAQ:CRWV) Chief Technology Officer Peter Salanki. 'These MLPerf results reinforce our leadership in supporting today's most demanding AI workloads,' he added. The unveiling followed the company's new $7-billion deal with Applied Digital Corporation (NASDAQ:APLD), which covered two 15-year lease agreements, under which the latter will deliver 250 megawatts of critical IT load to host its artificial intelligence (AI) and high-performance computing (HPC) infrastructure at its Ellendale, North Dakota data center campus. READ NEXT: 20 Best AI Stocks To Buy Now and 30 Best Stocks to Buy Now According to Billionaires. Disclosure: None. This article is originally published at Insider Monkey.

Why Reddit, Inc. (RDDT) Skyrocketed Today
Why Reddit, Inc. (RDDT) Skyrocketed Today

Yahoo

timean hour ago

  • Yahoo

Why Reddit, Inc. (RDDT) Skyrocketed Today

We recently published a list of . In this article, we are going to take a look at where Reddit, Inc. (NYSE:RDDT) stands against other Wednesday's best-performing stocks. Reddit grew its share prices by 6.68 percent on Wednesday to end at $118.21 apiece following news that it sued an Artificial Intelligence company for scraping its data without its permission. In a copy of the lawsuit posted on its website, Reddit, Inc. (NYSE:RDDT) claimed that AI firm Anthropic was training its Claude chatbot on its data since December 2021. A close up of two hands creating a social media post on Bumble Inc. app. According to Reddit, Inc. (NYSE:RDDT), filing a lawsuit was its final option to force Anthropic to stop its alleged unlawful practices. 'We will not tolerate profit-seeking entities like Anthropic commercially exploiting Reddit content for billions of dollars without any return for redditors or respect for their privacy,' the company said in a statement. 'A.I. companies should not be allowed to scrape information and content from people without clear limitations on how they can use that data.' Anthropic has yet to issue a statement on the allegations. READ NEXT: 20 Best AI Stocks To Buy Now and 30 Best Stocks to Buy Now According to Billionaires. Disclosure: None. This article is originally published at Insider Monkey.

DOWNLOAD THE APP

Get Started Now: Download the App

Ready to dive into the world of global news and events? Download our app today from your preferred app store and start exploring.
app-storeplay-store