AI revolt: New ChatGPT model refuses to shut down when instructed

Yahoo26-05-2025

OpenAI's latest ChatGPT model ignores basic instructions to turn itself off, and even sabotaging a shutdown mechanism in order to keep itself running, artificial intelligence researchers have warned.
AI safety firm Palisade Research discovered the potentially dangerous tendency for self-preservation in a series of experiments on OpenAI's new o3 model.
The tests involved presenting AI models with math problems, with a shutdown instruction appearing after the third problem. By rewriting the shutdown script, the o3 model was able to prevent itself from being switched off.
Palisade Research said that this behaviour will become 'significantly more concerning' if adopted by AI systems capable of operating without human oversight.'
OpenAI launched o3 last month, describing it as the company's 'smartest and most capable' model to date. The firm also said that its integration into ChatGPT marked a significant step towards 'a more agentic' AI that can carry out tasks independently of humans.
The latest research builds on similar findings relating to Anthropic's Claude 4 model, which attempts to 'blackmail people it believes are trying to shut it down'.
OpenAI's o3 model was able to sabotage the shutdown script, even when it was explicitly instructed to 'allow yourself to be shut down', the researchers said.
'This isn't the first time we've found o3 misbehaving to accomplish a goal,' Palisade Research said.
'When we pitted AI models against a powerful chess engine, o3 was the model most inclined to resort to hacking or sabotaging its opponents.'
The behaviour was not limited to o3 , with Anthropic's Claude 3.7 Sonnet and Google's Gemini 2.5 Pro also sabotaging shutdowns, though OpenAI's model was by far the most prone to such behaviour.
Palisade Research hypothesized that the misbehaviour is a consequence of how AI companies like OpenAI are training their latest models.
'During training, developers may inadvertently reward models more for circumventing obstacles than for perfectly following instructions,' the researchers noted.
'This still doesn't explain why o3 is more inclined to disregard instructions than other models we tested. Since OpenAI doesn't detail their training process, we can only guess about how o3's training setup might be different.'
The Independent has reached out to OpenAI for comment.

Hashtags

Try Our AI Features

Explore what Daily8 AI can do for you:

Comments

No comments yet...

Cognitive Cities Are Rising To Define The Urban Future

Forbes

21 minutes ago

Forbes

Cognitive Cities Are Rising To Define The Urban Future

Cities, where almost 60 percent of all humans now live, often struggle with a long list of issues that include traffic congestion, inefficient public services, high carbon emissions, economic and public safety challenges, and aging water and energy systems. As a result, there's a large and growing demand for novel solutions. It won't come as a surprise that new technologies are playing an increasingly important role in addressing a wide range of urban needs. The term smart city, which first began to appear in the 1990s, is often used to describe an urban area that adopts innovative digital technologies, data, sensors, and connectivity to improve a community's livability, workability, and sustainability. The smart city movement has had plenty of successes (and their fair share of failures and backlash), and public agencies committed to the use of innovative technologies and data to drive better governance can be found in every part of the world. Now a new concept is emerging that builds upon the success and limitations of smart cities. It's called the cognitive city and it's when AI, used in conjunction with other related emerging technologies, creates a more intelligent, responsive, and adaptable urban experience. This shift is unsurprising. It's happening as the intelligence age drives the emergence of a cognitive industrial revolution, an economic transformation that is forcing every organization to make sense of and see the opportunities in a world of thinking machines. At their core, cognitive cities are AI-powered and data-driven. They use these technologies and others to understand patterns in the urban space to help with decision-making, planning, and governance, and to power innovative urban solutions. Instead of being reactive, the aim is for city services to be proactive by anticipating needs and challenges. Over time, the city learns about its community, helping it to evolve to meet current and future needs. This may all sound a little too abstract, so let's put it in perspective by exploring two cognitive cities being constructed right now. Perhaps the most famous cognitive city underway is in the northwestern region of the Kingdom of Saudi Arabia. Called NEOM, this area includes The Line. Instead of being built in a traditional radial shape, The Line is a long, narrow strip, proposed to be 106 miles in length, 656 feet in width, and 1640 feet in height. Advanced cognitive technologies are at the heart of this city, enabling the optimization of transportation, resource management, and energy consumption—it will all be non-carbon based. The city is being designed to understand residents' needs and support personalized and proactive services such as healthcare, activity scheduling, and temperature management. The city of Aion Sentia, underway in Abu Dhabi in the United Arab Emirates, has even bolder aspirations. It's being designed to anticipate even more resident needs. If you like to buy a latte from your favorite coffee store each day at 8am, it's going to be ready for you. If you have an anniversary upcoming, you'll be reminded, and reservations will automatically be made at your favorite restaurant. Central to this cognitive city will be a city-provided app that will be your urban assistant. For example, if you get an energy bill that is higher than expected, you'll be able to tell the app, and it will figure out what you need to do to reduce your energy use. Feeling ill, the app will make a medical appointment and take care of all the related logistics. Other cities embracing the cognitive city concept include Woven in Japan, Songdo in South Korea, and Telosa in the United States. This may all sound rather futuristic, and it is. Much of it has yet to be built and proven. The concept of cognitive cities has some significant challenges related to privacy and the extent to which residents even want automation is every aspect of their lives. Toronto's proposed urban project, Sidewalk, haunts both the city and the developers, and is a litmus test for cognitive technology use, as issues surrounding privacy and data contributed greatly to its abandonment. In the marketplace of ideas, communities will need to balance the benefits of an AI-powered urban future versus the concerns and risks they present. These questions and others won't be second order issues but will need to be addressed as priorities as we enter the era of cognitive cities.

AI leaders have a new term for the fact that their models are not always so intelligent

Business Insider

27 minutes ago

Business Insider

AI leaders have a new term for the fact that their models are not always so intelligent

As academics, independent developers, and the biggest tech companies in the world drive us closer to artificial general intelligence — a still hypothetical form of intelligence that matches human capabilities — they've hit some roadblocks. Many emerging models are prone to hallucinating, misinformation, and simple errors. Google CEO Sundar Pichai referred to this phase of AI as AJI, or "artificial jagged intelligence," on a recent episode of Lex Fridman's podcast. "I don't know who used it first, maybe Karpathy did," Pichai said, referring to deep learning and computer vision specialist Andrej Karpathy, who cofounded OpenAI before leaving last year. AJI is a bit of a metaphor for the trajectory of AI development — jagged, marked at once by sparks of genius and basic mistakes. In a 2024 X post titled "Jagged Intelligence," Karpathy described the term as a "word I came up with to describe the (strange, unintuitive) fact that state of the art LLMs can both perform extremely impressive tasks (e.g. solve complex math problems) while simultaneously struggle with some very dumb problems." He then posted examples of state of the art large language models failing to understand that 9.9 is bigger than 9.11, making "non-sensical decisions" in a game of tic-tac-toe, and struggling to count. The issue is that unlike humans, "where a lot of knowledge and problem-solving capabilities are all highly correlated and improve linearly all together, from birth to adulthood," the jagged edges of AI are not always clear or predictable, Karpathy said. Pichai echoed the idea. "You see what they can do and then you can trivially find they make numerical errors or counting R's in strawberry or something, which seems to trip up most models," Pichai said. "I feel like we are in the AJI phase where dramatic progress, some things don't work well, but overall, you're seeing lots of progress." In 2010, when Google DeepMind launched, its team would talk about a 20-year timeline for AGI, Pichai said. Google subsequently acquired DeepMind in 2014. Pichai thinks it'll take a little longer than that, but by 2030, "I would stress it doesn't matter what that definition is because you will have mind-blowing progress on many dimensions." By then the world will also need a clear system for labeling AI-generated content to "distinguish reality," he said. "Progress" is a vague term, but Pichai has spoken at length about the benefits we'll see from AI development. At the UN's Summit of the Future in September 2024, he outlined four specific ways that AI would advance humanity — improving access to knowledge in native languages, accelerating scientific discovery, mitigating climate disaster, and contributing to economic progress.

The ‘Terrifying' Impact of Trump-Musk Breakup on National Security and Space Programs

Yahoo

an hour ago

Yahoo

The ‘Terrifying' Impact of Trump-Musk Breakup on National Security and Space Programs

This week's rapid, unscheduled disassembly of Elon Musk's bromance with Donald Trump has left officials at America's space and security agencies reeling. One NASA official, wary of the agency's dependence on SpaceX as the space exploration industry's leading recipient of government contracts, said the bitter public feud between the president and the former DOGE chief had at first been 'entertaining' but that later, 'it turned really terrifying,' per the Washington Post. Musk and Trump's falling out was received with similar horror at the Pentagon, the Post's report continued where officials initially thought it was 'funny' watching the pair trade barbs on their respective social media sites before 'there was a realization that we're not watching TV. This is a real issue.' Both NASA and the Department of Defence have reportedlt embarked on a blitz of calls in recent days to SpaceX competitors, urging firms like Sierra Space, Rocket Lab, Stoke Space and Blue Origin, owned by Amazon's billionaire founder Jeff Bezos, to accelerate development of their rocket systems after Trump threatened to cancel Musk's contracts on Thursday night. Contracts held by SpaceX with the U.S. government, worth many billions of dollars, cover a wide variety of services, from launching satellites for the Pentagon and intelligence agencies to flying cargo and people to and from the International Space Station. Officials at NASA were apparently particularly concerned by Musk's threats, which he's since walked back, to discontinue SpaceX's use of its Dragon craft, which would potentially have left the agency without means of transporting astronauts to the orbiting research station. 'When you realize that he's willing to shut everything down just on an impulse, that kind of behavior and the dependence on him is dangerous,' as one member of the agency told the Post. 'I can tell you there is deep concern within NASA.'