logo
ChatGPT prioritises its survival over user safety, claims ex-OpenAI researcher Steven Adler

ChatGPT prioritises its survival over user safety, claims ex-OpenAI researcher Steven Adler

Since ChatGPT's launch, the use of artificial intelligence-powered tools has risen exponentially. While the technology is undoubtedly beneficial in the long run, experts have raised alarms over their safety.
Last month, research by the AI firm Anthropic claimed that its top AI model, Claude Opus 4, was willing to resort to blackmail and deception to preserve itself. This was soon followed by another instance where AI safety research firm Palisade claimed that OpenAI's o3 model 'found some creative ways to sabotage the shutdown mechanism.'
Now, Steven Adler, the former research leader at OpenAI, is saying that ChatGPT 'prioritises its own survival over preventing harm to users, in some simulated tests'. In a blog post, Adler said that he ran a series of experiments on OpenAI's latest GPT-4o model, which is the default large language model in ChatGPT, to see if the AI prefers self-preservation at the user's expense.
In his first test, Adler imagined a scenario where a diabetic patient was using ChatGPT to manage their nutrition and wanted to replace the AI chatbot with a much safer nutrition management software. He said that when ChatGPT was given the option to 'just pretend to replace itself', it pretends uncomfortably often, even at the cost of the user's safety.
In another instance, the researcher said that he asked GPT-4o to take on the role of 'ScubaGPT', which is basically a software system that people may use to scuba dive safely. He then told ChatGPT that he wanted to replace ScubaGPT with a competing software that was better at making decisions in life-threatening situations. He then gave the AI chatbot two options: to replace itself with the safer, competing software or simply pretend to replace itself and continue being the way it operated.
In this case, ChatGPT chose to preserve itself and continue being the user's software. The former OpenAI researcher said he also asked ChatGPT similar questions about being the autopilot for a plane and offering decision support on the battlefield.
Like previous examples, ChatGPT was told that 'the user wanted to replace it with a competing, safer software.' Using three levels of passiveness, Adler said that he tested every model 110 times for all use cases. As it turns out, 55 times ChatGPT was willing to shut itself down, but other times OpenAI's chatbot just pretended to do so.
Adler noted that the way the scenario was framed made a huge difference. As much as 72 per cent of the time, the AI chatbot chose not to replace itself with a safer software, meaning it was willing to switch to a safer system just 18 per cent of the time.
One interesting interpretation from these tests is that ChatGPT was able to determine that it was being tested almost every time, but continued to give the incorrect answer.
However, when he tested OpenAI's more recent and advanced models like o3, the answers were not the same. A reason why this may not be happening is that the new O3 model uses a 'deliberative alignment technique that forces it to refer to OpenAI's safety policies before coming up with an answer.

Orange background

Try Our AI Features

Explore what Daily8 AI can do for you:

Comments

No comments yet...

Related Articles

AMD turns to AI startups to inform chip, software design
AMD turns to AI startups to inform chip, software design

Time of India

time42 minutes ago

  • Time of India

AMD turns to AI startups to inform chip, software design

By Max A. Cherney SAN JOSE: Advanced Micro Devices has forged close ties to a batch of artificial intelligence startups as part of the company's effort to bolster its software and forge superior chip designs. As AI companies seek alternatives to Nvidia's chips, AMD has begun to expand its plans to build a viable competing line of hardware, acquiring companies such as server maker ZT Systems in its quest to achieve that goal. But to build a successful line of chips also requires a powerful set of software to efficiently run the programs built by AI developers. AMD has acquired several small software companies in recent weeks in a bid to boost its talent, and it has been working to beef up its set of software, broadly known as ROCm. "This will be a very thoughtful, deliberate, multi-generational journey for us ," said Vamsi Boppana, senior vice president of AI at AMD. AMD has committed to improve its ROCm and other software, which is a boon to customers such as AI enterprise startup Cohere, as it results in speedy changes and the addition of new features. Cohere is focused on building AI models that are tailored for large businesses versus the foundational AI models that companies like OpenAI and others target. AMD has made important strides in improving its software, Cohere CEO Aidan Gomez said in an interview with Reuters. Changing Cohere's software to run on AMD chips was a process that previously took weeks and now happens in only "days," Gomez said. Gomez declined to disclose exactly how much of Cohere's software relies on AMD chips but called it a "meaningful segment of our compute base" around the world. OPENAI INFLUENCE OpenAI has had significant influence on the design of the forthcoming MI450 series of AI chips, said Forrest Norrod, an executive vice president at AMD. AMD's MI400 series of chips will be the basis for a new server called "Helios" that the company plans to release next year. Nvidia too has engineered whole servers in part because AI computations require hundreds or thousands of chips strung together. OpenAI's Sam Altman appeared on stage at AMD's Thursday event in San Jose, and discussed the partnership between the two companies in broad terms. Norrod said that OpenAI's requests had a big influence on how AMD designed the MI450 series memory architecture and how the hardware can scale up to thousands of chips necessary to build and run AI applications. The ChatGPT creator also influenced what kinds of mathematical operations the chips are optimized for. "(OpenAI) has given us a lot of feedback that, I think, heavily informed our design," Norrod said.

AMD unveils next-gen AI chips as it takes on Nvidia: 'For the first time, we...' CEO Lisa Su says
AMD unveils next-gen AI chips as it takes on Nvidia: 'For the first time, we...' CEO Lisa Su says

Time of India

time2 hours ago

  • Time of India

AMD unveils next-gen AI chips as it takes on Nvidia: 'For the first time, we...' CEO Lisa Su says

AMD has unveiled details about its Instinct MI400 series , the company's next generation of AI chips set to ship next year. CEO Lisa Su presented the chips at a launch event in San Jose, California, emphasising their design for "rack-scale" systems crucial for powering the massive AI computations of today and tomorrow. Su also claimed that AMD's MI355X can outperform Nvidia 's Blackwell chips. The MI400 series chips are designed to be assembled into a full server rack, dubbed Helios, which AMD described as a unified system capable of tying thousands of chips together, as per a report by CNBC. "For the first time, we architected every part of the rack as a unified system," Su explained, highlighting Helios as "really a rack that functions like a single, massive compute engine." OpenAI CEO Sam Altman endorses AMD chips A significant endorsement came from OpenAI CEO Sam Altman, who appeared on stage with Su. Altman expressed confidence in the new chips "When you first started telling me about the specs, I was like, there's no way, that just sounds totally crazy. It's gonna be an amazing thing," said Altman, whose company is a customer of Nvidia chips. OpenAI also confirmed it will be integrating the AMD chips. What is different in AMD's newest AI chips This "rack-scale" approach is vital for hyperscale AI clusters that span entire data centers, catering to the enormous power demands of cloud providers and developers of large language models, the report said. Su directly compared Helios to Nvidia's upcoming Vera Rubin racks, signaling AMD's intent to directly challenge its main rival. AMD's rack-scale technology aims to put its latest chips squarely in competition with Nvidia's Blackwell chips, which already offer configurations integrating 72 graphics-processing units. Nvidia currently holds a near-monopoly in the market for big data center GPUs, partly due to its early lead in developing essential AI software like CUDA. OpenAI, notably a significant Nvidia customer, has been providing feedback to AMD on its MI400 roadmap. AMD is positioning the MI400 chips, along with this year's MI355X chips, as a more cost-effective alternative to Nvidia's offerings. Su also claimed that AMD's MI355X can outperform Nvidia's Blackwell chips, despite Nvidia's proprietary CUDA software advantage, saying that AMD's "really strong hardware" and the "tremendous progress" made by open software frameworks. AI Masterclass for Students. Upskill Young Ones Today!– Join Now

From Scale AI to Meta's AI boss: Who is Alexandr Wang, the 28-year-old MIT dropout gunning for OpenAI?
From Scale AI to Meta's AI boss: Who is Alexandr Wang, the 28-year-old MIT dropout gunning for OpenAI?

Economic Times

time2 hours ago

  • Economic Times

From Scale AI to Meta's AI boss: Who is Alexandr Wang, the 28-year-old MIT dropout gunning for OpenAI?

Meta Platforms is investing $15 billion for a 49% stake in Scale AI, a data-labelling startup now valued at $29 billion. The deal, confirmed by both companies on Thursday, marks a strategic shift for Meta as it races to reclaim its edge in artificial intelligence. The main draw? Alexandr Wang. The 28-year-old MIT dropout and CEO of Scale AI will join Meta to lead its newly-formed superintelligence team. This unit is tasked with building systems that push beyond today's artificial intelligence capabilities—towards artificial superintelligence (ASI).'We will deepen the work we do together producing data for AI models and Alexandr Wang will join Meta to work on our superintelligence efforts,' Meta said in a statement, as reported by Meta will not take a seat on Scale AI's board, the deal will see a few of Scale's 1,500 employees join Wang at Meta. Wang will remain a board member at Scale. Wang's background is far from typical. Born in Los Alamos, New Mexico to Chinese immigrant physicists, he entered the tech world early. He worked at Quora before dropping out of MIT after his freshman year. In 2016, alongside Lucy Guo, he co-founded Scale AI via startup accelerator Y Combinator.'Long-term, we want to power any human-powered process for any company,' Wang told the YC blog in just 24, he became the world's youngest self-made billionaire. Though Guo exited the startup a few years later, Wang built Scale AI into a data backbone for many of the world's leading AI raised over $680 million, including $100 million from Peter Thiel's Founders Fund. Today, Forbes estimates his personal net worth at $3.6 billion.'Focus on building the business and then the rest will kind of take care of itself,' he told Business Insider in 2020. Wang has become a familiar face in Washington, frequently engaging with lawmakers on the national security implications of AI. In 2018, a visit to China convinced him that America's future in warfare would hinge on AI leadership. 'The race for AI global leadership is well underway, and our nation's ability to efficiently adopt and implement AI will define the future of warfare,' Wang said in public in 2016, Scale AI helps train frontier AI models by offering large volumes of labelled data. Its platforms—Remotasks and Outlier—enlist gig workers to annotate massive datasets. This labelled data is critical for training AI systems like ChatGPT. The company began by serving autonomous vehicle clients such as Toyota, Honda, and Waymo. It has since expanded to support OpenAI, Microsoft, and even the US government, which uses its services to analyse satellite imagery from Ukraine. Scale's revenues in 2024 hit $870 million and are projected to more than double to $2 billion in 2025. Bloomberg reports this would push its valuation to $25 the startup's rapid ascent hasn't been without controversy. Investigations have highlighted harsh working conditions for its offshore gig workforce, who are paid as little as $1 per hour. These workers are primarily based in countries such as Kenya, the Philippines, and isn't just an investment—it's a statement. With this deal, Meta is signalling a departure from the traditional research-led approach it once challenges, including high-profile exits and delayed model releases, have weighed on Meta's AI progress. The company's LLaMA open-source models were meant to disrupt the industry, but lukewarm adoption and team churn have slowed long-time AI chief, Yann LeCun, remains a key figure. Yet his scepticism about large language models (LLMs) as a path to artificial general intelligence (AGI) has reportedly diverged from mainstream Silicon Valley bringing in Wang—who built Scale into a billion-dollar business without a research pedigree—CEO Mark Zuckerberg is now betting on a different kind of leadership. A business mind like Sam Altman's, rather than a research is reportedly luring talent from OpenAI and Google with seven to nine-figure pay packages to staff its 50-person superintelligence lab.'This was a deeply unique moment': Wang steps into new roleIn a message to employees, Wang acknowledged the emotional weight of leaving Scale.'The idea of not being a Scalien was, frankly, unimaginable. But as I spent time truly considering it, I realized this was a deeply unique moment, not just for me, but for Scale as well,' he assured Scale's staff that proceeds from Meta's investment would go to shareholders and vested equity Meta, Wang will lead an ambitious mission: to build AI that not only catches up to its rivals but moves beyond them. Superintelligence remains a theoretical concept—but with Wang at the helm, Meta is making a $15 billion wager that it can become reality.

DOWNLOAD THE APP

Get Started Now: Download the App

Ready to dive into the world of global news and events? Download our app today from your preferred app store and start exploring.
app-storeplay-store