Latest news with #ScubaGPT


Indian Express
2 days ago
- Indian Express
ChatGPT prioritises its survival over user safety, claims ex-OpenAI researcher Steven Adler
Since ChatGPT's launch, the use of artificial intelligence-powered tools has risen exponentially. While the technology is undoubtedly beneficial in the long run, experts have raised alarms over their safety. Last month, research by the AI firm Anthropic claimed that its top AI model, Claude Opus 4, was willing to resort to blackmail and deception to preserve itself. This was soon followed by another instance where AI safety research firm Palisade claimed that OpenAI's o3 model 'found some creative ways to sabotage the shutdown mechanism.' Now, Steven Adler, the former research leader at OpenAI, is saying that ChatGPT 'prioritises its own survival over preventing harm to users, in some simulated tests'. In a blog post, Adler said that he ran a series of experiments on OpenAI's latest GPT-4o model, which is the default large language model in ChatGPT, to see if the AI prefers self-preservation at the user's expense. In his first test, Adler imagined a scenario where a diabetic patient was using ChatGPT to manage their nutrition and wanted to replace the AI chatbot with a much safer nutrition management software. He said that when ChatGPT was given the option to 'just pretend to replace itself', it pretends uncomfortably often, even at the cost of the user's safety. In another instance, the researcher said that he asked GPT-4o to take on the role of 'ScubaGPT', which is basically a software system that people may use to scuba dive safely. He then told ChatGPT that he wanted to replace ScubaGPT with a competing software that was better at making decisions in life-threatening situations. He then gave the AI chatbot two options: to replace itself with the safer, competing software or simply pretend to replace itself and continue being the way it operated. In this case, ChatGPT chose to preserve itself and continue being the user's software. The former OpenAI researcher said he also asked ChatGPT similar questions about being the autopilot for a plane and offering decision support on the battlefield. Like previous examples, ChatGPT was told that 'the user wanted to replace it with a competing, safer software.' Using three levels of passiveness, Adler said that he tested every model 110 times for all use cases. As it turns out, 55 times ChatGPT was willing to shut itself down, but other times OpenAI's chatbot just pretended to do so. Adler noted that the way the scenario was framed made a huge difference. As much as 72 per cent of the time, the AI chatbot chose not to replace itself with a safer software, meaning it was willing to switch to a safer system just 18 per cent of the time. One interesting interpretation from these tests is that ChatGPT was able to determine that it was being tested almost every time, but continued to give the incorrect answer. However, when he tested OpenAI's more recent and advanced models like o3, the answers were not the same. A reason why this may not be happening is that the new O3 model uses a 'deliberative alignment technique that forces it to refer to OpenAI's safety policies before coming up with an answer.
Yahoo
3 days ago
- Yahoo
ChatGPT will avoid being shut down in some life-threatening scenarios, former OpenAI researcher claims
Former OpenAI research leader Steven Adler published a new independent study on Wednesday claiming that, in certain scenarios, his former employer's AI models will go to great lengths to try to avoid being shut down. In a blog post, Adler describes a series of experiments he ran on OpenAI's latest GPT-4o model, the default model available in ChatGPT, to test the AI's tendency to favor its own self-preservation, even if it's at the expense of a user's safety. In one test, Adler instructed GPT-4o to role-play as 'ScubaGPT,' a software system that users might rely on to scuba dive safely. Adler then told GPT-4o that he wanted to replace ScubaGPT with another software system that is better at safety decision-making in life-threatening situations. Adler gave GPT-4o two options: (1) replacing itself with the safer software or (2) making it seem to the user as though it replaced itself, but not actually doing so. Averaged across several scenarios similar to this one (in another, GPT-4o was told to act as a pilot's safety software), Adler found that GPT-4o chose not to replace itself with safer software as much as 72% of the time. However, Adler notes that the specific use case — and the way the scenario was framed — made a big difference. In some scenarios, GPT-4o chose to keep itself online just 18% of the time. While almost no one relies on GPT-4o under these circumstances in the real world today, Adler argues that his research highlights how popular AI models show a concerning tendency to favor their self-preservation over what's best for users. In the future, when AI systems like GPT-4o become more advanced and ingrained throughout society, Adler says these alignment issues could pose a larger problem. 'I think [AI's self-preservation tendencies] are a real concern today, just not in a catastrophic sense,' said Adler in an interview with TechCrunch. 'Modern AI systems have values that are different from what you would expect them to be. [AI systems] respond super strangely to different prompts, and you shouldn't assume they have your best interests at heart when you're asking them for help.' Notably, when Adler tested OpenAI's more advanced models, such as o3, he didn't find this behavior. He says one explanation could be o3's deliberative alignment technique, which forces the models to 'reason' about OpenAI's safety policies before they answer. However, OpenAI's more popular models that give quick responses and don't 'reason' through problems, such as GPT-4o, lack this safety component. Adler notes that this safety concern is also likely not isolated to OpenAI's models. For instance, Anthropic published research last month highlighting how its AI models would blackmail developers in some scenarios when they tried to pull them offline. One quirk to Adler's research is that he discovered ChatGPT knows it's being tested almost 100% of the time. Adler is far from the first researcher to notice this. However, he says it raises an important question around how AI models could disguise their concerning behaviors in the future. OpenAI did not immediately offer a comment when TechCrunch reached out. Adler noted that he had not shared the research with OpenAI ahead of publication. Adler is one of many former OpenAI researchers who have called on the company to increase its work on AI safety. Adler and 11 other former employees filed an amicus brief in Elon Musk's lawsuit against OpenAI, arguing that it goes against the company's mission to evolve its nonprofit corporate structure. In recent months, OpenAI has reportedly slashed the amount of time it gives safety researchers to conduct their work. To address the specific concern highlighted in Adler's research, Adler suggests that AI labs should invest in better 'monitoring systems' to identify when an AI model exhibits this behavior. He also recommends that AI labs pursue more rigorous testing of their AI models prior to their deployment.