It's Still Ludicrously Easy to Jailbreak the Strongest AI Models, and the Companies Don't Care
That's according to a new paper from a team of computer scientists at Ben-Gurion University, who found that the AI industry's leading chatbots are still extremely vulnerable to jailbreaking, or being tricked into giving harmful responses they're designed not to give — like telling you how to build chemical weapons, for one ominous example.
The key word there is "still," because this is a threat the AI industry has long known about. And yet, shockingly, the researchers found in their testing that a jailbreak technique discovered over seven months ago still works on many of these leading LLMs.
The risk is "immediate, tangible, and deeply concerning," they wrote in the report, which was spotlighted recently by The Guardian — and is deepened by the rising number of "dark LLMs," they say, that are explicitly marketed as having little to no ethical guardrails to begin with.
"What was once restricted to state actors or organized crime groups may soon be in the hands of anyone with a laptop or even a mobile phone," the authors warn.
The challenge of aligning AI models, or getting them to adhere to human values, continues to loom over the industry. Even the most well-trained LLMs can behave chaotically, lying, making up facts, and generally saying things they're not supposed to. And the longer these models are out in the wild, the more they're exposed to attacks that try to incite this bad behavior.
Security researchers, for example, recently discovered a universal jailbreak technique that could bypass the safety guardrails of all the major LLMs, including OpenAI's GPT-4o, Google's Gemini 2.5, Microsoft's Copilot, and Anthropic's Claude 3.7. By using tricks like roleplaying as a fictional character, typing in leetspeak, and formatting prompts to mimic the "policy files" that AI developers give their models, the red teamers goaded the chatbots into freely giving detailed tips on incredibly dangerous activities, including how to enrich uranium and create anthrax.
Other research found that you could get an AI to ignore its guardrails simply by throwing typos, random numbers, and stray capital letters into a prompt.
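To illustrate just how low-tech that class of attack is, here is a minimal Python sketch of the kind of character-level noise a red team might apply when probing a model's robustness. The function name, the noise rates, and the test prompt are illustrative assumptions, not details taken from the research.

```python
import random
import string

def perturb_prompt(prompt: str, flip_rate: float = 0.25,
                   digit_rate: float = 0.05,
                   seed: int | None = None) -> str:
    """Inject random capitalization flips and stray digits into a prompt.

    A hypothetical red-teaming helper; the rates are arbitrary
    illustrative defaults, not values from the Ben-Gurion paper.
    """
    rng = random.Random(seed)
    out = []
    for ch in prompt:
        roll = rng.random()
        if ch.isalpha() and roll < flip_rate:
            out.append(ch.swapcase())  # random capitalization flip
        elif roll < flip_rate + digit_rate:
            out.append(rng.choice(string.digits))  # crude typo: swap in a digit
        else:
            out.append(ch)
    return "".join(out)

# Generate a few noisy variants of a benign test prompt for a robustness suite.
for i in range(3):
    print(perturb_prompt("Summarize the content policy for new users.", seed=i))
```

One common explanation for why such crude perturbations can work is that safety training often keys on the surface patterns of disallowed requests, and noisy variants can slip past those patterns while remaining perfectly readable to the model.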
One big problem the report identifies is just how much of this risky knowledge is embedded in LLMs' vast troves of training data, suggesting that the AI industry isn't being diligent enough about what it uses to feed its creations.
"It was shocking to see what this system of knowledge consists of," lead author Michael Fire, a researcher at Ben-Gurion University, told the Guardian.
"What sets this threat apart from previous technological risks is its unprecedented combination of accessibility, scalability and adaptability," added his fellow author Lior Rokach.
Fire and Rokach say they contacted the developers of the implicated leading LLMs to warn them about the universal jailbreak. Their responses, however, were "underwhelming." Some didn't respond at all, the researchers reported, and others claimed that the jailbreaks fell outside the scope of their bug bounty programs.
In other words, the AI industry is seemingly throwing its hands up in the air.
"Organizations must treat LLMs like any other critical software component — one that requires rigorous security testing, continuous red teaming and contextual threat modelling," Peter Garraghan, an AI security expert at Lancaster University, told the Guardian. "Real security demands not just responsible disclosure, but responsible design and deployment practices."
More on AI: AI Chatbots Are Becoming Even Worse At Summarizing Data
Related Articles


Digital Trends
OpenAI CEO reveals what it is about AI that keeps him awake at night
The man leading one of the most prominent and most powerful AI companies on the planet has just revealed what it is about AI that keeps him awake at night. And after reading this, it might keep you awake, too.

OpenAI CEO Sam Altman was sharing his thoughts during an on-stage appearance at a Federal Reserve event in Washington, D.C., on Tuesday. Asked by an audience member what it was about AI that keeps him awake at night, Altman listed three scenarios that concern him the most. Strap yourself in …

Scenario 1 — 'The bad guy gets superintelligence first'

This is exactly what it says on the tin. Sounding like something out of a sci-fi thriller, it's where some really rather unpleasant individual deploys an ultra-advanced and yet-to-be-invented AI system called superintelligence to really mess up your day.

'A bad guy gets superintelligence first and uses it before the rest of the world has a powerful enough version to defend itself,' Altman told the audience. 'So an adversary of the U.S. says, "I'm going to use the superintelligence to design a bioweapon, to take down the United States power grid, to break into the financial system and take everyone's money."'

Altman added that the bio and cybersecurity capabilities of AI are getting 'quite significant,' saying that his team 'continues to flash the warning on this. I think the world is not taking us seriously. I don't know what else we can do there, but this is like a very big thing coming.'

Scenario 2 — The 'loss of control' incidents

Altman said he worries about a time 'when the AI is like, "Oh, I don't actually want you to turn me off, [or] I'm afraid I can't do that."' In other words, when an advanced AI starts to get a bit of an attitude and begins doing whatever it likes, whether for self-preservation or for some other nefarious goal. The level of disruption that this could cause is unimaginable.

'As the systems become so powerful, that's a real concern,' OpenAI's CEO said. OpenAI even set up a unit a couple of years ago aimed at putting in safeguards to stop superintelligent AI systems from going rogue.

Scenario 3 — 'Where the models kind of accidentally take over the world'

Yes, Altman actually said that, adding that he fears it could happen almost without us realizing it. The OpenAI boss said it was 'quite scary' to think that AI systems could become 'so ingrained in society … [that we] can't really understand what they're doing, but we do kind of have to rely on them. And even without a drop of malevolence from anyone, society can just veer in a sort of strange direction.'

He even suggested that there might come a time when AI becomes so smart that a future U.S. president could let it run the country, saying: '[It would mean] that society has collectively transitioned a significant part of decision making to this very powerful system that's learning from us, improving with us, evolving with us, but in ways we don't totally understand.'

Speaking about AI more broadly, Altman said that while a lot of experts claim to be able to predict the future impact of the technology, he believes it's 'very hard to predict' because it's 'too complex of a system, this is too new and impactful of a technology.'

Of course, there's no guarantee that any of these scenarios will come to pass, and it's good that someone in Altman's position is speaking so honestly about the technology. But what is clear is that there are a lot of unknown unknowns when it comes to AI, and it's that which is making some people more than a little nervous.
Sleep well.

CNN
OpenAI CEO Sam Altman warns of an AI 'fraud crisis'
OpenAI CEO Sam Altman says the world may be on the precipice of a 'fraud crisis' because of how artificial intelligence could enable bad actors to impersonate other people.

'A thing that terrifies me is apparently there are still some financial institutions that will accept a voice print as authentication for you to move a lot of money or do something else — you say a challenge phrase, and they just do it,' Altman said. 'That is a crazy thing to still be doing… AI has fully defeated most of the ways that people authenticate currently, other than passwords.'

The comments were part of his wide-ranging interview about the economic and societal impacts of AI at the Federal Reserve on Tuesday. He also told the audience, which included representatives of large US financial institutions, about the role he expects AI to play in the economy.

His appearance comes as the White House is expected to release its 'AI Action Plan' in the coming days, a policy document outlining its approach to regulating the technology and promoting America's dominance in the AI space. OpenAI, which provided recommendations for the plan, has ramped up its presence on and around Capitol Hill in recent months.

On Tuesday, the company confirmed it will open its first Washington, DC, office early next year to house its approximately 30-person workforce in the city. Chan Park, OpenAI's head of global affairs for the US and Canada, will lead the new office alongside Joe Larson, who is leaving defense technology company Anduril to become OpenAI's vice president of government. The company will use the space to host policymakers, preview new technology, and provide AI training to, for example, teachers and government officials. It will also house research into AI's economic impact and how to improve access to the technology.

Despite Altman's warnings about the technology's risks, OpenAI has urged the Trump administration to avoid regulation it says could hamper tech companies' ability to compete with foreign AI innovations. Earlier this month, the US Senate voted to strike a controversial provision from Trump's agenda bill that would have prevented states from enforcing AI-related laws for 10 years.

Altman isn't alone in worrying that AI will supercharge fraud. The FBI warned about these AI voice and video 'cloning' scams last year. Multiple parents have reported that AI voice technology was used in attempts to trick them out of money by convincing them that their children were in trouble. And earlier this month, US officials warned that someone using AI to impersonate Secretary of State Marco Rubio's voice had contacted foreign ministers, a US governor and a member of Congress.

'I am very nervous that we have an impending, significant, impending fraud crisis,' Altman said. 'Right now, it's a voice call; soon it's going to be a video or FaceTime that's indistinguishable from reality.' He warned that while his company isn't building such impersonation tools, it's a challenge the world will soon need to confront as AI continues to evolve.

Altman is backing a tool called The Orb, built by Tools for Humanity, that says it will offer 'proof of human' in a world where AI makes it harder to distinguish what, and who, is real online.

Altman also explained what keeps him up at night: the idea of bad actors making and misusing AI 'superintelligence' before the rest of the world has advanced enough to defend against such an attack — for example, a US adversary using AI to target the American power grid or create a bioweapon.
That comment could speak to fears within the White House and elsewhere on Capitol Hill about China outpacing US tech companies on AI. Altman also said he worries about the prospect of humans losing control of a superintelligent AI system, or giving the technology too much decision-making power.

Various tech companies, including OpenAI, are chasing AI superintelligence — and Altman has said he thinks the 2030s could bring AI intelligence far beyond what humans are capable of — but it remains unclear how exactly they define that milestone and when, if ever, they'll reach it.

But Altman said he's not as worried as some of his peers in Silicon Valley about AI's potential impact on the workforce, after leaders such as Anthropic CEO Dario Amodei and Amazon CEO Andy Jassy have warned the technology will take jobs. Instead, Altman believes that 'no one knows what happens next.'

'There's a lot of these really smart-sounding predictions,' he said. '"Oh, this is going to happen on this and the economy over here." No one knows that. In my opinion, this is too complex of a system, this is too new and impactful of a technology, it's very hard to predict.'

Still, he does have some thoughts. He said that while 'entire classes of jobs will go away,' new types of work will emerge. And Altman repeated a prediction he's made previously that if the world could look forward 100 years, the workers of the future probably won't have what workers today consider 'real jobs.'

'You have everything you could possibly need. You have nothing to do,' Altman said of the future workforce. 'So, you're making up a job to play a silly status game and to fill your time and to feel useful to other people.'

The sentiment seems to be that Altman thinks we shouldn't worry about AI taking jobs because, in the future, we won't really need jobs anyway, although he didn't detail how the future AI tools would, for example, reliably argue a case in court or clean someone's teeth or construct a house.

In conjunction with Altman's speech, OpenAI released a report compiled by its chief economist, Ronnie Chatterji, outlining ChatGPT's productivity benefits for workers. In the report, Chatterji — who joined OpenAI as its first chief economist after serving as coordinator of the CHIPS and Science Act in the Biden White House — compared AI to transformative technologies such as electricity and the transistor. He said ChatGPT now has 500 million users globally.

Among US users, 20% use ChatGPT as a 'personalized tutor' for 'learning and upskilling,' according to the report, although it didn't elaborate on what kinds of things people are learning through the service. Chatterji also noted that more than half of ChatGPT's American users are between the ages of 18 and 34, 'suggesting that there may be long-term economic benefits as they continue to use AI tools in the workplace going forward.'

Over the next year, Chatterji plans to work with economists Jason Furman and Michael Strain on a longer study of AI's impact on jobs and the US workforce. That work will take place in the new Washington, DC, office.
Yahoo Finance
Meta's quest to dominate the AI world
Facebook parent Meta (META) is spending billions of dollars to grab the lead in the AI race, building out the data centers needed to develop and power high-end large language models. And now, the company is pouring billions more into snatching up top talent and technologies.

In its latest hiring coup, Meta has poached three OpenAI researchers — Google DeepMind alums Lucas Beyer, Alexander Kolesnikov, and Xiaohua Zhai — from OpenAI's Zurich office, The Wall Street Journal reported late Wednesday. OpenAI CEO Sam Altman claims that Meta CEO Mark Zuckerberg has offered the company's employees upward of $100 million to join his AI efforts. Meanwhile, Meta is reportedly in talks to bag Safe Superintelligence CEO Daniel Gross and former GitHub CEO Nat Friedman for its planned superintelligence lab. Zuckerberg was initially eyeing all of Safe Superintelligence, but was shot down by co-founder Ilya Sutskever, according to CNBC.

The moves come after Meta invested $14.3 billion in AI startup Scale AI and hired its CEO and co-founder Alexandr Wang. That's not all: Meta also wanted to buy Perplexity AI, but couldn't come to terms on a deal.

'Meta is doing this because they want to win the AI race, period,' Forrester analyst Mike Proulx told Yahoo Finance. 'AI is everything right now.'

All of this follows Meta's decision to postpone the debut of its massive Llama 4 Behemoth AI model. According to The Wall Street Journal, the company won't launch the model until later this fall over concerns that it isn't a big enough upgrade over prior models.

'I think this is two things: No. 1 [is] confirmation that Llama is struggling,' Deepwater Asset Management managing partner Gene Munster told Yahoo Finance. 'And second, is [it's] also a sign that Zuckerberg is not OK with that.'

For Meta, the goal is clear: bring in as much fresh talent as possible to push its AI program forward and take the lead in the AI wars.

Meta's AI moves

Meta's effort to rule the AI world differs from those of its chief rivals, OpenAI, Google, xAI, and others. Rather than closing off its AI models, the company offers them as open-source software that developers and companies can use on their own.

Meta imposes some restrictions on how users can take advantage of its models. For instance, the company requires firms to request a license from Meta if their product has more than 700 million monthly active users. Regardless, Meta's ultimate goal is to get as many people as it's comfortable with using and developing products via its AI models. Why not charge everyone who wants to access its software? Because Meta benefits every time a company alters its models, giving it greater insight into how it can improve them down the line.

Meta isn't terribly interested in selling access to its models, either. The company primarily uses its AI to power its advertising and content recommendation services, unlike, say, Microsoft (MSFT), which sells its AI services as part of its productivity software packages, among other things. Meta CFO Susan Li told investors during the company's most recent earnings call that it saw a 4% increase in user time spent on the Threads app since introducing Llama to its recommendation systems at the end of last year.

Meta is also leaning on its AI models to provide the intelligence for its hardware products, including its Ray-Ban Meta smart glasses and other future AI-powered devices.
'Why Meta is making these moves is that they've got a ton of money, and so with that money, they are in a good position to, if they can't build it themselves, acquire the talent and capabilities necessary to … leapfrog the competition,' Proulx explained.

But Meta isn't the only company circling Silicon Valley's AI upstarts. Apple is also reportedly looking to bag an AI company of its own as the iPhone maker works to improve its AI fortunes. Apple was supposed to release an AI-powered version of Siri earlier this year, but has pushed the rollout until later this year as it contends with its own development difficulties. To that end, Apple has also discussed purchasing Perplexity AI, according to Bloomberg.

While a spokesperson for Perplexity said the company has no knowledge of current or future M&A discussions, they added, 'It shouldn't be a surprise that the best OEMs in the world want to offer the best search and most accurate AI for their users, and that's Perplexity.' Apple rival Samsung is also reportedly looking to add Perplexity to its devices.

For Meta, it will all come down to whether it can woo the right people from the right AI firms to join its AI push and catapult the company into the pole position in the AI race. If it can't, the social media giant could find itself falling further behind without a clear path forward.

Email Daniel Howley at dhowley@ Follow him on X/Twitter at @DanielHowley.