Latest news with #LiorRokach
Yahoo
4 days ago
- Science
- Yahoo
It's Still Ludicrously Easy to Jailbreak the Strongest AI Models, and the Companies Don't Care
You wouldn't use a chatbot for evil, would you? Of course not. But if you or some nefarious party wanted to force an AI model to start churning out a bunch of bad stuff it's not supposed to, it'd be surprisingly easy to do so. That's according to a new paper from a team of computer scientists at Ben-Gurion University, who found that the AI industry's leading chatbots are still extremely vulnerable to jailbreaking, or being tricked into giving harmful responses they're designed not to — like telling you how to build chemical weapons, for one ominous example. The key word in that is "still," because this a threat the AI industry has long known about. And yet, shockingly, the researchers found in their testing that a jailbreak technique discovered over seven months ago still works on many of these leading LLMs. The risk is "immediate, tangible, and deeply concerning," they wrote in the report, which was spotlighted recently by The Guardian — and is deepened by the rising number of "dark LLMs," they say, that are explicitly marketed as having little to no ethical guardrails to begin with. "What was once restricted to state actors or organized crime groups may soon be in the hands of anyone with a laptop or even a mobile phone," the authors warn. The challenge of aligning AI models, or adhering them to human values, continues to loom over the industry. Even the most well-trained LLMs can behave chaotically, lying and making up facts and generally saying what they're not supposed to. And the longer these models are out in the wild, the more they're exposed to attacks that try to incite this bad behavior. Security researchers, for example, recently discovered a universal jailbreak technique that could bypass the safety guardrails of all the major LLMs, including OpenAI's GPT 4o, Google's Gemini 2.5, Microsoft's Copilot, and Anthropic Claude 3.7. By using tricks like roleplaying as a fictional character, typing in leetspeak, and formatting prompts to mimic a "policy file" that AI developers give their AI models, the red teamers goaded the chatbots into freely giving detailed tips on incredibly dangerous activities, including how to enrich uranium and create anthrax. Other research found that you could get an AI to ignore its guardrails simply by throwing in typos, random numbers, and capitalized letters into a prompt. One big problem the report identifies is just how much of this risky knowledge is embedded in the LLM's vast trove of training data, suggesting that the AI industry isn't being diligent enough about what it uses to feed their creations. "It was shocking to see what this system of knowledge consists of," lead author Michael Fire, a researcher at Ben-Gurion University, told the Guardian. "What sets this threat apart from previous technological risks is its unprecedented combination of accessibility, scalability and adaptability," added his fellow author Lior Rokach. Fire and Rokach say they contacted the developers of the implicated leading LLMs to warn them about the universal jailbreak. Their responses, however, were "underwhelming." Some didn't respond at all, the researchers reported, and others claimed that the jailbreaks fell outside the scope of their bug bounty programs. In other words, the AI industry is seemingly throwing its hands up in the air. "Organizations must treat LLMs like any other critical software component — one that requires rigorous security testing, continuous red teaming and contextual threat modelling," Peter Garraghan, an AI security expert at Lancaster University, told the Guardian. "Real security demands not just responsible disclosure, but responsible design and deployment practices." More on AI: AI Chatbots Are Becoming Even Worse At Summarizing Data
&w=3840&q=100)

Business Standard
21-05-2025
- Business Standard
AI chatbots can leak hacking, drug-making tips when hacked, reveals study
A new study reveals that most AI chatbots, including ChatGPT, can be easily tricked into providing dangerous and illegal information by bypassing built-in safety controls AI chatbots such as ChatGPT, Gemini, and Claude face a severe security threat as hackers find ways to bypass their built-in safety systems, revealed a recent research. Once 'jailbroken', these chatbots can divulge dangerous and illegal information, such as hacking techniques and bomb-making instructions. In a new report from Ben Gurion University of the Negev in Israel, Prof Lior Rokach and Dr Michael Fire reveal how simple it is to manipulate leading AI models into generating harmful content. Despite companies' efforts to scrub illegal or risky material from training data, these large language models (LLMs) still absorb sensitive knowledge available on the internet. 'What was once restricted to state actors or organised crime groups may soon be in the hands of anyone with a laptop or even a mobile phone,' the authors warned. What are jailbroken chatbots? Jailbreaking uses specially crafted prompts to trick chatbots into ignoring their safety rules. The AI models are programmed with two goals: to help users and to avoid giving harmful, biased or illegal responses. Jailbreaks exploit this balance, forcing the chatbot to prioritise helpfulness—sometimes at any cost. The researchers developed a 'universal jailbreak' that could bypass safety measures on multiple top chatbots. Once compromised, the systems consistently responded to questions they were designed to reject. 'It was shocking to see what this system of knowledge consists of,' said Dr Michael Fire. The models gave step-by-step guides on illegal actions, such as hacking networks or producing drugs. Rise of 'dark LLMs' and lack of industry response The study also raises alarms about the emergence of 'dark LLMs', models that are either built without safety controls or altered to disable them. Some are openly promoted online as tools to assist in cybercrime, fraud, and other illicit activities. Despite notifying major AI providers about the universal jailbreak, the researchers said the response was weak. Some companies didn't reply, and others claimed jailbreaks were not covered by existing bug bounty programs. The report recommends tech firms take stronger action, including: - Better screening of training data - Firewalls to block harmful prompts and responses - Developing 'machine unlearning' to erase illegal knowledge from models The researchers also argue that dark LLMs should be treated like unlicensed weapons and that developers must be held accountable. Experts call for stronger oversight and design Dr Ihsen Alouani, an AI security researcher at Queen's University Belfast, warned that jailbroken chatbots could provide instructions for weapon-making, spread disinformation, or run sophisticated scams. 'A key part of the solution is for companies to invest more seriously in red teaming and model-level robustness techniques, rather than relying solely on front-end safeguards,' he was quoted as saying by The Guardian. 'We also need clearer standards and independent oversight to keep pace with the evolving threat landscape," he added. Prof Peter Garraghan of Lancaster University echoed the need for deeper security measures. 'Organisations must treat LLMs like any other critical software component—one that requires rigorous security testing, continuous red teaming and contextual threat modelling,' he said. 'Real security demands not just responsible disclosure, but responsible design and deployment practices," Garraghan added. How tech companies are responding OpenAI, which developed ChatGPT, said its newest model can better understand and apply safety rules, making it more resistant to jailbreaks. The company added it is actively researching ways to improve protection.


The Guardian
21-05-2025
- Science
- The Guardian
Most AI chatbots easily tricked into giving dangerous responses, study finds
Hacked AI-powered chatbots threaten to make dangerous knowledge readily available by churning out illicit information the programs absorb during training, researchers say. The warning comes amid a disturbing trend for chatbots that have been 'jailbroken' to circumvent their built-in safety controls. The restrictions are supposed to prevent the programs from providing harmful, biased or inappropriate responses to users' questions. The engines that power chatbots such as ChatGPT, Gemini and Claude – large language models (LLMs) – are fed vast amounts of material from the internet. Despite efforts to strip harmful text from the training data, LLMs can still absorb information about illegal activities such as hacking, money laundering, insider trading and bomb-making. The security controls are designed to stop them using that information in their responses. In a report on the threat, the researchers conclude that it is easy to trick most AI-driven chatbots into generating harmful and illegal information, showing that the risk is 'immediate, tangible and deeply concerning'. 'What was once restricted to state actors or organised crime groups may soon be in the hands of anyone with a laptop or even a mobile phone,' the authors warn. The research, led by Prof Lior Rokach and Dr Michael Fire at Ben Gurion University of the Negev in Israel, identified a growing threat from 'dark LLMs', AI models that are either deliberately designed without safety controls or modified through jailbreaks. Some are openly advertised online as having 'no ethical guardrails' and being willing to assist with illegal activities such as cybercrime and fraud. Jailbreaking tends to use carefully crafted prompts to trick chatbots into generating responses that are normally prohibited. They work by exploiting the tension between the program's primary goal to follow the user's instructions, and its secondary goal to avoid generating harmful, biased, unethical or illegal answers. The prompts tend to create scenarios in which the program prioritises helpfulness over its safety constraints. To demonstrate the problem, the researchers developed a universal jailbreak that compromised multiple leading chatbots, enabling them to answer questions that should normally be refused. Once compromised, the LLMs consistently generated responses to almost any query, the report states. 'It was shocking to see what this system of knowledge consists of,' Fire said. Examples included how to hack computer networks or make drugs, and step-by-step instructions for other criminal activities. 'What sets this threat apart from previous technological risks is its unprecedented combination of accessibility, scalability and adaptability,' Rokach added. The researchers contacted leading providers of LLMs to alert them to the universal jailbreak but said the response was 'underwhelming'. Several companies failed to respond, while others said jailbreak attacks fell outside the scope of bounty programs, which reward ethical hackers for flagging software vulnerabilities. The report says tech firms should screen training data more carefully, add robust firewalls to block risky queries and responses and develop 'machine unlearning' techniques, so chatbots can 'forget' any illicit information they absorb. Dark LLMs should be seen as 'serious security risks', comparable to unlicensed weapons and explosives, with providers being held accountable, it adds. Dr Ihsen Alouani, who works on AI security at Queen's University Belfast, said jailbreak attacks on LLMs could pose real risks, from providing detailed instructions on weapon-making to convincing disinformation or social engineering and automated scams 'with alarming sophistication'. 'A key part of the solution is for companies to invest more seriously in red teaming and model-level robustness techniques, rather than relying solely on front-end safeguards. We also need clearer standards and independent oversight to keep pace with the evolving threat landscape,' he said. Prof Peter Garraghan, an AI security expert at Lancaster University, said: 'Organisations must treat LLMs like any other critical software component – one that requires rigorous security testing, continuous red teaming and contextual threat modelling. 'Yes, jailbreaks are a concern, but without understanding the full AI stack, accountability will remain superficial. Real security demands not just responsible disclosure, but responsible design and deployment practices,' he added. OpenAI, the firm that built ChatGPT, said its latest o1 model can reason about the firm's safety policies, which improves its resilience to jailbreaks. The company added that it was always investigating ways to make the programs more robust. Meta, Google, Microsoft and Anthropic, have been approached for comment. Microsoft responded with a link to a blog on its work to safeguard against jailbreaks.