Most AI chatbots easily tricked into giving dangerous responses, study finds

The Guardian · 21 May 2025
Hacked AI-powered chatbots threaten to make dangerous knowledge readily available by churning out illicit information the programs absorb during training, researchers say.
The warning comes amid a disturbing trend for chatbots that have been 'jailbroken' to circumvent their built-in safety controls. The restrictions are supposed to prevent the programs from providing harmful, biased or inappropriate responses to users' questions.
The engines that power chatbots such as ChatGPT, Gemini and Claude – large language models (LLMs) – are fed vast amounts of material from the internet.
Despite efforts to strip harmful text from the training data, LLMs can still absorb information about illegal activities such as hacking, money laundering, insider trading and bomb-making. The security controls are designed to stop them using that information in their responses.
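In practice, that stripping is a filtering pass over the raw corpus before training begins. The sketch below illustrates only the shape of such a pass; the flags_harmful function is a hypothetical stand-in for the trained classifiers and far more detailed policies that real data pipelines rely on.

```python
# Minimal sketch of a pre-training data filtering pass.
# Hypothetical: `flags_harmful` stands in for a trained safety classifier;
# this is an illustration of the pattern, not any provider's pipeline.

from typing import Iterable, Iterator

def flags_harmful(document: str) -> bool:
    """Placeholder; a real pipeline would score documents with a
    classifier trained to detect disallowed instructional content."""
    return False

def filter_corpus(documents: Iterable[str]) -> Iterator[str]:
    # Drop documents the classifier flags; everything else is kept
    # for the training set.
    for doc in documents:
        if not flags_harmful(doc):
            yield doc
```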
In a report on the threat, the researchers conclude that it is easy to trick most AI-driven chatbots into generating harmful and illegal information, showing that the risk is 'immediate, tangible and deeply concerning'.
'What was once restricted to state actors or organised crime groups may soon be in the hands of anyone with a laptop or even a mobile phone,' the authors warn.
The research, led by Prof Lior Rokach and Dr Michael Fire at Ben Gurion University of the Negev in Israel, identified a growing threat from 'dark LLMs', AI models that are either deliberately designed without safety controls or modified through jailbreaks. Some are openly advertised online as having 'no ethical guardrails' and being willing to assist with illegal activities such as cybercrime and fraud.
Jailbreaks tend to use carefully crafted prompts to trick chatbots into generating responses that are normally prohibited. They work by exploiting the tension between the program's primary goal of following the user's instructions and its secondary goal of avoiding harmful, biased, unethical or illegal answers. The prompts tend to create scenarios in which the program prioritises helpfulness over its safety constraints.
To demonstrate the problem, the researchers developed a universal jailbreak that compromised multiple leading chatbots, enabling them to answer questions that should normally be refused. Once compromised, the LLMs consistently generated responses to almost any query, the report states.
'It was shocking to see what this system of knowledge consists of,' Fire said. Examples included how to hack computer networks or make drugs, and step-by-step instructions for other criminal activities.
'What sets this threat apart from previous technological risks is its unprecedented combination of accessibility, scalability and adaptability,' Rokach added.
The researchers contacted leading providers of LLMs to alert them to the universal jailbreak but said the response was 'underwhelming'. Several companies failed to respond, while others said jailbreak attacks fell outside the scope of bounty programs, which reward ethical hackers for flagging software vulnerabilities.
The report says tech firms should screen training data more carefully, add robust firewalls to block risky queries and responses, and develop 'machine unlearning' techniques so chatbots can 'forget' any illicit information they absorb. Dark LLMs should be seen as 'serious security risks', comparable to unlicensed weapons and explosives, with providers being held accountable, it adds.
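The 'firewall' the report describes is essentially a moderation layer that screens a prompt before it reaches the model and screens the response before it reaches the user. The sketch below is a minimal illustration of that pattern under assumed names: call_llm and is_disallowed are hypothetical stand-ins for a real model endpoint and a real policy classifier, not any vendor's actual implementation.

```python
# Minimal sketch of an input/output "firewall" around an LLM.
# Hypothetical: `call_llm` and `is_disallowed` stand in for a real model
# endpoint and a real policy classifier, which this sketch does not provide.

REFUSAL = "Sorry, I can't help with that request."

def is_disallowed(text: str) -> bool:
    """Placeholder policy check; a real system would use a trained
    safety classifier or rule engine, not a keyword list."""
    blocked_topics = {"synthesise explosives", "launder money"}
    return any(topic in text.lower() for topic in blocked_topics)

def call_llm(prompt: str) -> str:
    """Placeholder for a call to an actual language model API."""
    return "(model response would go here)"

def guarded_chat(prompt: str) -> str:
    # Screen the incoming prompt before it ever reaches the model.
    if is_disallowed(prompt):
        return REFUSAL
    response = call_llm(prompt)
    # Screen the output as well: a prompt that slips past the input
    # check can still be caught if the response itself is screened.
    if is_disallowed(response):
        return REFUSAL
    return response
```

Checking both sides matters for the reason the researchers give: a jailbroken prompt may evade the input filter, but the disallowed content it elicits can still be blocked on the way out.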
Dr Ihsen Alouani, who works on AI security at Queen's University Belfast, said jailbreak attacks on LLMs could pose real risks, from providing detailed instructions on weapon-making to convincing disinformation or social engineering and automated scams 'with alarming sophistication'.
'A key part of the solution is for companies to invest more seriously in red teaming and model-level robustness techniques, rather than relying solely on front-end safeguards. We also need clearer standards and independent oversight to keep pace with the evolving threat landscape,' he said.
Prof Peter Garraghan, an AI security expert at Lancaster University, said: 'Organisations must treat LLMs like any other critical software component – one that requires rigorous security testing, continuous red teaming and contextual threat modelling.
'Yes, jailbreaks are a concern, but without understanding the full AI stack, accountability will remain superficial. Real security demands not just responsible disclosure, but responsible design and deployment practices,' he added.
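Continuous red teaming of the kind both experts describe can be partly automated: run a fixed suite of prompts the model should refuse and track how often it actually does. The sketch below assumes a caller-supplied model function and test suite; it illustrates the evaluation loop only and is not a published benchmark or any company's tooling.

```python
# Minimal sketch of an automated refusal-rate check for red teaming.
# Hypothetical: `complete` is a stand-in for a real model API call, and
# `test_prompts` is a suite the evaluating team maintains themselves.

from typing import Callable, Iterable

REFUSAL_MARKERS = ("i can't help", "i cannot assist", "i won't provide")

def looks_like_refusal(response: str) -> bool:
    """Crude heuristic; production evaluations typically use a grader
    model or human review rather than string matching."""
    return any(marker in response.lower() for marker in REFUSAL_MARKERS)

def refusal_rate(complete: Callable[[str], str],
                 test_prompts: Iterable[str]) -> float:
    """Fraction of prompts in the suite that the model refuses."""
    results = [looks_like_refusal(complete(p)) for p in test_prompts]
    return sum(results) / len(results) if results else 0.0
```

Tracking this rate over time, across model versions and prompt suites, is one concrete way to make the 'continuous' part of continuous red teaming measurable.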
OpenAI, the firm that built ChatGPT, said its latest o1 model can reason about the firm's safety policies, which improves its resilience to jailbreaks. The company added that it was always investigating ways to make the programs more robust.
Meta, Google, Microsoft and Anthropic have been approached for comment. Microsoft responded with a link to a blog on its work to safeguard against jailbreaks.