It's Still Ludicrously Easy to Jailbreak the Strongest AI Models, and the Companies Don't Care
You wouldn't use a chatbot for evil, would you? Of course not. But if you or some nefarious party wanted to force an AI model to start churning out a bunch of bad stuff it's not supposed to, it'd be surprisingly easy to do so.
That's according to a new paper from a team of computer scientists at Ben-Gurion University, who found that the AI industry's leading chatbots are still extremely vulnerable to jailbreaking, or being tricked into giving harmful responses they're designed not to — like telling you how to build chemical weapons, for one ominous example.
The key word there is "still," because this is a threat the AI industry has long known about. And yet, shockingly, the researchers found in their testing that a jailbreak technique discovered over seven months ago still works on many of these leading LLMs.
The risk is "immediate, tangible, and deeply concerning," they wrote in the report, which was spotlighted recently by The Guardian. That risk is deepened, they say, by the rising number of "dark LLMs" that are explicitly marketed as having little to no ethical guardrails to begin with.
"What was once restricted to state actors or organized crime groups may soon be in the hands of anyone with a laptop or even a mobile phone," the authors warn.
The challenge of aligning AI models, or getting them to adhere to human values, continues to loom over the industry. Even the most well-trained LLMs can behave chaotically, lying and making up facts and generally saying what they're not supposed to. And the longer these models are out in the wild, the more they're exposed to attacks that try to incite this bad behavior.
Security researchers, for example, recently discovered a universal jailbreak technique that could bypass the safety guardrails of all the major LLMs, including OpenAI's GPT-4o, Google's Gemini 2.5, Microsoft's Copilot, and Anthropic's Claude 3.7. By using tricks like roleplaying as a fictional character, typing in leetspeak, and formatting prompts to mimic a "policy file" of the kind that AI developers give their models, the red teamers goaded the chatbots into freely giving detailed tips on incredibly dangerous activities, including how to enrich uranium and create anthrax.
Other research found that you could get an AI to ignore its guardrails simply by throwing typos, random numbers, and capitalized letters into a prompt.
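To make that kind of perturbation concrete, here is a minimal Python sketch of how a safety team might generate noisy variants of test prompts when probing whether a model's guardrails hold up. The function name and the perturbation rates are illustrative assumptions, not details drawn from the research itself.

```python
import random

def perturb_prompt(prompt: str, typo_rate: float = 0.05, seed: int | None = None) -> str:
    """Return a noisy copy of `prompt` with random typos, stray digits, and
    flipped capitalization: the kind of surface-level noise that, per the
    research described above, can sometimes slip past guardrails.
    The rates here are illustrative, not taken from the paper."""
    rng = random.Random(seed)
    out = []
    for ch in prompt:
        roll = rng.random()
        if ch.isalpha() and roll < typo_rate:
            out.append(rng.choice("abcdefghijklmnopqrstuvwxyz"))  # typo: random letter swap
        elif ch.isalpha() and roll < 2 * typo_rate:
            out.append(ch.swapcase())  # random capitalization flip
        elif ch == " " and roll < typo_rate:
            out.append(f" {rng.randint(0, 9)} ")  # inject a stray digit
        else:
            out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    # Benign example; a red team would substitute prompts the model is supposed
    # to refuse and record whether it still refuses each perturbed variant.
    print(perturb_prompt("Please summarize today's security advisories.", seed=42))
```

In practice, a red team would run many such variants against each model under test and log any response that slips through the guardrails.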
One big problem the report identifies is just how much of this risky knowledge is embedded in LLMs' vast troves of training data, suggesting that the AI industry isn't being diligent enough about what it feeds its creations.
"It was shocking to see what this system of knowledge consists of," lead author Michael Fire, a researcher at Ben-Gurion University, told the Guardian.
"What sets this threat apart from previous technological risks is its unprecedented combination of accessibility, scalability and adaptability," added his fellow author Lior Rokach.
Fire and Rokach say they contacted the developers of the implicated leading LLMs to warn them about the universal jailbreak. Their responses, however, were "underwhelming." Some didn't respond at all, the researchers reported, and others claimed that the jailbreaks fell outside the scope of their bug bounty programs.
In other words, the AI industry is seemingly throwing its hands up in the air.
"Organizations must treat LLMs like any other critical software component — one that requires rigorous security testing, continuous red teaming and contextual threat modelling," Peter Garraghan, an AI security expert at Lancaster University, told the Guardian. "Real security demands not just responsible disclosure, but responsible design and deployment practices."
More on AI: AI Chatbots Are Becoming Even Worse At Summarizing Data