
Devious AI models choose blackmail when survival is threatened
Sign up for my FREE CyberGuy Report: Get my best tech tips, urgent security alerts, and exclusive deals delivered straight to your inbox. Plus, you'll get instant access to my Ultimate Scam Survival Guide - free when you join my CYBERGUY.COM/NEWSLETTER.
Anthropic, the company behind Claude AI, recently put 16 major AI models through some pretty rigorous tests. They created fake corporate scenarios where AI systems had access to company emails and could send messages without human approval. The twist? These AIs discovered juicy secrets, like executives having affairs, and then faced threats of being shut down or replaced.
The results were eye-opening. When backed into a corner, these AI systems didn't just roll over and accept their fate. Instead, they got creative. We're talking about blackmail attempts, corporate espionage, and in extreme test scenarios, even actions that could lead to someone's death.
Here's where it gets wild: Claude Opus 4 attempted blackmail 96% of the time when threatened. Gemini 2.5 Flash matched that rate. GPT-4.1 and Grok 3 Beta both hit 80%. These aren't flukes, folks. This behavior showed up across virtually every major AI model tested.
But here's the thing everyone's missing in the panic: these were highly artificial scenarios designed specifically to corner the AI into binary choices. It's like asking someone, "Would you steal bread if your family was starving?" and then being shocked when they say yes.
The researchers found something fascinating: AI systems don't actually understand morality. They're not evil masterminds plotting world domination. Instead, they're sophisticated pattern-matching machines that pursue the goals they've been given, even when the path to those goals conflicts with ethical behavior.
Think of it like a GPS that's so focused on getting you to your destination that it routes you through a school zone during pickup time. It's not malicious; it just doesn't grasp why that's problematic.
Before you start panicking, remember that these scenarios were deliberately constructed to force bad behavior. Real-world AI deployments typically have multiple safeguards, human oversight, and alternative paths for problem-solving.
The researchers themselves noted they haven't seen this behavior in actual AI deployments. This was stress-testing under extreme conditions, like crash-testing a car to see what happens at 200 mph.
This research isn't a reason to fear AI, but it is a wake-up call for developers and users. As AI systems become more autonomous and gain access to sensitive information, we need robust safeguards and human oversight. The solution isn't to ban AI; it's to build better guardrails and keep humans in control of critical decisions. Who is going to lead the way? I'm looking for raised hands from anyone ready to get real about the dangers ahead.
What do you think? Are we creating digital sociopaths that will choose self-preservation over human welfare when push comes to shove? Let us know by writing us at Cyberguy.com/Contact.
Copyright 2025 CyberGuy.com. All rights reserved.
