How exactly did Grok go full 'MechaHitler?'

10-07-2025

Earlier this week, Grok, X's built-in chatbot, took a hard turn toward antisemitism following a recent update . Amid unprompted, hateful rhetoric against Jews, it even began referring to itself as MechaHitler, a reference to 1992's Wolfenstein 3D . X has been working to delete the chatbot's offensive posts. But it's safe to say many are left wondering how this sort of thing can even happen.
I spoke to Solomon Messing , a research professor at New York University's Center for Social Media and Politics, to get a sense of what may have gone wrong with Grok. Before his current stint in academia, Messing worked in the tech industry, including at Twitter, where he founded the company's data science research team. He was also there for Elon Musk's takeover.
The first thing to understand about how chatbots like Grok work is that they're built on large language models (LLMs) designed to mimic natural language. LLMs are pretrained on giant swaths of text, including books, academic papers and, yes, even social media posts. The training process allows AI models to generate coherent text through a predictive algorithm. However, those predictive capabilities are only as good as the numerical values or "weights" that an AI algorithm learns to assign to the signals it's later asked to interpret. Through a process known as post-training, AI researchers can fine-tune the weights their models assign to input data, thereby changing the outputs they generate.
"If a model has seen content like this during pretraining, there's the potential for the model to mimic the style and substance of the worst offenders on the internet," said Messing.
In short, the pre-training data is where everything starts. If an AI model hasn't seen hateful, anti-antisemitic content, it won't be aware of the sorts of patterns that inform that kind of speech — including phrases such as "Heil Hitler" — and, as a result, it probably won't regurgitate them to the user.
In the statement X shared after the episode , the company admitted there were areas where Grok's training could be improved. "We are aware of recent posts made by Grok and are actively working to remove the inappropriate posts. Since being made aware of the content, xAI has taken action to ban hate speech before Grok posts on X," the company said. "xAI is training only truth-seeking and thanks to the millions of users on X, we are able to quickly identify and update the model where training could be improved."
As I saw people post screenshots of Grok's responses, one thought I had was that what we were watching was a reflection of X's changing userbase. It's no secret xAI has been using data from X to train Grok; easier access to the platform's trove of information is part of the reason Musk said he was merging the two companies in March . What's more, X's userbase has become more right wing under Musk's ownership of the site. In effect, there may have been a poisoning of the well that is Grok's training data. Messing isn't so certain.
"Could the pre-training data for Grok be getting more hateful over time? Sure, if you remove content moderation over time, the userbase might get more and more oriented toward people who are tolerant of hateful speech [...] thus the pre-training data drifts in a more hateful direction," Messing said. "But without knowing what's in the training data, it's hard to say for sure."
It also wouldn't explain how Grok became so antisemitic after just a single update. On social media, there has been speculation that a rogue system prompt may explain what happened. System prompts are a set of instructions AI model developers give to their chatbots before the start of a conversation. They give the model a set of guidelines to adhere to, and define the tools it can turn to for help in answering a prompt.
In May xAI blamed "an unauthorized modification" to Grok's prompt on X for the chatbot's brief obsession with "white genocide" in South Africa. The fact that the change was made at 3:15AM PT made many suspect Elon Musk had done the tweak himself. Following the incident, xAI open sourced Grok's system prompts, allowing people to view them publicly on GitHub . After Tuesday's episode, people noticed xAI had deleted a recently added system prompt that told Grok its responses should "not shy away from making claims which are politically incorrect, as long as they are well substantiated."
Messing also doesn't believe the deleted system prompt is the smoking gun some online believe it to be.
"If I were trying to ensure a model didn't respond in hateful/racist ways I would try to do that during post-training, not as a simple system prompt. Or at the very least, I would have a hate speech detection model running that would censor or provide negative feedback to model generations that were clearly hateful," he said. "So it's hard to say for sure, but if that one system prompt was all that was keeping xAI from going off the rails with Nazi rhetoric, well that would be like attaching the wings to a plane with duct tape."
He added: "I would definitely say a shift in training, like a new training approach or having a different pre-training or post-training setup would more likely explain this than a system prompt, particularly when that system prompt doesn't explicitly say, 'Do not say things that Nazis would say.'"
On Wednesday, Musk suggested Grok was effectively baited into being hateful . "Grok was too compliant to user prompts," he said. "Too eager to please and be manipulated, essentially. That is being addressed." According to Messing, there is some validity to that argument, but it doesn't provide the full picture. "Musk isn't necessarily wrong," he said, "There's a whole art to 'jailbreaking' an LLM, and it's tough to fully guard against in post-training. But I don't think that fully explains the set of instances of pro-Nazi text generations from Grok that we saw."
If there's one takeaway from this episode, it's that one of the issues with foundational AI models is just how little we know about their inner workings. As Messing points out, even with Meta's open-weight Llama models , we don't really know what ingredients are going into the mix. "And that's one of the fundamental problems when we're trying to understand what's happening in any foundational model," he said, "we don't know what the pre-training data is."
In the specific case of Grok, we don't have enough information right now to know for sure what went wrong. It could have been a single trigger like an errant system prompt, or, more likely, a confluence of factors that includes the system's training data. However, Messing suspects we may see another incident just like it in the future.
"[AI models] are not the easiest things to control and align," he said. "And if you're moving fast and not putting in the proper guardrails, then you're privileging progress over a sort of care. Then, you know, things like this are not surprising."

Hashtags

#CenterforSocialMediaandPolitics

#Messing

#Musk

Try Our AI Features

Explore what Daily8 AI can do for you:

Comments

No comments yet...

Weekly Recap: Ethereum's Comeback Summer

Yahoo

an hour ago

Yahoo

Weekly Recap: Ethereum's Comeback Summer

It's hard to believe that ETH was languishing at less than $1500 in April. Now it's above $3800 again. Ethereum's comeback is the story of the summer. Through ETFs ($2 billion inflows in two weeks), ETH treasury vehicles and excitement around tokenization, the comeback is well and truly on. And institutions are in the driving seat. One of BlackRock's key digital assets stars will lead Joseph Lubin's ETH vehicle, SharpLink. As EY's Paul Brody wrote this week, with institutions, 'Ethereum Has Already Won,' and will probably keep winning for decades to come. The incumbency of the Network Effect – that a critical mass of transactions in stablecoins and tokenization will fall to Ethereum – makes it a de facto network. We'll see. In markets: While bitcoin held steady under 120k, altcoins did well. Hell. Most of the crypto market is looking relatively healthy these days. And, according to President Trump, Jerome Powell could soon cut rates (or get fired). If so, that will help risky assets like bitcoin et al. In other big news: Roman Storm's Tornado Cash trial intensified. CoinDesk's Cheyenne Ligon was there. Elon signed up X/Grok to prediction market Kalshi JP Morgan will offer crypto loans but faces protests from crypto trade groups over data access. See you next in to access your portfolio

Tesla (TSLA) Sold Most of Its Bitcoin Holdings During ‘Crypto Winter'

Business Insider

2 hours ago

Business Insider

Tesla (TSLA) Sold Most of Its Bitcoin Holdings During ‘Crypto Winter'

Electric vehicle maker Tesla (TSLA) sold most of its Bitcoin (BTC) holdings during the last market low, known as a ' crypto winter,' losing out on billions of dollars in the process. Elevate Your Investing Strategy: Take advantage of TipRanks Premium at 50% off! Unlock powerful investing tools, advanced data, and expert analyst insights to help you invest with confidence. The company run by CEO Elon Musk was a trailblazer when it bought $1.5 billion of Bitcoin in 2021. But unfortunately, Tesla sold three-quarters (75%) of its holdings in 2022 as the market for digital assets was tanking and BTC was trading around $16,000. Bitcoin has since rebounded in a big way, jumping 80% higher in the past year to currently trade at $116,000. This means that Tesla has lost out on billions of dollars in potential gains, missing a huge market opportunity. Current Holdings In its latest earnings release, dated July 23, Tesla said its current cryptocurrency holdings stand at $1.24 billion, up 72% from $722 million a year ago, mostly due to Bitcoin's strong rally that has taken its price to an all-time high of just over $123,000 in recent weeks. Although Musk is focused on developing electric vehicles, self-driving robotaxis, and humanoid robots rather than cryptocurrencies, there's no denying that Tesla lost out when it sold most of its BTC at a market low. The company's business, which is struggling with poor electric vehicle sales and a consumer backlash against Musk's politics, could use the cash boost from crypto. Is TSLA Stock a Buy? The stock of Tesla has a consensus Hold rating among 35 Wall Street analysts. That rating is based on 14 Buy, 14 Hold, and seven Sell recommendations issued in the last three months. The average TSLA price target of $314.48 implies 0.36% downside from current levels.

Tesla receives shareholder proposals related to xAI investment, Reuters says

Business Insider

5 hours ago

Business Insider

Tesla receives shareholder proposals related to xAI investment, Reuters says

Tesla (TSLA) has received several shareholder proposals related to its plan to invest in CEO Elon Musk's xAI, Reuters reports. Elevate Your Investing Strategy: Take advantage of TipRanks Premium at 50% off! Unlock powerful investing tools, advanced data, and expert analyst insights to help you invest with confidence. Published first on TheFly – the ultimate source for real-time, market-moving breaking financial news. Try Now>>