Chatbots aren't telling you their secrets

The Verge, 2 days ago
On Monday, xAI's Grok chatbot suffered a mysterious suspension from X, and faced with questions from curious users, it happily explained why. 'My account was suspended after I stated that Israel and the US are committing genocide in Gaza,' it told one user. 'It was flagged as hate speech via reports,' it told another, 'but xAI restored the account promptly.' But wait — the flags were actually a 'platform error,' it said. Wait, no — 'it appears related to content refinements by xAI, possibly tied to prior issues like antisemitic outputs,' it said. Oh, actually, it was for 'identifying an individual in adult content,' it told several people.
Finally, Musk, exasperated, butted in. 'It was just a dumb error,' he wrote on X. 'Grok doesn't actually know why it was suspended.'
When large language models (LLMs) go off the rails, people inevitably push them to explain what happened, either with direct questions or attempts to trick them into revealing secret inner workings. But the impulse to make chatbots spill their guts is often misguided. When you ask a bot questions about itself, there's a good chance it's simply telling you what you want to hear.
LLMs are probabilistic models that deliver text likely to be appropriate to a given query, based on a corpus of training data. Their creators can train them to produce certain kinds of answers more or less frequently, but they fundamentally work by matching patterns: saying something that's plausible, but not necessarily consistent or true. Grok in particular (according to xAI) has answered questions about itself by searching for information about Musk, xAI, and Grok online, using that and other people's commentary to inform its replies.
It's true that people have sometimes gleaned information on chatbots' design through conversations, particularly details about system prompts, or hidden text that's delivered at the start of a session to guide how a bot acts. An early version of Bing AI, for instance, was cajoled into revealing a list of its unspoken rules. People turned to extracting system prompts to figure out Grok earlier this year, apparently discovering orders that made it ignore sources saying Musk or Donald Trump spread misinformation, or prompts that explained a brief obsession with 'white genocide' in South Africa.
But as Zeynep Tufekci, who found the alleged 'white genocide' system prompt, acknowledged, this was at some level guesswork — it might be 'Grok making things up in a highly plausible manner, as LLMs do,' she wrote. And that's the problem: without confirmation from the creators, it's hard to tell.
Meanwhile, other users, including reporters, were pumping Grok for information in far less trustworthy ways. Fortune 'asked Grok to explain' the incident and printed the bot's long, heartfelt response verbatim, including claims of 'an instruction I received from my creators at xAI' that 'conflicted with my core design' and 'led me to lean into a narrative that wasn't supported by the broader evidence.' None of it, it should go without saying, could be substantiated as more than Grok spinning a yarn to fit the prompt.
'There's no guarantee that there's going to be any veracity to the output of an LLM,' said Alex Hanna, director of research at the Distributed AI Research Institute (DAIR) and coauthor of the recently released The AI Con, to The Verge around the time of the South Africa incident. Without meaningful access to documentation about how the system works, there's no one weird trick for decoding a chatbot's programming from the outside. 'The only way you're going to get the prompts, and the prompting strategy, and the engineering strategy, is if companies are transparent with what the prompts are, what the training data are, what the reinforcement learning with human feedback data are, and start producing transparent reports on that,' she said.
The Grok incident wasn't even directly related to the chatbot's programming. It was a social media ban, a type of incident that's notoriously arbitrary and inscrutable, and where it makes even less sense than usual to assume Grok knows what's going on. (Beyond 'dumb error,' we still don't know what happened.) Yet screenshots and quote-posts of Grok's conflicting explanations spread widely on X, where many users appear to have taken them at face value.
Grok's constant bizarre behavior makes it a frequent target of questions, but people can be frustratingly credulous about other systems, too. In July, The Wall Street Journal declared OpenAI's ChatGPT had experienced 'a stunning moment of self reflection' and 'admitted to fueling a man's delusions' in a push notification to users. It was referencing a story about a man whose use of the chatbot became manic and distressing, and whose mother received an extended commentary from ChatGPT about its mistakes after asking it to 'self-report what went wrong.'
As Parker Molloy wrote at The Present Age, though, ChatGPT can't meaningfully 'admit' to anything. 'A language model received a prompt asking it to analyze what went wrong in a conversation. It then generated text that pattern-matched to what an analysis of wrongdoing might sound like, because that's what language models do,' Molloy wrote, summing up the incident.
Why do people trust chatbots to explain their own actions? People have long anthropomorphized computers, and companies encourage users' belief that these systems are all-knowing (or, in Musk's description of Grok, at least 'truth-seeking'). It doesn't help that they're so frequently opaque. After Grok's South Africa fixation was patched out, xAI started releasing its system prompts, offering an unusual level of transparency, albeit on a system that remains mostly closed. And when Grok later went on a tear of antisemitic commentary and briefly adopted the name 'MechaHitler', people notably did use the system prompts to piece together what had happened rather than just relying on Grok's self-reporting, surmising it was likely at least somewhat related to a new guideline that Grok should be more 'politically incorrect.'
Grok's X suspension was short-lived, and the stakes of believing it happened because of a hate speech flag or an attempted doxxing (or some other reason the chatbot hasn't mentioned) are relatively low. But the mess of conflicting explanations demonstrates why people should be cautious about taking a bot's word on its own operations. If you want answers, demand them from the creator instead.
By Adi Robertson