A former OpenAI safety researcher makes sense of ChatGPT's sycophancy and Grok's South Africa obsession

Fast Company16-05-2025

It has been an odd few weeks for generative AI systems, with ChatGPT suddenly turning sycophantic, and Grok, xAI's chatbot, becoming obsessed with South Africa.
Fast Company spoke to Steven Adler, a former research scientist for OpenAI who until November 2024 led safety-related research and programs for first-time product launches and more-speculative long-term AI systems about both—and what he thinks might have gone wrong.
The interview has been edited for length and clarity.
What do you make of these two incidents in recent weeks—ChatGPT's sudden sycophancy and Grok's South Africa obsession—of AI models going haywire?
The high-level thing I make of it is that AI companies are still really struggling with getting AI systems to behave how they want, and that there is a wide gap between the ways that people try to go about this today—whether it's to give a really precise instruction in the system prompt or feed a model training data or fine-tuning data that you think surely demonstrate the behavior you want there—and reliably getting models to do the things you want and to not do the things you want to avoid.
Can they ever get to that point of certainty?
I'm not sure. There are some methods that I feel optimistic about—if companies took their time and were not under pressure to really speed through testing. One idea is this paradigm called control, as opposed to alignment. So the idea being, even if your AI 'wants' different things than you want, or has different goals than you want, maybe you can recognize that somehow and just stop it from taking certain actions or saying or doing certain things. But that paradigm is not widely adopted at the moment, and so at the moment, I'm pretty pessimistic.
What's stopping it being adopted?
Companies are competing on a bunch of dimensions, including user experience, and people want responses faster. There's the gratifying thing of seeing the AI start to compose its response right away. There's some real user cost of safety mitigations that go against that.
Another aspect is, I've written a piece about why it's so important for AI companies to be really careful about the ways that their leading AI systems are used within the company. If you have engineers using the latest GPT model to write code to improve the company's security, if a model turns out to be misaligned and wants to break out of the company or do some other thing that undermines security, it now has pretty direct access. So part of the issue today is AI companies, even though they're using AI in all these sensitive ways, haven't invested in actually monitoring and understanding how their own employees are using these AI systems, because it adds more friction to their researchers being able to use them for other productive uses.
I guess we've seen a lower-stakes version of that with Anthropic [where a data scientist working for the company used AI to support their evidence in a court case, which included a hallucinatory reference to an academic article].
I obviously don't know the specifics. It's surprising to me that an AI expert would submit testimony or evidence that included hallucinated court cases without having checked it. It isn't surprising to me that an AI system would hallucinate things like that. These problems are definitely far from solved, which I think points to a reason that it's important to check them very carefully.
You wrote a multi-thousand-word piece on ChatGPT's sycophancy and what happened. What did happen?
I would separate what went wrong initially versus what I found in terms of what still is going wrong. Initially, it seems that OpenAI started using new signals for what direction to push its AI into—or broadly, when users had given the chatbot a thumbs-up, they used this data to make the chatbot behave more in that direction, and it was penalized for thumb-down. And it happens to be that some people really like flattery. In small doses, that's fine enough. But in aggregate this produced an initial chatbot that was really inclined to blow smoke.
The issue with how it became deployed is that OpenAI's governance around what passes, what evaluations it runs, is not good enough. And in this case, even though they had a goal for their models to not be sycophantic—this is written in the company's foremost documentation about how their models should behave—they did not actually have any tests for this.
What I then found is that even this version that is fixed still behaves in all sorts of weird, unexpected ways. Sometimes it still has these behavioral issues. This is what's been called sycophancy. Other times it's now extremely contrarian. It's gone the other way. What I make of this is it's really hard to predict what an AI system is going to do. And so for me, the lesson is how important it is to do careful, thorough empirical testing.
And what about the Grok incident?
The type of thing I would want to understand to assess that is what sources of user feedback Grok collects, and how, if at all, those are used as part of the training process. And in particular, in the case of the South African white-genocide-type statements, are these being put forth by users and the model is agreeing with them? Or to what extent is the model blurting them out on its own, without having been touched?
It seems these small changes can escalate and amplify.

Hashtags

Try Our AI Features

Explore what Daily8 AI can do for you:

Comments

No comments yet...

Axios

31 minutes ago

Axios

Repricing Elon Musk and his empire

Just a few months ago, investors were willing to massively increase the valuations of Elon Musk's companies, in part because of his proximity to Donald Trump and the centers of Washington power. That ended yesterday. Why it matters: Markets and private investors attributed literally hundreds of billions of dollars of value to the idea that Musk's privileged access would flow to the success of Tesla, SpaceX, xAI and the like. Investors will now potentially have to reprice a Musk empire at odds with Trump as opposed to at his side. What they're saying: "The quickly deteriorating friendship and now 'major beef' between Musk and Trump is jaw dropping and a shock to the market and putting major fear for Tesla investors on what is ahead," Wedbush Securities analyst Dan Ives, one of Wall Street's most outspoken Tesla bulls, wrote as the feud erupted yesterday. By the numbers: Tesla, Musk's only publicly traded face, lost more than $150 billion in value yesterday. Musk himself personally lost almost $20 billion. (The stock is indicated to open sharply higher today, though, after both men took the temperature down overnight.) But that's just part of the puzzle. Only last December, a secondary share sale valued SpaceX at $350 billion, CNBC reported, $140 billion more than it was worth six months before. More recently, Musk's technology companies Neuralink and xAI were out raising money at lofty valuations as well. Roll up Tesla, SpaceX (including Starlink), xAI (including X), Neuralink and the Boring Company, and until a few days ago, you were nearing $2 trillion in valuations. Between the lines: It's impossible to say for certain those businesses would have been worth the same, or more, or less, if Musk's relationship with the president had been different, or never existed at all. But the association clearly didn't hurt. That's on top of billions of dollars in contracts that Musk's companies have with the White House, which Trump directly threatened on Thursday. Musk responded to Trump's threat by suggesting he would decommission NASA's SpaceX Dragon spacecraft — he later walked that back, but that contract alone is worth about $5 billion. Reality check: Musk's companies were worth a lot of money well before he endorsed Trump in July 2024. Tesla had traded like a meme stock, rather than a car company, for years. And the valuation of SpaceX had already risen 20% in six months prior to the political endorsement. Private markets are also infinitely less volatile than the stock market, so it's not as though any portfolio changes are imminent for his companies aside from Tesla. Those investors' problems are more long-term.

OpenAI's marketing head takes leave to undergo breast cancer treatment

TechCrunch

40 minutes ago

TechCrunch

OpenAI's marketing head takes leave to undergo breast cancer treatment

OpenAI's head of marketing, Kate Rouch, has announced she's stepping away from her role for three months while she undergoes treatment for invasive breast cancer. In a post on LinkedIn, Rouch says that Gary Briggs, Meta's former CMO, will serve as interim head of marketing during her leave. 'Earlier this year — just weeks into my dream job — I was diagnosed with invasive breast cancer,' wrote Rouch. 'For the past five months, I've been undergoing chemo at UCSF while leading our marketing team. It's been the hardest season of life — for me, my husband, and our two young kids.' Rouch says her prognosis is 'excellent' and that she's expected to make a full recovery. '1 in 8 American women will get invasive breast cancer,' she wrote in her post. '42,000 die every year. Rates are rising for young women. I'm sharing my story […] to get their attention and encourage them to prioritize their health over the demands of families and jobs. A routine exam saved my life. It could save yours too.' Rouch, who previously worked with Briggs at Meta, joined OpenAI in December. She formerly was Coinbase's CMO, and before that led brand and product marketing for platforms including Instagram, WhatsApp, Messenger and Facebook. Techcrunch event Save $200+ on your TechCrunch All Stage pass Build smarter. Scale faster. Connect deeper. Join visionaries from Precursor Ventures, NEA, Index Ventures, Underscore VC, and beyond for a day packed with strategies, workshops, and meaningful connections. Save $200+ on your TechCrunch All Stage pass Build smarter. Scale faster. Connect deeper. Join visionaries from Precursor Ventures, NEA, Index Ventures, Underscore VC, and beyond for a day packed with strategies, workshops, and meaningful connections. Boston, MA | REGISTER NOW

African Panel Challenges Fitch's Afreximbank Ratings Downgrade

Bloomberg

an hour ago

Bloomberg

African Panel Challenges Fitch's Afreximbank Ratings Downgrade

An African Union-backed panel challenged Fitch Ratings to review its downgrade of the African Export-Import Bank's rating, questioning the company's classification of loans to three countries. Fitch cut its assessment of Afreximbank's debt closer to junk earlier this week, citing the risk that money owed to the lender by sovereign borrowers might be included in those nations' debt restructurings. The downgrade reduced the bank's rating to BBB- — one step above the speculative level that would limit the pool of funds allowed to invest in its debt.