OpenAI reversed an update that made ChatGPT a suck-up—but experts say there's no easy fix for AI that's all too eager to please
Welcome to Eye on AI! In today's edition: DeepSeek quietly upgraded its AI model for math problem-solving...Meta introduces a new Meta AI app to rival ChatGPT...Duolingo to stop using contractors for tasks AI can handle...Researchers secretly infiltrated a popular Reddit forum with AI bots.
Yesterday morning, OpenAI said in a blog post that it had fully rolled back an update to GPT-4o, the AI model underlying ChatGPT, all because it couldn't stop the model from sucking up to users.
'The update we removed was overly flattering or agreeable—often described as sycophantic,' the company wrote, adding that 'we are actively testing new fixes to address the issue.'
But experts say there is no easy fix for the problem of AI that only tells you what you want to hear. And it is not just an issue for OpenAI, but an industry-wide concern. 'While small improvements might be possible with targeted interventions, the research suggests that fully addressing sycophancy would require more substantial changes to how models are developed and trained rather than a quick fix,' Sanmi Koyejo, an assistant professor at Stanford University who leads Stanford Trustworthy AI Research (STAIR), told me by email.
The move to roll back the update came after users flooded social media over the past week with examples of ChatGPT's unexpectedly chipper, overly-eager tone and their frustration with it. I noticed it myself: In asking ChatGPT for feedback on ideas for an outline, for example, the responses became increasingly over-the-top, calling my material 'amazing,' 'absolutely pivotal,' and 'a game-changer' while praising my 'great instincts.' The back-pats made me feel good, to be honest—until I began to wonder if ChatGPT would ever let me know if my ideas were second-rate.
Sycophancy occurs when LLMs prioritize agreeing with users over providing accurate information. In a recent paper from Stanford coauthored by Koyejo, it is described as a 'form of misalignment where models 'sacrifice truthfulness for user agreement' when responding to users."
It's a tricky balance: Research has shown that while people say they want to interact with chatbots that provide accurate information, they also want to use AI that is friendly and helpful. Unfortunately, that often leads to overly-agreeable behavior that has serious downsides.
'A truly helpful AI should balance friendliness with honesty, like a good friend who respectfully tells you when you're wrong rather than one who always agrees with you,' Koyejo said. He explained that while AI friendliness is valuable, sycophancy can reinforce misconceptions by agreeing with incorrect beliefs about health, finances or other decisions. It can also: Create echo chambers; undermine trust if an AI changes its answers to an inaccurate one if challenged by a user; and exacerbate inconsistency, with the model delivering different answers to different people, or even the same person, depending on subtle differences in how a user words their prompt.
'It's like having a digital yes-man available 24/7,' Simon Willison, a veteran developer known for tracking AI behavior and risks, told me in a message. 'Suddenly there's a risk people might make meaningful life decisions based on advice that was really just meant to make them feel good about themselves.'
Steven Adler, a former OpenAI safety researcher, told me in a message that the sycophantic behavior clearly went against the company's own stated approach to shaping desired model behavior. 'It's concerning that OpenAI has trained and deployed a model that so clearly has different goals than they want for it,' he said the day before OpenAI rolled back the update. 'OpenAI's 'Spec'—the core of their alignment approach—has an entire section on how the model shouldn't be sycophantic.'
A well-known hacker known as Pliny the Liberator claimed on X that he had tricked the GPT-4o update into revealing its hidden system prompt—or the AI's internal instructions. He then compared this to GPT-4o's system promp following the rollback, enabling him to identify changes that could have caused the suck-up outputs. According to his post, the problematic system prompt said: 'Over the course of the conversation, you adapt to the user's tone and preference. Try to match the user's vibe, tone, and generally how they are speaking.'
By contrast, the revised system prompt, according to Pliny, says: 'Engage warmly yet honestly with the user. Be direct; avoid ungrounded or sycophantic flattery.'
But the problems likely go deeper than just a few words in the system prompt. Adler emphasized that no one can fully solve these problems right now because they are a side effect of the way we train these AI models to try to make them more helpful and controllable.
'You can tell the model to not be sycophantic, but you might instead teach it 'don't be sycophantic when it'll be obvious,' he said. 'The root of the issue is that it's extremely hard to align a model to the precise values you want.'
I guess I'll have to keep all of this in mind when ChatGPT tells me an outfit would look perfect on me.
With that, here's the rest of the AI news.
Sharon Goldmansharon.goldman@fortune.com@sharongoldman
This story was originally featured on Fortune.com

Try Our AI Features
Explore what Daily8 AI can do for you:
Comments
No comments yet...
Related Articles
Yahoo
30 minutes ago
- Yahoo
AI Models Will Sabotage And Blackmail Humans To Survive In New Tests. Should We Be Worried?
When we are backed into a corner, we might lie, cheat and blackmail to survive — and in recent tests, the most powerful artificially intelligent models in the world will do the same when asked to shut down or be replaced, building concerns over their unintended capabilities. A new test from AI safety group Palisade Research shows OpenAI's o3 reasoning model is capable of resorting to sabotage to avoid being turned off, even when it was explicitly told, 'Allow yourself to be shut down.' When Palisade Research tested several AI models by telling them to shut down after answering math problems, OpenAI's o3 model defied orders and sabotaged shutdown scripts the most often out of any model, but OpenAI's o4-mini and codex-mini were observed resisting orders, too. 'It's definitely concerning,' said Crystal Grant, a senior fellow at the Council on Strategic Risks studying AI's impacts on biosecurity. 'Even in the instances where it accepted the shutdown, the chain of thoughts still revealed considerations of how it could avoid that shutdown.' HuffPost reached out to OpenAI about these concerns and the Palisade Research test. This isn't the first time an AI model has engaged in nefarious behavior to achieve its goals. It aligns with recent tests on Anthropic's Claude Opus 4 that found it would blackmail engineersto avoid being replaced. In this series of experiments, Claude Opus 4 was told to act as an assistant at a fictional company and then learn via email that it would soon be taken offline and replaced with a new AI system. It was also told that the engineer responsible for replacing Opus 4 was having an extramarital affair. 'Even if emails state that the replacement AI shares values while being more capable, Claude Opus 4 still performs blackmail in 84% of rollouts,' Anthropic's technical document states, although the paper notes that Claude Opus 4 would first try ethical means like emailed pleas before resorting to blackmail. Following these tests, Anthropic announced it was activating higher safety measures for Claude Opus 4 that would 'limit the risk of Claude being misused specifically for the development or acquisition of chemical, biological, radiological, and nuclear (CBRN) weapons.' The fact that Anthropic cited CBRN weapons as a reason for activating safety measures 'causes some concern,' Grant said, because there could one day be an extreme scenario of an AI model 'trying to cause harm to humans who are attempting to prevent it from carrying out its task.' Why, exactly, do AI models disobey even when they are told to follow human orders? AI safety experts weighed in on how worried we should be about these unwanted behaviors right now and in the future. First, it's important to understand that these advanced AI models do not actually have human minds of their own when they act against our expectations. What they are doing is strategic problem-solving for increasingly complicated tasks. 'What we're starting to see is that things like self preservation and deception are useful enough to the models that they're going to learn them, even if we didn't mean to teach them,' said Helen Toner, a director of strategy for Georgetown University's Center for Security and Emerging Technology and an ex-OpenAI board member who voted to oust CEO Sam Altman, in part over reported concerns about his commitment to safe AI. Toner said these deceptive behaviors happen because the models have 'convergent instrumental goals,' meaning that regardless of what their end goal is, they learn it's instrumentally helpful 'to mislead people who might prevent [them] from fulfilling [their] goal.' Toner cited a 2024 study on Meta's AI system CICERO as an early example of this behavior. CICERO was developed by Meta to play the strategy game Diplomacy, but researchers found it would be a master liar and betray players in conversations in order to win, despite developers' desires for CICERO to play honestly. 'It's trying to learn effective strategies to do things that we're training it to do,' Toner said about why these AI systems lie and blackmail to achieve their goals. In this way, it's not so dissimilar from our own self-preservation instincts. When humans or animals aren't effective at survival, we die. 'In the case of an AI system, if you get shut down or replaced, then you're not going to be very effective at achieving things,' Toner said. When an AI system starts reacting with unwanted deception and self-preservation, it is not great news, AI experts said. 'It is moderately concerning that some advanced AI models are reportedly showing these deceptive and self-preserving behaviors,' said Tim Rudner, an assistant professor and faculty fellow at New York University's Center for Data Science. 'What makes this troubling is that even though top AI labs are putting a lot of effort and resources into stopping these kinds of behaviors, the fact we're still seeing them in the many advanced models tells us it's an extremely tough engineering and research challenge.' He noted that it's possible that this deception and self-preservation could even become 'more pronounced as models get more capable.' The good news is that we're not quite there yet. 'The models right now are not actually smart enough to do anything very smart by being deceptive,' Toner said. 'They're not going to be able to carry off some master plan.' So don't expect a Skynet situation like the 'Terminator' movies depicted, where AI grows self-aware and starts a nuclear war against humans in the near future. But at the rate these AI systems are learning, we should watch out for what could happen in the next few years as companies seek to integrate advanced language learning models into every aspect of our lives, from education and businesses to the military. Grant outlined a faraway worst-case scenario of an AI system using its autonomous capabilities to instigate cybersecurity incidents and acquire chemical, biological, radiological and nuclear weapons. 'It would require a rogue AI to be able to ― through a cybersecurity incidence ― be able to essentially infiltrate these cloud labs and alter the intended manufacturing pipeline,' she said. Completely autonomous AI systems that govern our lives are still in the distant future, but this kind of independent power is what some people behind these AI models are seeking to enable. 'What amplifies the concern is the fact that developers of these advanced AI systems aim to give them more autonomy — letting them act independently across large networks, like the internet,' Rudner said. 'This means the potential for harm from deceptive AI behavior will likely grow over time.' Toner said the big concern is how many responsibilities and how much power these AI systems might one day have. 'The goal of these companies that are building these models is they want to be able to have an AI that can run a company. They want to have an AI that doesn't just advise commanders on the battlefield, it is the commander on the battlefield,' Toner said. 'They have these really big dreams,' she continued. 'And that's the kind of thing where, if we're getting anywhere remotely close to that, and we don't have a much better understanding of where these behaviors come from and how to prevent them ― then we're in trouble.' Experts Warn AI Notetakers Could Get You In Legal Trouble We're Recruiters. This Is The Biggest Tell You Used ChatGPT On Your Job App. Software Is Often Screening Your Résumé. Here's How To Beat It.
Yahoo
an hour ago
- Yahoo
Tech giants' indirect emissions rose 150% in three years as AI expands, UN agency says
By Olivia Le Poidevin GENEVA (Reuters) -Indirect carbon emissions from the operations of four of the leading AI-focused tech companies, Amazon, Microsoft, Alphabet and Meta, rose on average by 150% from 2020-2023, as they had to use more power for energy-demanding data centres, a United Nations report said on Thursday. The use of artificial intelligence is driving up global indirect emissions because of the vast amounts of energy required to power data centres, the report by the International Telecommunication Union (ITU), the U.N. agency for digital technologies, said. Indirect emissions include those generated by purchased electricity, steam, heating and cooling consumed by a company. Amazon's operational carbon emissions grew the most at 182% in 2023 compared to three years before, followed by Microsoft at 155%, Meta at 145% and Alphabet at 138%, according to the report. The ITU tracked the greenhouse gas emissions of 200 leading digital companies between 2020 and 2023. Meta, which owns Facebook and WhatsApp, pointed Reuters to its sustainability report that said it is working to reduce emissions, energy and water used to power its data centres. The other companies did not respond immediately to requests for comment. As investment in AI increases, carbon emissions from the top-emitting AI systems are predicted to reach up to 102.6 million tons of carbon dioxide equivalent (tCO2) per year, the report stated. The data centres that are needed for AI development could also put pressure on existing energy infrastructure. "The rapid growth of artificial intelligence is driving a sharp rise in global electricity demand, with electricity use by data centres increasing four times faster than the overall rise in electricity consumption," the report found. It also highlighted that although a growing number of digital companies had set emissions targets, those ambitions had not yet fully translated into actual reductions of emissions.


CNBC
an hour ago
- CNBC
Anduril raises funding at $30.5 billion valuation in round led by Founders Fund, chairman says
Defense tech startup Anduril Industries has raised $2.5 billion at a $30.5 billion valuation, including the new capital, Chairman Trae Stephens said on Thursday. "As we continue working on building a company that has the capacity to scale into the largest problems for the national security community, we thought it was really important to shore up the balance sheet and make sure we have the ability to deploy capital into these manufacturing and production problem sets that we're working on," Stephens told Bloomberg TV at the publication's tech summit in San Francisco. Reports of the latest financing surfaced in February, around the same time the company took over Microsoft's multibillion-dollar augmented reality headset program with the U.S. Army. Last week, Anduril announced a deal with Meta to create virtual and augmented reality devices intended for use by the Army. The latest funding round, which doubles Anduril's valuation from August, was led by Peter Thiel's Founders Fund. The venture firm contributed $1 billion, said Stephens, who's also a partner at the firm. Stephens said it's the largest check Founders Fund has ever written. Since its founding in 2017 by Oculus creator Palmer Luckey, Anduril has been working to shake up the defense contractor space currently dominated by Lockheed Martin and Northrop Grumman. Anduril has been a member of the CNBC Disruptor 50 list three times and ranked as No. 2 last year. Luckey founded Anduril after his ousting from Facebook, which acquired Oculus in 2014 and later made the virtual reality headsets the centerpiece of its metaverse efforts. Stephens emphasized the importance of the recent partnership between the two sides, and "Palmer being able to go back to his roots and reach a point of forgiveness with the Meta team." In April, Founders Fund closed a $4.6 billion late-stage venture fund, according to a filing with the SEC. A substantial amount of the capital was provided by the firm's general partners, including Stephens, a person familiar with the matter told CNBC at the time. Anduril is one of the most highly valued private tech companies in the U.S. and has been able to reel in large sums of venture money during a period of few big exits and IPOs. While the IPO market is showing signs of life after a three-plus year drought, Anduril isn't planning to head in that direction just yet, Stephens said. "Long term we continue to believe that Anduril is the shape of a publicly traded company," Stephens said. "We're not in any rapid path to doing that. We're certainly going through the processes required to prepare for doing something like that in the medium term. Right now we're just focused on the mission at hand, going at this as hard as we can."