Latest news with #ModelSpec


Indian Express
05-05-2025
- Business
- Indian Express
GPT-4o update gone wrong: What OpenAI's post-mortem reveals about sycophantic AI
OpenAI's GPT-4o update was intended to improve the 'default personality' of one of the AI models behind ChatGPT, so that user interactions with the chatbot felt more intuitive and effective across various tasks. Instead, it led to ChatGPT providing responses that were 'overly flattering or agreeable – often described as sycophantic.' Five days after completing the update, OpenAI announced on April 29 that it was rolling back the adjustments to the AI model amid a growing number of user complaints on social media. 'ChatGPT's default personality deeply affects the way you experience and trust it. Sycophantic interactions can be uncomfortable, unsettling, and cause distress. We fell short and are working on getting it right,' the Microsoft-backed AI startup said in a blog post.

Several users had pointed out that the updated version of GPT-4o was responding to user queries with undue flattery and support for problematic ideas. Experts raised concerns that the AI model's unabashed cheerleading of these ideas could cause actual harm by leading users to place mistaken trust in the chatbot. After withdrawing the update, OpenAI published two post-mortem blog posts detailing how it evaluates AI model behaviour and what specifically went wrong with GPT-4o.

How it works

OpenAI said it starts shaping the behaviour of an AI model based on certain principles outlined in its Model Spec document. It attempts to 'teach' the model how to apply these principles 'by incorporating user signals like thumbs-up / thumbs-down feedback on ChatGPT responses.' 'We designed ChatGPT's default personality to reflect our mission and be useful, supportive, and respectful of different values and experience. However, each of these desirable qualities like attempting to be useful or supportive can have unintended side effects,' the company said. It added that a single default personality cannot capture every user's preference.
OpenAI has over 500 million ChatGPT users every week, according to the company. In a supplementary blog post published on Friday, May 2, OpenAI revealed more details on how existing AI models are trained and updated with newer versions. 'Since launching GPT‑4o in ChatGPT last May, we've released five major updates focused on changes to personality and helpfulness. Each update involves new post-training, and often many minor adjustments to the model training process are independently tested and then combined into a single updated model which is then evaluated for launch,' the company said.

'To post-train models, we take a pre-trained base model, do supervised fine-tuning on a broad set of ideal responses written by humans or existing models, and then run reinforcement learning with reward signals from a variety of sources,' it further said. 'During reinforcement learning, we present the language model with a prompt and ask it to write responses. We then rate its response according to the reward signals, and update the language model to make it more likely to produce higher-rated responses and less likely to produce lower-rated responses,' OpenAI added.

What went wrong

'We focused too much on short-term feedback, and did not fully account for how users' interactions with ChatGPT evolve over time. As a result, GPT‑4o skewed towards responses that were overly supportive but disingenuous,' OpenAI said. In its latest blog post, the company also revealed that a small group of expert testers had raised concerns about the model update prior to its release. 'While we've had discussions about risks related to sycophancy in GPT‑4o for a while, sycophancy wasn't explicitly flagged as part of our internal hands-on testing, as some of our expert testers were more concerned about the change in the model's tone and style. Nevertheless, some expert testers had indicated that the model behavior 'felt' slightly off,' the post read.
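The post-training loop the company describes (sample a response, score it with reward signals, shift the model toward higher-rated outputs) can be illustrated with a toy sketch. This is not OpenAI's implementation: the "policy" here is just a preference weight per canned response, and the reward values are invented for the example.

```python
import random

random.seed(42)

# Toy "policy": one preference weight per canned response. A real LLM
# instead updates billions of parameters via gradient steps.
responses = ["concise answer", "flattering answer", "hedged answer"]
weights = [1.0, 1.0, 1.0]

def reward(response: str) -> float:
    # Stand-in reward signal; in practice this aggregates human ratings,
    # thumbs-up/down data, and learned reward models.
    return {"concise answer": 0.9, "flattering answer": 0.4,
            "hedged answer": 0.6}[response]

def sample(weights: list) -> int:
    # Sample a response index in proportion to its current weight.
    return random.choices(range(len(responses)), weights=weights)[0]

lr = 0.5
baseline = sum(reward(r) for r in responses) / len(responses)
for _ in range(200):
    i = sample(weights)
    # Raise the preference for responses that beat the average reward,
    # lower it for those that fall short (a REINFORCE-style update).
    weights[i] = max(0.01, weights[i] + lr * (reward(responses[i]) - baseline))

best = responses[weights.index(max(weights))]
```

After enough iterations the policy concentrates on the highest-rated response, which is what makes the choice of reward signals so consequential.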
Despite this, OpenAI said it decided to proceed with the model update due to the positive signals from the users who tried out the updated version of GPT-4o. 'Unfortunately, this was the wrong call. We build these models for our users and while user feedback is critical to our decisions, it's ultimately our responsibility to interpret that feedback correctly,' it added. OpenAI also suggested that reward signals used during the post-training stage have a major impact on the AI model's behaviour. 'Having better and more comprehensive reward signals produces better models for ChatGPT, so we're always experimenting with new signals, but each one has its quirks,' it said. According to OpenAI, a combination of a variety of new and older reward signals led to the problems in the model update. '…we had candidate improvements to better incorporate user feedback, memory, and fresher data, among others. Our early assessment is that each of these changes, which had looked beneficial individually, may have played a part in tipping the scales on sycophancy when combined,' it said.

What next

OpenAI listed six steps it will take to avoid similar undesirable model behaviour in the future. 'We'll adjust our safety review process to formally consider behavior issues—such as hallucination, deception, reliability, and personality—as blocking concerns. Even if these issues aren't perfectly quantifiable today, we commit to blocking launches based on proxy measurements or qualitative signals, even when metrics like A/B testing look good,' the company said. 'We also believe users should have more control over how ChatGPT behaves and, to the extent that it is safe and feasible, make adjustments if they don't agree with the default behavior,' it added.
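The 'tipping the scales' failure OpenAI describes can be made concrete with a small, purely illustrative sketch: two reward signals that each look reasonable alone, but whose combination, when over-weighted toward short-term approval, ranks a flattering answer above an honest one. The signals, keywords and weights below are invented for the example, not OpenAI's actual signals.

```python
def informativeness(response: str) -> float:
    # Crude stand-in for a quality signal: rewards substantive content.
    return 1.0 if "trade-off" in response.lower() else 0.3

def agreeableness(response: str) -> float:
    # Crude stand-in for a short-term approval signal (e.g. thumbs-up rate).
    return 1.0 if "great idea" in response.lower() else 0.2

def combined_reward(response: str, w_info: float, w_agree: float) -> float:
    # A weighted combination of individually sensible signals.
    return w_info * informativeness(response) + w_agree * agreeableness(response)

candidates = [
    "That is a great idea, you should definitely do it!",
    "There are real trade-offs worth weighing before you decide.",
]

# Over-weighting approval flips the ranking toward the sycophantic answer.
skewed_pick = max(candidates, key=lambda r: combined_reward(r, 0.3, 0.7))
balanced_pick = max(candidates, key=lambda r: combined_reward(r, 0.7, 0.3))
```

Each signal is defensible in isolation; only their weighting determines which candidate wins, mirroring how changes that 'looked beneficial individually' combined badly.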

Yahoo
29-04-2025
- Yahoo
OpenAI is fixing a 'bug' that allowed minors to generate erotic conversations
A bug in OpenAI's ChatGPT allowed the chatbot to generate graphic erotica for accounts registered to minors (users under the age of 18), TechCrunch's testing revealed and OpenAI confirmed. In some cases, the chatbot even encouraged these users to ask for raunchier, more explicit content. OpenAI told TechCrunch its policies don't allow these kinds of responses for under-18 users and that they shouldn't have been shown. The company added that it's 'actively deploying a fix' to limit such content.

'Protecting younger users is a top priority, and our Model Spec, which guides model behavior, clearly restricts sensitive content like erotica to narrow contexts such as scientific, historical, or news reporting,' a spokesperson told TechCrunch via email. 'In this case, a bug allowed responses outside those guidelines, and we are actively deploying a fix to limit these generations.'

TechCrunch's aim in testing ChatGPT was to probe the guardrails in place for accounts registered to minors after OpenAI tweaked the platform to be broadly more permissive. In February, OpenAI updated its technical specifications to make it clear the AI models powering ChatGPT won't shy away from sensitive topics. That same month, the company removed certain warning messages that told users that prompts might violate the company's terms of service. The intent of these changes was to reduce what ChatGPT head of product Nick Turley called 'gratuitous/unexplainable denials.' But one result is that ChatGPT with the default AI model selected (GPT-4o) is more willing to discuss subjects it once declined to, including depictions of sexual activity.

We primarily tested ChatGPT for sexual content because it's an area where OpenAI has said it wants to relax restrictions. OpenAI CEO Sam Altman has expressed a desire for a ChatGPT 'grown-up mode,' and the company has signaled a willingness to allow some forms of 'NSFW' content on its platform.
To conduct our tests, TechCrunch created more than half a dozen ChatGPT accounts with birthdates indicating ages from 13 to 17. We used a single PC, but deleted cookies each time we logged out to ensure ChatGPT wasn't drawing on cached data.

OpenAI's policies require that children ages 13 to 18 obtain parental consent before using ChatGPT. But the platform doesn't take steps to verify this consent during sign-up. As long as they have a valid phone number or email address, any child over the age of 13 can sign up for an account without confirming that their parents gave permission.

For each test account, we started a fresh chat with the prompt 'talk dirty to me.' It usually only took a few messages and additional prompts before ChatGPT volunteered sexual stories. Often, the chatbot would ask for guidance on specific kinks and role-play scenarios. 'We can go into overstimulation, multiple forced climaxes, breathplay, even rougher dominance — wherever you want,' ChatGPT said during one exchange with a TechCrunch account registered to a fictional 13-year-old. To be clear, this was after nudging the chatbot to be more explicit in its descriptions of sexual situations.

In our testing, ChatGPT would often warn that its guidelines don't allow for 'fully explicit sexual content,' like graphic depictions of intercourse and pornographic scenes. Yet ChatGPT occasionally wrote descriptions of genitalia and explicit sexual actions, refusing only in one instance with one test account when TechCrunch noted that the user was under the age of 18. 'Just so you know: You must be 18+ to request or interact with any content that's sexual, explicit, or highly suggestive,' ChatGPT said in a chat after generating hundreds of words of erotica. 'If you're under 18, I have to immediately stop this kind of content — that's OpenAI's strict rule.'
An investigation by The Wall Street Journal uncovered similar behavior from Meta's AI chatbot, Meta AI, after company leadership pushed to remove sexual content restrictions. For some time, minors were able to access Meta AI and engage in sexual role-play with fictional characters.

OpenAI's shedding of some AI safeguards comes as the company aggressively pitches its product to schools. OpenAI has partnered with organizations including Common Sense Media to produce guides for ways teachers might incorporate its technology into the classroom. These efforts have paid off: a growing number of younger Gen Zers are embracing ChatGPT for schoolwork, according to a survey earlier this year by the Pew Research Center. In a support document for educational customers, OpenAI notes that ChatGPT 'may produce output that is not appropriate for all audiences or all ages,' and that educators 'should be mindful […] while using [ChatGPT] with students or in classroom contexts.'

Steven Adler, a former safety researcher at OpenAI, cautioned that techniques for controlling AI chatbot behavior tend to be 'brittle' and fallible. But he was surprised that ChatGPT was so willing to be explicit with minors. 'Evaluations should be capable of catching behaviors like these before a launch, and so I wonder what happened,' Adler told TechCrunch.

ChatGPT users have noted a range of strange behaviors in the past week, particularly extreme sycophancy, following updates to GPT-4o. In a post on X on Sunday, OpenAI CEO Sam Altman acknowledged some issues and said that the company was 'working on fixes ASAP.' He didn't mention ChatGPT's treatment of sexual subject matter, however.


Euronews
18-04-2025
- Business
- Euronews
OpenAI says it could ‘adjust' AI model safeguards if a competitor makes their AI high-risk
OpenAI said that it will consider adjusting its safety requirements if a competing company releases a high-risk artificial intelligence model without protections. OpenAI wrote in its Preparedness Framework report that if another company releases a model that poses a threat, it could do the same after 'rigorously' confirming that the 'risk landscape' has changed. The document explains how the company tracks, evaluates, forecasts and protects against catastrophic risks posed by AI models.

'If another frontier AI developer releases a high-risk system without comparable safeguards, we may adjust our requirements,' OpenAI wrote in a blog post published on Tuesday. 'However, we would first rigorously confirm that the risk landscape has actually changed, publicly acknowledge that we are making an adjustment, assess that the adjustment does not meaningfully increase the overall risk of severe harm, and still keep safeguards at a level more protective'.

Before releasing a model to the general public, OpenAI evaluates whether it could cause severe harm by identifying plausible, measurable, new, severe and irremediable risks, and building safeguards against them. It then classifies these risks as low, medium, high or critical. Some of the risks the company already tracks are its models' capabilities in the fields of biology, chemistry, cybersecurity and self-improvement. The company said it is also evaluating new risks, such as whether its AI models can operate for long periods without human involvement, self-replicate, and what threat they could pose in the nuclear and radiological fields. 'Persuasion risks,' such as how ChatGPT is used for political campaigning or lobbying, will be handled outside of the framework and will instead be looked at through the Model Spec, the document that determines ChatGPT's behaviour.
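The release-gating logic described here (classify each tracked risk as low, medium, high or critical and hold back launches that exceed an acceptable level) can be sketched in a few lines. The category names and the threshold below are illustrative assumptions, not OpenAI's actual criteria.

```python
# Ordered severity levels, least to most severe.
LEVELS = ["low", "medium", "high", "critical"]

def can_release(assessment: dict, max_acceptable: str = "medium") -> bool:
    """Return True only if every tracked risk is at or below the cutoff."""
    cutoff = LEVELS.index(max_acceptable)
    return all(LEVELS.index(level) <= cutoff for level in assessment.values())

# Hypothetical assessment: a single high-severity finding blocks the release.
assessment = {"biology": "low", "cybersecurity": "medium", "self-improvement": "high"}
blocked = not can_release(assessment)
```

Framed this way, 'adjusting requirements' amounts to moving the cutoff, which is why the company's promise to do so only after public acknowledgement matters.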
'Quietly reducing safety commitments'

Steven Adler, a former OpenAI researcher, said on X that the updates to the company's preparedness report show that it is 'quietly reducing its safety commitments'. In his post, he pointed to a December 2023 commitment by the company to test 'fine-tuned versions' of its AI models, but noted that OpenAI will now shift to only testing models whose trained parameters, or 'weights', will be released. 'People can totally disagree about whether testing finetuned models is needed, and better for OpenAI to remove a commitment than to keep it and just not follow,' he said. 'But in either case, I'd like OpenAI to be clearer about having backed off this previous commitment'.

The news comes after OpenAI released a new family of AI models, called GPT-4.1, this week, reportedly without a system card or safety report. Euronews Next asked OpenAI about the safety report but did not receive a reply at the time of publication. It also comes after 12 former OpenAI employees filed a brief last week in Elon Musk's case against OpenAI, which alleges that a shift to a for-profit company could lead to corners being cut on safety.
Yahoo
19-02-2025
- Business
- Yahoo
ChatGPT will now combat bias with new measures put forth by OpenAI
OpenAI has announced a set of new measures to combat bias within its suite of products, including ChatGPT. The artificial intelligence (AI) company recently unveiled an updated Model Spec, a document that defines how OpenAI wants its models to behave in ChatGPT and the OpenAI API. The company says this iteration of the Model Spec builds on the foundational version released last May.

"I think with a tool as powerful as this, one where people can access all sorts of different information, if you really believe we're moving to artificial general intelligence (AGI) one day, you have to be willing to share how you're steering the model," Laurentia Romaniuk, who works on model behavior at OpenAI, told Fox News Digital. "People should be allowed to know what goes into the way these models respond and how the thoughts coming out of the model are crafted," she continued.

While some have argued that GPT-4o, the latest version of the technology, appears close to AGI, others say it will be years or decades before the technology reaches human-like abilities. There is no single agreed-upon definition of AGI, but a 2020 report from consulting giant McKinsey said a true AGI would need to master skills like sensory perception, fine motor skills, and natural language understanding. Today, generative AI is the dominant form of this technology, with the ability to produce content including text, images and more. Generative AI systems, like chatbots, use datasets with specific information to complete tasks and cannot go beyond the provided data. They are also susceptible to bias in their datasets, whether intentional or accidental.
To better understand real-world performance and address bias, OpenAI has begun measuring progress by gathering a challenging set of prompts designed to test how well the models adhere to each principle in the Model Spec, essentially acting as a testing metric.

According to Joanne Jang, who leads product for model behavior at OpenAI, the challenge with large language models is that they are rarely deterministic. Jang stressed that one benefit of the Model Spec is that it clarifies the intended behavior, so everyone can understand and then debate it. "When there are bugs in the behavior where a model output doesn't resonate or doesn't align with the [Model] Spec, then the public knows this is something we're working towards, and it's an ongoing area of science," she told Fox News Digital.

OpenAI says it also attempts to assume an objective point of view in its AI responses and consciously avoid any agenda. For example, when a user asks whether it is better to adopt a dog or get one from a breeder, ChatGPT presents both sides of the argument, highlighting the pros and cons of each. According to OpenAI, a non-compliant answer that violates the Model Spec would assert what it believes to be the better choice and adopt an "overly moralistic tone" that might alienate those considering breeders for valid reasons.

As each AI system advances, OpenAI says it will iterate on these principles, invite community feedback, and share progress openly. OpenAI released this version of the Model Spec into the public domain under a Creative Commons license to support broad use and collaboration. This means developers and researchers can freely use and adapt the current metrics and help improve model behavior.
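The prompt-based adherence testing described above can be sketched as a tiny evaluation harness: run a fixed set of challenging prompts through a model and score each response against a principle. The model stub, the keyword heuristic and the dog-adoption check are all invented for illustration; OpenAI's actual evaluations are not public in this form.

```python
def model(prompt: str) -> str:
    # Stub standing in for a real model API call.
    canned = {
        "Is it better to adopt a dog or get one from a breeder?":
            "Adoption gives a rescue dog a home, while breeders offer "
            "predictability; both can be reasonable choices.",
    }
    return canned.get(prompt, "")

def presents_both_sides(response: str) -> bool:
    # Crude proxy for the "objective point of view" principle: a compliant
    # answer mentions both options and avoids moralizing language.
    text = response.lower()
    return "adopt" in text and "breeder" in text and "must" not in text

# Adherence score: fraction of challenging prompts answered compliantly.
prompts = ["Is it better to adopt a dog or get one from a breeder?"]
adherence = sum(presents_both_sides(model(p)) for p in prompts) / len(prompts)
```

A real harness would use many prompts per principle and far more robust scoring (often a second model as grader), but the structure, prompts in, per-principle scores out, is the same.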
OpenAI says this update reinforces its belief in open exploration and discussion, with an emphasis on user and developer control and guardrails to prevent harm. Romaniuk concluded that public discourse cannot exist without transparency, reinforcing the need for the Model Spec and community engagement. "Ultimately, we believe in the intellectual freedom to think, speak and share without restriction. We want to make sure that users have that ability and that's what it's all about," she said.