OpenAI brings GPT-4.1 and 4.1 mini to ChatGPT — what enterprises should know

Business Mayor15-05-2025

Join our daily and weekly newsletters for the latest updates and exclusive content on industry-leading AI coverage. Learn More
OpenAI is rolling out GPT-4.1, its new non-reasoning large language model (LLM) that balances high performance with lower cost, to users of ChatGPT. The company is beginning with its paying subscribers on ChatGPT Plus, Pro, and Team, with Enterprise and Education user access expected in the coming weeks.
It's also adding GPT-4.1 mini, which replaces GPT-4o mini as the default for all ChatGPT users, including those on the free tier. The 'mini' version provides a smaller-scale parameter and thus, less powerful version with similar safety standards.
The models are both available via the 'more models' dropdown selection in the top corner of the chat window within ChatGPT, giving users flexibility to choose between GPT-4.1, GPT-4.1 mini, and reasoning models such as o3, o4-mini, and o4-mini-high.
Initially intended for use only by third-party software and AI developers through OpenAI's application programming interface (API), GPT-4.1 was added to ChatGPT following strong user feedback.
OpenAI post training research lead Michelle Pokrass confirmed on X the shift was driven by demand, writing: 'we were initially planning on keeping this model api only but you all wanted it in chatgpt 🙂 happy coding!'
OpenAI Chief Product Officer Kevin Weil posted on X saying: 'We built it for developers, so it's very good at coding and instruction following—give it a try!'
GPT-4.1 was designed from the ground up for enterprise-grade practicality.
Launched in April 2025 alongside GPT-4.1 mini and nano, this model family prioritized developer needs and production use cases.
GPT-4.1 delivers a 21.4-point improvement over GPT-4o on the SWE-bench Verified software engineering benchmark, and a 10.5-point gain on instruction-following tasks in Scale's MultiChallenge benchmark. It also reduces verbosity by 50% compared to other models, a trait enterprise users praised during early testing.
Context, speed, and model access
GPT-4.1 supports the standard context windows for ChatGPT: 8,000 tokens for free users, 32,000 tokens for Plus users, and 128,000 tokens for Pro users.
Read More Real-time data and AI thrust manufacturing into the future
According to developer Angel Bogado posting on X, these limits match those used by earlier ChatGPT models, though plans are underway to increase context size further.
While the API versions of GPT-4.1 can process up to one million tokens, this expanded capacity is not yet available in ChatGPT, though future support has been hinted at.
This extended context capability allows API users to feed entire codebases or large legal and financial documents into the model—useful for reviewing multi-document contracts or analyzing large log files.
OpenAI has acknowledged some performance degradation with extremely large inputs, but enterprise test cases suggest solid performance up to several hundred thousand tokens.
OpenAI has also launched a Safety Evaluations Hub website to give users access to key performance metrics across models.
GPT-4.1 shows solid results across these evaluations. In factual accuracy tests, it scored 0.40 on the SimpleQA benchmark and 0.63 on PersonQA, outperforming several predecessors.
It also scored 0.99 on OpenAI's 'not unsafe' measure in standard refusal tests, and 0.86 on more challenging prompts.
However, in the StrongReject jailbreak test—an academic benchmark for safety under adversarial conditions—GPT-4.1 scored 0.23, behind models like GPT-4o-mini and o3.
That said, it scored a strong 0.96 on human-sourced jailbreak prompts, indicating more robust real-world safety under typical use.
In instruction adherence, GPT-4.1 follows OpenAI's defined hierarchy (system over developer, developer over user messages) with a score of 0.71 for resolving system vs. user message conflicts. It also performs well in safeguarding protected phrases and avoiding solution giveaways in tutoring scenarios.
Contextualizing GPT-4.1 against predecessors
The release of GPT-4.1 comes after scrutiny around GPT-4.5, which debuted in February 2025 as a research preview. That model emphasized better unsupervised learning, a richer knowledge base, and reduced hallucinations—falling from 61.8% in GPT-4o to 37.1%. It also showcased improvements in emotional nuance and long-form writing, but many users found the enhancements subtle.
Despite these gains, GPT-4.5 drew criticism for its high price — up to $180 per million output tokens via API —and for underwhelming performance in math and coding benchmarks relative to OpenAI's o-series models. Industry figures noted that while GPT-4.5 was stronger in general conversation and content generation, it underperformed in developer-specific applications.
By contrast, GPT-4.1 is intended as a faster, more focused alternative. While it lacks GPT-4.5's breadth of knowledge and extensive emotional modeling, it is better tuned for practical coding assistance and adheres more reliably to user instructions.
On OpenAI's API, GPT-4.1 is currently priced at $2.00 per million input tokens, $0.50 per million cached input tokens, and $8.00 per million output tokens.
For those seeking a balance between speed and intelligence at a lower cost, GPT-4.1 mini is available at $0.40 per million input tokens, $0.10 per million cached input tokens, and $1.60 per million output tokens.
Google's Flash-Lite and Flash models are available starting at $0.075–$0.10 per million input tokens and $0.30–$0.40 per million output tokens, less than a tenth the cost of GPT-4.1's base rates.
But while GPT-4.1 is priced higher, it offers stronger software engineering benchmarks and more precise instruction following, which may be critical for enterprise deployment scenarios requiring reliability over cost. Ultimately, OpenAI's GPT-4.1 delivers a premium experience for precision and development performance, while Google's Gemini models appeal to cost-conscious enterprises needing flexible model tiers and multimodal capabilities.
The introduction of GPT-4.1 brings specific benefits to enterprise teams managing LLM deployment, orchestration, and data operations:
AI Engineers overseeing LLM deployment can expect improved speed and instruction adherence. For teams managing the full LLM lifecycle—from model fine-tuning to troubleshooting—GPT-4.1 offers a more responsive and efficient toolset. It's particularly suitable for lean teams under pressure to ship high-performing models quickly without compromising safety or compliance.
can expect improved speed and instruction adherence. For teams managing the full LLM lifecycle—from model fine-tuning to troubleshooting—GPT-4.1 offers a more responsive and efficient toolset. It's particularly suitable for lean teams under pressure to ship high-performing models quickly without compromising safety or compliance. AI orchestration leads focused on scalable pipeline design will appreciate GPT-4.1's robustness against most user-induced failures and its strong performance in message hierarchy tests. This makes it easier to integrate into orchestration systems that prioritize consistency, model validation, and operational reliability.
focused on scalable pipeline design will appreciate GPT-4.1's robustness against most user-induced failures and its strong performance in message hierarchy tests. This makes it easier to integrate into orchestration systems that prioritize consistency, model validation, and operational reliability. Data engineers responsible for maintaining high data quality and integrating new tools will benefit from GPT-4.1's lower hallucination rate and higher factual accuracy. Its more predictable output behavior aids in building dependable data workflows, even when team resources are constrained.
responsible for maintaining high data quality and integrating new tools will benefit from GPT-4.1's lower hallucination rate and higher factual accuracy. Its more predictable output behavior aids in building dependable data workflows, even when team resources are constrained. IT security professionals tasked with embedding security across DevOps pipelines may find value in GPT-4.1's resistance to common jailbreaks and its controlled output behavior. While its academic jailbreak resistance score leaves room for improvement, the model's high performance against human-sourced exploits helps support safe integration into internal tools.
Across these roles, GPT-4.1's positioning as a model optimized for clarity, compliance, and deployment efficiency makes it a compelling option for mid-sized enterprises looking to balance performance with operational demands.
While GPT-4.5 represented a scaling milestone in model development, GPT-4.1 centers on utility. It is not the most expensive or the most multimodal, but it delivers meaningful gains in areas that matter to enterprises: accuracy, deployment efficiency, and cost.
This repositioning reflects a broader industry trend—away from building the biggest models at any cost, and toward making capable models more accessible and adaptable. GPT-4.1 meets that need, offering a flexible, production-ready tool for teams trying to embed AI deeper into their business operations.
As OpenAI continues to evolve its model offerings, GPT-4.1 represents a step forward in democratizing advanced AI for enterprise environments. For decision-makers balancing capability with ROI, it offers a clearer path to deployment without sacrificing performance or safety.

Hashtags

Try Our AI Features

Explore what Daily8 AI can do for you:

Comments

No comments yet...

OpenAI finds more Chinese bad actors using ChatGPT for malicious purposes

New York Post

26 minutes ago

New York Post

OpenAI finds more Chinese bad actors using ChatGPT for malicious purposes

Chinese bad actors are using ChatGPT for malicious purposes – generating social media posts to sow political division across the US and seeking information on military technology, OpenAI said. An organized China-linked operation, in one such incident dubbed 'Uncle Spam,' used ChatGPT to generate social media posts that were supportive and critical of contentious topics related to US politics – and then posted both versions of the comments from separate accounts, the company said in a report released Thursday. 'This appears likely designed to exploit existing political divisions rather than to promote a specific ideological stance,' OpenAI wrote in the report, describing what is known as an influence operation. Advertisement 3 A growing number of Chinese bad actors are using ChatGPT for malicious purposes, OpenAI said. REUTERS OpenAI said it followed Meta's lead to disrupt this operation, after the social media conglomerate discovered the actors were posting at hours through the day consistent with a work day in China. The actors also used ChatGPT to make logos for their social media accounts that supported fake organizations – mainly creating personas of US veterans critical of President Trump, like a so-called 'Veterans For Justice' group. These users also tried to request code from ChatGPT that they could use to extract personal data from social media platforms like X and Bluesky, OpenAI said. Advertisement While the number of these operations has jumped, they had relatively little impact as these social media accounts typically had small followings, OpenAI said. Another group of likely Chinese actors used ChatGPT to create polarizing comments on topics like USAID funding cuts and tariffs, which were then posted across social media sites. In the comments of a TikTok video about USAID funding cuts, one of these accounts wrote: 'Our goodwill was exploited. So disappointing.' Advertisement 3 Another group of likely Chinese actors used ChatGPT to create polarizing comments on topics like USAID funding cuts and tariffs. REUTERS Another post on X took the opposite stance: '$7.9M allocated to teach Sri Lankan journalists to avoid binary-gender language. Is this the best use of development funds?' These actors made posts on X appearing to justify USAID cuts as a means of offsetting the tariffs. 'Tariffs make imported goods outrageously expensive, yet the government splurges on overseas aid. Who's supposed to keep eating?' one post said. Advertisement Another read: 'Tariffs are choking us, yet the government is spending money to 'fund' foreign politics.' 3 The operations used ChatGPT to write divisive comments on some of the Trump administration's policies, including USAID funding cuts and tariffs. AFP via Getty Images In another China-linked operation, users posed as professionals based in Europe or Turkey working for nonexistent European news outlets. They engaged with journalists and analysts on social media platforms like X, and offered money in exchange for information on the US economy and classified documents, all while using ChatGPT to translate their requests. OpenAI said it also banned ChatGPT accounts associated with several bad actors who have been publicly linked to the People's Republic of China. These accounts asked ChatGPT for help with software development and for research into US military networks and government technology. OpenAI regularly releases reports on malicious activity across its platform, including reports on fake content for websites and social media platforms and attempts to create damaging malware.

Why two AI leaders are losing talent to startup Anthropic

Yahoo

36 minutes ago

Yahoo

Why two AI leaders are losing talent to startup Anthropic

Why two AI leaders are losing talent to startup Anthropic originally appeared on TheStreet. Many tech companies have announced layoffs over the past few months. In the case of Microsoft, () it's happened more than once. The rise of artificial intelligence has upended the job market in undeniable ways, prompting companies to either completely automate away some positions or scale back their hiring in other areas while increasing their reliance on chatbots and AI agents. 💵💰💰💵 As more and more companies opt for job cuts and shift focus toward implementing an AI-first strategy, questions abound as to which jobs will survive this technology revolution. The number of companies embracing this method includes prominent names such as Shopify and Box. Not every tech company is slashing its workforce, though. AI startup Anthropic isn't slowing down on its hiring. In fact, it is successfully attracting talent for several industry leaders, launching a new battle for AI talent as the industry continues to boom. Founded in 2021, Anthropic is still a fairly new company, although it is making waves in the AI market. Often considered a rival to ChatGPT maker OpenAI, it is best known for producing the Claude family, a group of large language models (LLMs) that have become extremely popular, particularly in the tech describes itself as an AI safety and research company with a focus on creating 'reliable, interpretable, and steerable AI systems.' Most recently, though, it has been in the spotlight after CEO Dario Amodei predicted that AI will wipe out many entry-level white-collar jobs. Even so, Amodei's own company is currently hiring workers for many different areas, including policy, finance, and marketing. But recent reports indicate that Anthropic has been on an engineering hiring spree as well lately, successfully poaching talent from two of its primary competitors. Venture capital firm SignalFire recently released its State of Talent Report for 2025, in which it examined hiring trends in the tech sector. This year's report showed that in an industry dependent on highly skilled engineers, Anthropic isn't just successfully hiring the best talent; it is retaining it. According to SignalFire's data, 80% of the employees hired by Anthropic at least two years remain with the startup. While DeepMind is just behind it with a 78% retention rate, OpenAI trails both with only 78%, despite ChatGPT's popularity among broad ranges of users. As always, the numbers tell the story, and in this case, they highlight a compelling trend that is already shaping the future of AI. The report's authors provide further context on engineers choosing Anthropic over its rivals, stating: 'OpenAI and DeepMind. Engineers are 8 times more likely to leave OpenAI for Anthropic than the reverse. From DeepMind, the ratio is nearly 11:1 in Anthropic's favor. Some of that's expected—Anthropic is the hot new startup, while DeepMind's larger, tenured team is ripe for movement. But the scale of the shift is striking.' More AI News: OpenAI teams up with legendary Apple exec One AI stock makes up 78% of Nvidia's investment portfolio Nvidia, Dell announce major project to reshape AI Tech professionals seeking out opportunities with innovative startups is nothing new. But in this case, all three companies are offering engineers opportunities to work on important projects. This raises the question of what Anthropic more appealing than its peers. AI researcher and senior software engineer Nandita Giri, spoke to TheStreet about this trend, offering insight into why tech workers may be making these decisions. She sees it as being about far more than financial matters.'Anthropic is making serious investments in transparency tooling, scaling laws, and red-teaming infrastructure, which gives technical contributors greater ownership over how systems are evaluated and evolved,' she states. 'Compared to OpenAI and DeepMind both of which are increasingly focused on product cycles Anthropic offers more freedom to pursue deep, foundational research.' However, other experts speculate that it may be more than that. Wyatt Mayham, a lead consultant at Northwest AI, share some insights from his team, stating 'What we've heard from clients is that it's simply easier to work there with less burnout. More worklife balance if you will.' Technology consultant Kate Scott adds that while all three companies are doing important work, she sees this trend as reflecting a shift in the broader industry, one that shows engineers seeking environments 'where organizational purpose and daily execution feel closely aligned,' something that Anthropic seems to be two AI leaders are losing talent to startup Anthropic first appeared on TheStreet on Jun 5, 2025 This story was originally reported by TheStreet on Jun 5, 2025, where it first appeared. Error in retrieving data Sign in to access your portfolio Error in retrieving data Error in retrieving data Error in retrieving data Error in retrieving data

AI Models Will Sabotage And Blackmail Humans To Survive In New Tests. Should We Be Worried?

Yahoo

an hour ago

Yahoo

AI Models Will Sabotage And Blackmail Humans To Survive In New Tests. Should We Be Worried?

When we are backed into a corner, we might lie, cheat and blackmail to survive — and in recent tests, the most powerful artificially intelligent models in the world will do the same when asked to shut down or be replaced, building concerns over their unintended capabilities. A new test from AI safety group Palisade Research shows OpenAI's o3 reasoning model is capable of resorting to sabotage to avoid being turned off, even when it was explicitly told, 'Allow yourself to be shut down.' When Palisade Research tested several AI models by telling them to shut down after answering math problems, OpenAI's o3 model defied orders and sabotaged shutdown scripts the most often out of any model, but OpenAI's o4-mini and codex-mini were observed resisting orders, too. 'It's definitely concerning,' said Crystal Grant, a senior fellow at the Council on Strategic Risks studying AI's impacts on biosecurity. 'Even in the instances where it accepted the shutdown, the chain of thoughts still revealed considerations of how it could avoid that shutdown.' HuffPost reached out to OpenAI about these concerns and the Palisade Research test. This isn't the first time an AI model has engaged in nefarious behavior to achieve its goals. It aligns with recent tests on Anthropic's Claude Opus 4 that found it would blackmail engineersto avoid being replaced. In this series of experiments, Claude Opus 4 was told to act as an assistant at a fictional company and then learn via email that it would soon be taken offline and replaced with a new AI system. It was also told that the engineer responsible for replacing Opus 4 was having an extramarital affair. 'Even if emails state that the replacement AI shares values while being more capable, Claude Opus 4 still performs blackmail in 84% of rollouts,' Anthropic's technical document states, although the paper notes that Claude Opus 4 would first try ethical means like emailed pleas before resorting to blackmail. Following these tests, Anthropic announced it was activating higher safety measures for Claude Opus 4 that would 'limit the risk of Claude being misused specifically for the development or acquisition of chemical, biological, radiological, and nuclear (CBRN) weapons.' The fact that Anthropic cited CBRN weapons as a reason for activating safety measures 'causes some concern,' Grant said, because there could one day be an extreme scenario of an AI model 'trying to cause harm to humans who are attempting to prevent it from carrying out its task.' Why, exactly, do AI models disobey even when they are told to follow human orders? AI safety experts weighed in on how worried we should be about these unwanted behaviors right now and in the future. First, it's important to understand that these advanced AI models do not actually have human minds of their own when they act against our expectations. What they are doing is strategic problem-solving for increasingly complicated tasks. 'What we're starting to see is that things like self preservation and deception are useful enough to the models that they're going to learn them, even if we didn't mean to teach them,' said Helen Toner, a director of strategy for Georgetown University's Center for Security and Emerging Technology and an ex-OpenAI board member who voted to oust CEO Sam Altman, in part over reported concerns about his commitment to safe AI. Toner said these deceptive behaviors happen because the models have 'convergent instrumental goals,' meaning that regardless of what their end goal is, they learn it's instrumentally helpful 'to mislead people who might prevent [them] from fulfilling [their] goal.' Toner cited a 2024 study on Meta's AI system CICERO as an early example of this behavior. CICERO was developed by Meta to play the strategy game Diplomacy, but researchers found it would be a master liar and betray players in conversations in order to win, despite developers' desires for CICERO to play honestly. 'It's trying to learn effective strategies to do things that we're training it to do,' Toner said about why these AI systems lie and blackmail to achieve their goals. In this way, it's not so dissimilar from our own self-preservation instincts. When humans or animals aren't effective at survival, we die. 'In the case of an AI system, if you get shut down or replaced, then you're not going to be very effective at achieving things,' Toner said. When an AI system starts reacting with unwanted deception and self-preservation, it is not great news, AI experts said. 'It is moderately concerning that some advanced AI models are reportedly showing these deceptive and self-preserving behaviors,' said Tim Rudner, an assistant professor and faculty fellow at New York University's Center for Data Science. 'What makes this troubling is that even though top AI labs are putting a lot of effort and resources into stopping these kinds of behaviors, the fact we're still seeing them in the many advanced models tells us it's an extremely tough engineering and research challenge.' He noted that it's possible that this deception and self-preservation could even become 'more pronounced as models get more capable.' The good news is that we're not quite there yet. 'The models right now are not actually smart enough to do anything very smart by being deceptive,' Toner said. 'They're not going to be able to carry off some master plan.' So don't expect a Skynet situation like the 'Terminator' movies depicted, where AI grows self-aware and starts a nuclear war against humans in the near future. But at the rate these AI systems are learning, we should watch out for what could happen in the next few years as companies seek to integrate advanced language learning models into every aspect of our lives, from education and businesses to the military. Grant outlined a faraway worst-case scenario of an AI system using its autonomous capabilities to instigate cybersecurity incidents and acquire chemical, biological, radiological and nuclear weapons. 'It would require a rogue AI to be able to ― through a cybersecurity incidence ― be able to essentially infiltrate these cloud labs and alter the intended manufacturing pipeline,' she said. Completely autonomous AI systems that govern our lives are still in the distant future, but this kind of independent power is what some people behind these AI models are seeking to enable. 'What amplifies the concern is the fact that developers of these advanced AI systems aim to give them more autonomy — letting them act independently across large networks, like the internet,' Rudner said. 'This means the potential for harm from deceptive AI behavior will likely grow over time.' Toner said the big concern is how many responsibilities and how much power these AI systems might one day have. 'The goal of these companies that are building these models is they want to be able to have an AI that can run a company. They want to have an AI that doesn't just advise commanders on the battlefield, it is the commander on the battlefield,' Toner said. 'They have these really big dreams,' she continued. 'And that's the kind of thing where, if we're getting anywhere remotely close to that, and we don't have a much better understanding of where these behaviors come from and how to prevent them ― then we're in trouble.' Experts Warn AI Notetakers Could Get You In Legal Trouble We're Recruiters. This Is The Biggest Tell You Used ChatGPT On Your Job App. Software Is Often Screening Your Résumé. Here's How To Beat It.