OpenAI Brings GPT-4.1 And GPT-4.1 Mini For Paid And Free Users: All Details

News18, 19-05-2025

OpenAI is bringing the new iteration of its GPT-4.1 model to both paid and free users, with different limits and features for each.
OpenAI has announced that its latest AI models, GPT-4.1 and GPT-4.1 Mini, are now available to ChatGPT users. The models are being integrated into the ChatGPT interface, significantly broadening access for both free and subscription-based users. The decision comes in response to widespread user demand and the increasing need for advanced AI tools, particularly in software development and technical tasks.
The GPT-4.1 model is now available to subscribers of ChatGPT Plus, Pro and Team plans, while GPT-4.1 Mini can be accessed by all users, including those on the free tier. In parallel, OpenAI has confirmed that it will be removing the GPT-4o Mini model from ChatGPT, streamlining its lineup and prioritizing newer models that offer superior performance.
Designed with developers in mind, GPT-4.1 provides faster response times and enhanced capabilities in areas like coding, debugging and web development. It outperforms the now-retired GPT-4o Mini in both speed and command execution, making it particularly well-suited for users who rely on AI for technical productivity.
Despite these improvements, OpenAI has clarified that GPT-4.1 does not qualify as a 'frontier model', a classification reserved for models that introduce fundamentally new capabilities or interaction modalities. Therefore, it is not held to the same stringent safety reporting standards as frontier models.
In addressing questions about the model's security and safety protocols, Johannes Heidecke, OpenAI's Head of Safety Systems, stated via a post on X, 'GPT-4.1 builds on the safety work and mitigations developed for GPT-4o. Across our standard safety evaluations, GPT-4.1 performs at parity with GPT-4o, showing that improvements can be delivered without introducing new safety risks.'
Heidecke further emphasised that, while GPT-4.1 represents a notable upgrade, it does not surpass the 'o3' level in terms of intelligence or interaction capabilities. 'It didn't bring in new ways of interacting with AI models,' he added, explaining why GPT-4.1, though improved, remains within the bounds of OpenAI's existing model safety classification.
This development also follows OpenAI's earlier move on April 30 to phase out the GPT-4 model entirely from ChatGPT. The decision was aimed at reducing confusion among users by simplifying model options and focusing on newer, more capable versions.
First Published:
May 19, 2025, 08:10 IST

Thinking AI models collapse in face of complex problems, Apple researchers find

Hindustan Times

5 hours ago

Just days ahead of the much-anticipated Worldwide Developers Conference (WWDC), Apple has released a study titled 'The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity', in which researchers tested 'reasoning' AI models such as Anthropic's Claude, OpenAI's o models, DeepSeek R1 and Google's Thinking models to see how far they can scale to replicate human reasoning. Spoiler alert: not as far as the AI marketing pitch would have you believe. Could this signal what may be in store for Apple's AI conversation ahead of the keynote?
The study questions the current standard evaluation of Large Reasoning Models (LRMs) using established mathematical and coding benchmarks, arguing they suffer from data contamination and don't reveal insights into reasoning trace structure and quality. Instead, it proposes a controlled experimental testbed using algorithmic puzzle environments. The limitations of AI benchmarking, and the need for it to evolve, are something we have written about earlier.
'We show that state-of-the-art LRMs (e.g., o3-mini, DeepSeek-R1, Claude-3.7-Sonnet-Thinking) still fail to develop generalizable problem-solving capabilities, with accuracy ultimately collapsing to zero beyond certain complexities across different environments,' the research paper points out. These findings are a stark warning to the industry: current LLMs are far from general-purpose reasoners.
The emergence of LRMs, such as OpenAI's o1/o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking, and Gemini Thinking, has been hailed as a significant advancement, potentially marking steps toward more general artificial intelligence. These models characteristically generate responses following detailed 'thinking processes', such as a long Chain-of-Thought sequence, before providing a final answer.
While they have shown promising results on various reasoning benchmarks, the capability of benchmarks to judge rapidly evolving models is itself in doubt. The researchers cite a comparison between non-thinking LLMs and their 'thinking' counterparts. 'At low complexity, non-thinking models are more accurate and token-efficient. As complexity increases, reasoning models outperform but require more tokens, until both collapse beyond a critical threshold, with shorter traces,' they say. The example of Claude 3.7 Sonnet and Claude 3.7 Sonnet Thinking illustrates how both models retain accuracy up to complexity level three, after which the standard LLM sees a significant drop, something the thinking model also suffers a couple of levels later. At the same time, the thinking model uses significantly more tokens.
The research challenges prevailing evaluation paradigms, which often rely on established mathematical and coding benchmarks that are susceptible to data contamination. Such benchmarks also focus primarily on final answer accuracy, providing limited insight into the reasoning process itself, which is the key differentiator for a 'thinking' model compared with a simpler large language model.
To address these gaps, the study uses four controllable puzzle environments: Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World. These puzzles allow for precise manipulation of problem complexity while maintaining consistent logical structures and rules that must be explicitly followed. That structure, in theory, opens a window onto how these models attempt to 'think'.
The findings from this controlled experimental setup reveal significant limitations in current frontier LRMs. One of the most striking observations is the complete accuracy collapse that occurs beyond certain complexity thresholds across all tested reasoning models.
This is not a gradual degradation but a sharp drop to near-zero accuracy as problems become sufficiently difficult. 'The state-of-the-art LRMs (e.g., o3-mini, DeepSeek-R1, Claude-3.7-Sonnet-Thinking) still fail to develop generalizable problem-solving capabilities, with accuracy ultimately collapsing to zero beyond certain complexities across different environments,' note the researchers. These results inevitably challenge any notion that LRMs truly possess the generalisable problem-solving skills required for planning tasks or multi-step processes.
The study also identifies a counter-intuitive scaling limit in the models' reasoning effort, measured by inference token usage during the 'thinking' phase. The models initially spend more tokens as complexity increases, but they actually reduce their reasoning effort as they approach the point of accuracy collapse.
Researchers say that 'despite these claims and performance advancements, the fundamental benefits and limitations of LRMs remain insufficiently understood. Critical questions still persist: Are these models capable of generalizable reasoning, or are they leveraging different forms of pattern matching?' There are further questions about how performance scales with increasing problem complexity, how LRMs compare with their non-thinking standard LLM counterparts when given the same inference token compute, what the inherent limitations of current reasoning approaches are, and what improvements might be necessary to advance toward more robust reasoning.
Where do we go from here?
The researchers make it clear that their test methodology has limitations too. 'While our puzzle environments enable controlled experimentation with fine-grained control over problem complexity, they represent a narrow slice of reasoning tasks and may not capture the diversity of real-world or knowledge intensive reasoning problems,' they say.
They add that the use of 'deterministic puzzle simulators assumes that reasoning can be perfectly validated' at every step, a validation that may not be feasible to such precision in less structured domains. That, they say, would limit how well the analysis carries over to more general reasoning.
There is little argument that LRMs represent progress, particularly for the relevance of AI. Yet, this study highlights that not all reasoning models are capable of robust, generalisable reasoning, particularly in the face of increasing complexity. These findings, ahead of WWDC 2025 and from Apple's own researchers, may suggest that any AI reasoning announcements will likely be pragmatic. The focus areas could include specific use cases where current AI methodology is reliable (the research paper indicates lower to medium complexity and less reliance on flawless long-sequence execution) and potentially integrating neural models with traditional computing approaches to handle the complexities where LRMs currently fail.
The era of Large Reasoning Models is here, but the message of this 'Illusion of Thinking' study is that AI with true reasoning remains a mirage.
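To make the idea of a deterministic puzzle simulator concrete, here is a minimal Python sketch for Tower of Hanoi, one of the four puzzles named in the study. It is an illustration only, not the researchers' code: the function names `solve_hanoi` and `validate_moves`, the peg numbering, and the move format are all assumptions made for this example. The simulator replays a proposed move list one step at a time and reports the first illegal move, which is the kind of step-by-step validation the researchers describe.

```python
from typing import List, Tuple

# Hypothetical move format for this sketch: (source peg, destination peg),
# with pegs numbered 0, 1 and 2.
Move = Tuple[int, int]


def solve_hanoi(n: int, src: int = 0, dst: int = 2, aux: int = 1) -> List[Move]:
    """Return the optimal move list for n disks (2**n - 1 moves)."""
    if n == 0:
        return []
    return (
        solve_hanoi(n - 1, src, aux, dst)    # park the n-1 smaller disks
        + [(src, dst)]                       # move the largest disk
        + solve_hanoi(n - 1, aux, dst, src)  # restack the smaller disks
    )


def validate_moves(n: int, moves: List[Move]) -> Tuple[bool, int]:
    """Replay moves on a fresh n-disk puzzle.

    Returns (solved, steps_ok), where steps_ok is the number of legal
    moves applied before the first illegal one or the end of the list.
    """
    pegs = [list(range(n, 0, -1)), [], []]  # largest disk at the bottom
    for step, (src, dst) in enumerate(moves):
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or not pegs[src]:
            return False, step  # unknown peg or nothing to move
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False, step  # larger disk placed on a smaller one
        pegs[dst].append(pegs[src].pop())
    solved = pegs[2] == list(range(n, 0, -1))
    return solved, len(moves)
```

Because the shortest solution needs 2**n - 1 moves, adding a single disk roughly doubles the number of steps that must all be correct. That gives a sense of how raising the complexity level in such a puzzle quickly demands long, error-free move sequences.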

Why ChatGPT essays still fail to fool experts, although they are clear and well structured

Hindustan Times

10 hours ago

The advent of AI has marked the rise of many tools, and ChatGPT is one of the most popular. Widely used for research and writing, it has often been the centre of discussion for its ability to produce interesting content. However, a new study from the University of East Anglia (UEA) in the UK shows that essays written by real students are still better than those produced by ChatGPT. Researchers compared 145 essays written by university students with 145 essays generated by ChatGPT to see how well the AI can mimic human writing.
The study found that although ChatGPT's essays are clear, well structured, and grammatically correct, they lack something important. The AI essays do not show personal insight or deep critical thinking, which are common in student writing. These missing elements make the AI-generated essays feel less engaging and less convincing.
However, the researchers do not see AI only as a threat. They believe tools like ChatGPT can be helpful in education if used properly. Instead of serving as a shortcut to finish assignments, AI should be a tool that supports learning and improves writing skills. After all, education is about teaching students how to think clearly and express ideas. These are things no AI can truly replace.
One key difference the researchers looked at was how the writers engage readers. Real student essays often include questions, personal comments, and direct appeals to the reader. These techniques help make the writing feel more interactive and persuasive. ChatGPT's essays, on the other hand, tend to avoid questions and personal opinions. They follow academic rules but do not show a clear viewpoint or emotional connection.
Professor Ken Hyland from UEA explained that the AI focuses on creating text that is logical and smooth but misses the conversational details that humans use to connect with readers. This shows that AI writing still struggles to capture the personal style and strong arguments that real people naturally use.

You can now schedule tasks with Gemini as Google's powerful new AI feature rivals ChatGPT's capabilities

Hindustan Times

11 hours ago

Google is steadily evolving Gemini into a smarter, more proactive AI assistant that now competes directly with OpenAI's ChatGPT. The tech giant has started rolling out a feature called Scheduled Actions, which lets users automate recurring or timed tasks without repeating commands. Originally previewed during Google I/O, Scheduled Actions is now arriving on both Android and iOS devices. The feature is currently available to subscribers of Google One AI Premium and select Google Workspace business and education plans. With this rollout, Google is pushing Gemini closer to becoming a fully integrated productivity companion.
Scheduled Actions lets users instruct Gemini to perform specific tasks at set times or intervals. This includes sending daily calendar summaries, weekly content prompts, or even one-time reminders. Once scheduled, Gemini handles them automatically in the background with no follow-up required. For example, a user might say, 'Send me a summary of today's meetings every morning at 8 AM' or 'Generate weekly blog ideas every Friday at 10 AM.' These tasks run quietly behind the scenes, transforming Gemini from a reactive chatbot into a daily-use productivity tool.
The setup process is built to be intuitive, making automation easy for both everyday users and professionals. Within the Gemini app, users can define a task, set the time, and choose the frequency through a clean and accessible interface.
Scheduled Actions puts Google in direct competition with the kind of automation ChatGPT users create through Zapier or custom workflows. What gives Gemini a clear edge is its deep integration with Google's suite of apps. Functioning across Gmail, Calendar, Docs, and Tasks, Gemini offers smooth setup and efficient task execution. Since it is built into tools people already use, Gemini can interact directly with information across Google's ecosystem. There is no need for third-party services or custom scripts. For users already invested in Google's platform, the experience is more seamless than ChatGPT's dependence on external integrations.
Scheduled Actions signals a shift in expectations for how AI assistants should function. Instead of waiting for commands, Gemini can now anticipate and handle repetitive tasks, offering a more personal, assistant-like experience. While this may be just the beginning, it is a clear step toward positioning Gemini as a truly productivity-first AI assistant. And as Gemini continues to evolve, it may not just catch up to ChatGPT but define the next generation of digital assistance.
