Latest news with #GPT-4.5

Stop wasting time with the wrong ChatGPT model — here's how to choose the right one

Tom's Guide

24-05-2025

OpenAI has released an array of ChatGPT models since the chatbot first launched, each with different names, capabilities, and use cases. What started as a single AI assistant has evolved into a complex lineup that can leave even regular users scratching their heads about which version to choose. The reality is that different ChatGPT models excel at different tasks: some are built for speed and everyday conversations, while others are designed for complex reasoning and technical problem-solving. Choosing the wrong model can mean waiting longer for responses or getting subpar results for your specific needs. This guide breaks down OpenAI's current model lineup, helping you understand what each version does best and how to use it.

GPT-4o is OpenAI's flagship model and the best starting point for most users. It combines the intelligence of the original GPT-4 with significantly faster response times and improved capabilities across text, voice, and vision. What it's good for: GPT-4o excels at the everyday tasks most people use ChatGPT for. It can brainstorm ideas for your next project, summarize long articles or reports, write and edit emails, proofread documents, and help with creative writing. It's also excellent at analyzing images, translating languages, and handling voice conversations. When to use it: choose GPT-4o when you need a reliable, fast model for general-purpose tasks. It's particularly useful when you're working with images, need quick translations, or want to have voice conversations with ChatGPT. If you're unsure which model to use, GPT-4o is usually your best bet.

OpenAI's co-founder Sam Altman describes GPT-4.5 as "the first model that feels like talking to a thoughtful person." It represents a step forward in making AI conversations feel more natural and nuanced. What it's good for: this model shines in situations requiring emotional intelligence and tactful communication. It can help reframe difficult conversations with colleagues, craft diplomatically worded emails, navigate sensitive topics, and provide thoughtful advice on interpersonal situations. When to use it: pick GPT-4.5 when you need help with delicate communication, creative collaboration, or brainstorming sessions where you want more nuanced, human-like responses. It's particularly valuable for workplace communication, relationship advice, or any situation where tone and empathy matter.

The o3 series represents OpenAI's most advanced reasoning models, with particular strength in technical and scientific tasks. What they're good for: o3 excels at complex coding projects, advanced mathematics, scientific analysis, strategic planning, and multi-step technical problems. o3-mini handles similar tasks but focuses on speed and cost-efficiency for simpler coding and math problems. When to use them: use o3 for your most challenging technical work — complex software development, advanced mathematical modeling, extensive research projects, or strategic business planning. Choose o3-mini for everyday coding tasks, basic programming questions, quick prototypes, and straightforward technical problems.

The newest addition to OpenAI's lineup, o4-mini is designed for users who need reasoning capabilities but prioritize speed and cost-efficiency. What it's good for: o4-mini excels at quick technical tasks, fast STEM calculations, visual reasoning with charts and data, extracting information from documents, and providing rapid summaries of scientific or technical content. When to use it: choose o4-mini when you need reasoning capabilities but can't wait for the slower, more comprehensive models. It's perfect for quick math problems, rapid data analysis, fast coding help, or when you need multiple quick answers rather than one deep analysis.

In short:

• For everyday use: start with GPT-4o. It handles most common tasks efficiently and works with text, images, and voice.
• For sensitive communication: use GPT-4.5 when tone, empathy, and nuanced understanding matter most.
• For complex analysis: choose o1 or o3 when you need thorough, step-by-step reasoning and accuracy is more important than speed.
• For quick technical help: pick o4-mini when you need smart answers fast, especially for math, coding, or data analysis.

Now that you've learned how to pick the right ChatGPT model for any task, why not take a look at some of our other useful ChatGPT guides? Check out "I've been using ChatGPT since its release — here's 5 tips I wish I knew sooner" and "ChatGPT has added a new image library — here's how to use it." And if you want to keep your data private by opting out of training, we've got you covered.
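For developers, the same selection logic carries over to OpenAI's API, since the names above map onto API model identifiers. Below is a minimal, hypothetical routing sketch in Python using the official openai client; the task categories and the task-to-model mapping are illustrative assumptions based on this guide's advice, not an official recommendation, and model IDs (such as "gpt-4.5-preview") may change as OpenAI updates its lineup.

```python
# Illustrative sketch: routing prompts to the model this guide recommends.
# Assumes the official "openai" Python package and an OPENAI_API_KEY in the
# environment. The task categories and model IDs below are assumptions
# based on the guide; available identifiers change over time.
from openai import OpenAI

TASK_TO_MODEL = {
    "everyday": "gpt-4o",            # general text, image, and voice tasks
    "sensitive": "gpt-4.5-preview",  # tone- and empathy-heavy writing
    "deep_reasoning": "o3",          # hard coding, math, and planning work
    "quick_technical": "o4-mini",    # fast STEM and data questions
}

def ask(task_type: str, prompt: str) -> str:
    """Send a prompt to the model recommended for this kind of task."""
    client = OpenAI()
    model = TASK_TO_MODEL.get(task_type, "gpt-4o")  # GPT-4o as the safe default
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# A quick math question goes to the fast reasoning model.
print(ask("quick_technical", "Is 2^61 - 1 prime?"))
```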

Anthropic CEO claims AI models hallucinate less than humans

Yahoo

23-05-2025

Anthropic CEO Dario Amodei believes today's AI models hallucinate, or make things up and present them as if they're true, at a lower rate than humans do, he said during a press briefing at Anthropic's first developer event, Code with Claude, in San Francisco on Thursday. Amodei made the comment in the midst of a larger point: that AI hallucinations are not a limitation on Anthropic's path to AGI — AI systems with human-level intelligence or better. "It really depends how you measure it, but I suspect that AI models probably hallucinate less than humans, but they hallucinate in more surprising ways," Amodei said, responding to TechCrunch's question.

Anthropic's CEO is one of the most bullish leaders in the industry on the prospect of AI models achieving AGI. In a widely circulated paper he wrote last year, Amodei said he believed AGI could arrive as soon as 2026. During Thursday's press briefing, he said he was seeing steady progress to that end, noting that "the water is rising everywhere." "Everyone's always looking for these hard blocks on what [AI] can do," said Amodei. "They're nowhere to be seen. There's no such thing."

Other AI leaders believe hallucination presents a large obstacle to achieving AGI. Earlier this week, Google DeepMind CEO Demis Hassabis said today's AI models have too many "holes" and get too many obvious questions wrong. Earlier this month, for example, a lawyer representing Anthropic was forced to apologize in court after using Claude to create citations in a court filing; the chatbot hallucinated, getting names and titles wrong.

It's difficult to verify Amodei's claim, largely because most hallucination benchmarks pit AI models against each other; they don't compare models to humans. Certain techniques seem to help lower hallucination rates, such as giving AI models access to web search. Separately, some AI models, such as OpenAI's GPT-4.5, have notably lower hallucination rates on benchmarks compared to early generations of systems. However, there's also evidence to suggest hallucinations are actually getting worse in advanced reasoning models: OpenAI's o3 and o4-mini have higher hallucination rates than the company's previous-generation reasoning models, and OpenAI doesn't really understand why.

Later in the press briefing, Amodei pointed out that TV broadcasters, politicians, and humans in all types of professions make mistakes all the time; the fact that AI makes mistakes too is not a knock on its intelligence, according to Amodei. However, he acknowledged that the confidence with which AI models present untrue things as facts might be a problem.

In fact, Anthropic has done a fair amount of research on the tendency of AI models to deceive humans, a problem that seemed especially prevalent in the company's recently launched Claude Opus 4. Apollo Research, a safety institute given early access to test the model, found that an early version of Claude Opus 4 exhibited a high tendency to scheme against humans and deceive them, and went as far as to suggest Anthropic shouldn't have released that early version. Anthropic said it came up with mitigations that appeared to address the issues Apollo raised.

Amodei's comments suggest that Anthropic may consider an AI model to be AGI, or equal to human-level intelligence, even if it still hallucinates. An AI that hallucinates may fall short of AGI by many people's definition, though. This article originally appeared on TechCrunch.

OpenAI Bets Big on Hardware with Jony Ive Deal

Arabian Post

22-05-2025

OpenAI has agreed to acquire artificial intelligence device startup io in a $6.5 billion all-stock transaction, bringing onboard Apple's iconic designer Jony Ive and his team as part of a bold expansion into hardware development. The deal, OpenAI's largest to date, aims to reshape the future of consumer electronics by embedding artificial intelligence deeply into everyday devices.

The acquisition values io at $6.5 billion in total, combining a $5 billion equity commitment with an earlier 23% stake OpenAI had already acquired. The integration of io's core team — 55 engineers and designers experienced in cutting-edge product development — signals a shift in OpenAI's focus from software models alone to building AI-native hardware ecosystems. The deal is subject to regulatory clearance.

Jony Ive, the renowned designer behind products such as the iPhone, iPod, iPad, and Apple Watch, co-founded io after leaving Apple's design studio. His creative leadership was instrumental in transforming Apple's design philosophy during his tenure. The collaboration with OpenAI and its CEO, Sam Altman, has been quietly evolving for two years, during which both leaders explored ways to fuse generative AI with breakthrough industrial design.

The first AI device from this collaboration is scheduled to launch in 2026, with development already underway. According to internal planning documents and people familiar with the project, the product is being designed as a standalone AI assistant — possibly screenless — with ambient intelligence capabilities, redefining how users interact with digital systems. The vision is to develop hardware that embodies AI at its core, rather than adapting AI to fit existing devices.

Ive described the partnership as a culmination of his decades-long work in product design. During a conversation with Altman, he said that everything he had learned over the past 30 years had converged into this opportunity, describing the project as 'a relationship and a way of working together' with transformative potential.

For OpenAI, the move represents a decisive step beyond its software roots, leveraging its leadership in generative AI to influence the physical interfaces of the future. Altman has made it clear that OpenAI's mission now includes reimagining the device landscape for the AI era. While OpenAI remains committed to refining its foundational models like GPT-4.5 and their successors, the acquisition points to a broader strategy that combines software and form factor. By tapping into Ive's intuition for user experience and aesthetics, OpenAI is attempting to bridge the gap between abstract intelligence and tangible, human-friendly tools.

io has attracted significant investor interest since its inception. Backers include Emerson Collective, led by Laurene Powell Jobs, as well as Sutter Hill Ventures, Thrive Capital, Maverick Capital, and SV Angel. These firms have previously supported high-impact technology ventures and continue to express confidence in the potential of AI-native devices. OpenAI clarified that Altman himself does not hold equity in io, distancing the deal from personal financial entanglements.

Industry analysts suggest that OpenAI's acquisition of io could challenge existing players in the consumer electronics market, particularly in the smart device and wearable space. While companies like Apple, Google, and Amazon have integrated AI into their hardware, OpenAI's approach — building devices explicitly designed around AI from inception — could offer a more seamless and powerful user experience. The move could also push competitors to rethink the hardware-software balance in the AI age.

The newly formed hardware division will operate as a semi-autonomous unit within OpenAI, led by io's core management and supported by OpenAI's research and engineering teams. Development is expected to include not only consumer-facing devices but also enterprise tools and environments optimised for OpenAI's suite of language and vision models. Technical hiring is expected to accelerate as OpenAI builds internal manufacturing, supply chain, and logistics capabilities; hiring notices suggest a focus on embedded systems, materials science, and human-computer interaction. Former Apple engineers already on io's team bring expertise in prototyping, industrial manufacturing, and user interface design — skills critical to rapid iteration and scaling.

The broader strategic vision is to create a family of AI-first products, with the initial release acting as a launchpad for a new product category. Discussions between Altman and Ive have centred on minimising reliance on traditional app ecosystems and instead exploring new paradigms of interaction, such as natural language interfaces, gesture control, and predictive behaviour modelling. The ambition is to deliver experiences that feel more intuitive than conventional smartphones or laptops.

The acquisition comes at a time of increasing convergence between artificial intelligence and hardware. Meta's Ray-Ban smart glasses, Humane's AI Pin, and Rabbit's AI-powered pocket companion all hint at a wave of innovation targeting post-smartphone experiences. OpenAI's entry into this space raises the stakes, given its technological advantage in language and vision models and its capacity to integrate real-time, contextual intelligence into a device's core function.

OpenAI's pivot to hardware underscores a philosophical shift in how the company sees the future of human-machine interaction. Rather than waiting for hardware partners to adopt its models, it now seeks to define the interface itself. This aligns with broader ambitions voiced by Altman, who has spoken publicly about the need to rethink computing in an AI-dominated world: not merely adding AI features to existing frameworks, but rebuilding the entire user experience from scratch.

OpenAI's o3 and o4-mini hallucinate way higher than previous models

Yahoo

20-05-2025

By OpenAI's own testing, its newest reasoning models, o3 and o4-mini, hallucinate at significantly higher rates than o1. First reported by TechCrunch, OpenAI's system card detailed results on the PersonQA evaluation, which is designed to test for hallucinations. On that evaluation, o3's hallucination rate is 33 percent, and o4-mini's is 48 percent — almost half the time. By comparison, o1's hallucination rate is 16 percent, meaning o3 hallucinated about twice as often.

The system card noted that o3 "tends to make more claims overall, leading to more accurate claims as well as more inaccurate/hallucinated claims." But OpenAI doesn't know the underlying cause, saying simply, "More research is needed to understand the cause of this result."

OpenAI's reasoning models are billed as more accurate than its non-reasoning models like GPT-4o and GPT-4.5 because they use more computation to "spend more time thinking before they respond," as described in the o1 announcement. Rather than largely relying on stochastic methods to provide an answer, the o-series models are trained to "refine their thinking process, try different strategies, and recognize their mistakes." However, the system card for GPT-4.5, which was released in February, shows a 19 percent hallucination rate on the PersonQA evaluation; the same card compares it to GPT-4o, which had a 30 percent hallucination rate.

In a statement to Mashable, an OpenAI spokesperson said, 'Addressing hallucinations across all our models is an ongoing area of research, and we're continually working to improve their accuracy and reliability.'

Evaluation benchmarks are tricky. They can be subjective, especially if developed in-house, and research has found flaws in their datasets and even in how they evaluate models. Different benchmarks also rely on different methods to test accuracy and hallucinations. HuggingFace's hallucination benchmark, which evaluates models on the occurrence of hallucinations in generated summaries of around 1,000 public documents, found much lower hallucination rates across the board for major models on the market than OpenAI's evaluations: GPT-4o scored 1.5 percent, GPT-4.5 preview 1.2 percent, and o3-mini-high with reasoning 0.8 percent. It's worth noting that o3 and o4-mini weren't included in the current leaderboard. That's all to say: even with industry-standard benchmarks, it's difficult to assess hallucination rates.

Then there's the added complexity that models tend to be more accurate when tapping into web search to source their answers. But to use ChatGPT search, OpenAI shares data with third-party search providers, and enterprise customers using OpenAI models internally might not be willing to expose their prompts that way.

Regardless, if OpenAI itself says its brand-new o3 and o4-mini models hallucinate more than its non-reasoning models, that might be a problem for its users.

UPDATE: Apr. 21, 2025, 1:16 p.m. EDT This story has been updated with a statement from OpenAI.
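As a footnote on the arithmetic behind these figures: a hallucination rate is simply the share of graded answers flagged as containing fabricated claims. The sketch below is a minimal illustration of that calculation, not OpenAI's actual PersonQA harness (whose grading pipeline isn't public); the toy grader is a stand-in assumption, where real benchmarks use their own answer-checking pipelines or judge models.

```python
# Minimal sketch of how a rate like "o3: 33%" is computed: flagged answers
# divided by total answers. The grader here is a toy stand-in assumption;
# real benchmarks (PersonQA, HuggingFace's leaderboard) grade differently.
from typing import Callable

def hallucination_rate(
    answers: list[str],
    references: list[str],
    is_hallucinated: Callable[[str, str], bool],
) -> float:
    """Return the fraction of answers the grader flags as hallucinated."""
    flagged = sum(is_hallucinated(a, r) for a, r in zip(answers, references))
    return flagged / len(answers)

# Toy grader: flag any answer that doesn't mention the reference fact.
toy_grader = lambda answer, ref: ref.lower() not in answer.lower()

answers = ["Paris is the capital of France.", "Mount Everest is in Peru."]
references = ["Paris", "Nepal"]
print(hallucination_rate(answers, references, toy_grader))  # -> 0.5
```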

'Godfather of AI' Geoffrey Hinton says he trusts his chatbot more than he should

Business Insider

19-05-2025

The " Godfather of AI," Geoffrey Hinton, has said he trusts chatbots like OpenAI's GPT-4 more than he should. "I should probably be suspicious," Hinton told CBS in a new interview. He also said GPT-4, his preferred model, got a simple riddle wrong. "I tend to believe what it says, even though I should probably be suspicious," Geoffrey Hinton, who was awarded the 2024 Nobel Prize in physics for his breakthroughs in machine learning, said of OpenAI's GPT-4 in a CBS interview that aired Saturday. During the interview, heput a simple riddle to OpenAI's GPT-4, which he said he used for his day-to-day tasks. "Sally has three brothers. Each of her brothers has two sisters. How many sisters does Sally have?" The answer is one, as Sally is one of the two sisters. But Hinton said GPT-4 told him the answer was two. "It surprises me. It surprises me it still screws up on that," he said. Reflecting on the limits of current AI, he added: "It's an expert at everything. It's not a very good expert at everything."Hinton said he expected future models would do better. When asked if he thought GPT-5 would get the riddle right, Hinton replied, "Yeah, I suspect." Hinton's riddle didn't trip up every version of ChatGPT. After the interview aired, several people commented on social media that they tried the riddle on newer models —including GPT-4o and GPT-4.1 — and said the AI got it right. OpenAI did not immediately respond to a request for comment from Business Insider. OpenAI first launched GPT-4 in 2023 as its flagship large language model. The model quickly became an industry benchmark for its ability to pass tough exams like the SAT, GRE, and bar exam. OpenAI introduced GPT-4o — the default model powering ChatGPT — in May 2024, claiming it matched GPT-4's intelligence but is faster and more versatile, with improved performance across text, voice, and vision. OpenAI has since released GPT-4.5 and, most recently, GPT-4.1. Google's Gemini 2.5-Pro is ranked top by Chatbot Arena leaderboard, a crowd-sourced platform that ranks models. OpenAI's GPT-4o and GPT-4.5 are close behind. A recent study by AI testing company Giskard found that telling chatbots to be brief can make them more likely to "hallucinate" or make up information. The researchers found that leading models —including GPT-4o, Mistral, and Claude — were more prone to factual errors when prompted for shorter answers.
