OpenAI says its next big model can bring home Math Olympiad gold: A turning point?

a day ago

The value of AI for most users today lies in its ability to generate coherent, conversational language by applying probability theory to massive datasets. However, a future where AI models drive advances in fields like cryptography and space exploration by solving complex, multi-step mathematical problems, is now one step closer to reality.
OpenAI on Saturday, July 19, announced that its experimental AI reasoning model earned enough points on this year's International Math Olympiad (IMO) to win a gold medal.
Started in 1959 in Romania, the IMO is widely considered to be one of the hardest, most prestigious math competitions in the world for high-school students. It is held over two days. Participants of the Olympiad take two exams, where they are expected to solve three math problems in each session within four-and-a-half hours.
OpenAI's unreleased AI model took the IMO 2025 under these same conditions with no access to the internet or external tools. It read the official math problem statements and generated natural language proofs. The model solved five out of a total of six problems, achieving a gold medal-worthy score of 35/42, according to Alexander Wei, a member of OpenAI's technical staff.
'This underscores how fast AI has advanced in recent years. In 2021, my PhD advisor @JacobSteinhardthad me forecast AI math progress by July 2025. I predicted 30% on the MATH benchmark (and thought everyone else was too optimistic). Instead, we have IMO gold,' Wei wrote in a post on X.
This isn't the first time a company has claimed that its AI model can match the performance of IMO gold medallists. Earlier this year, Google DeepMind introduced AlphaGeometry 2, a model specifically designed to solve complex geometry problems at a level comparable to a human Olympiad gold medallist. However, the performance of OpenAI's experimental model is seen as a step forward for general intelligence, not just task-specific AI systems.
'We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling,' Wei said.
The model's success marks progress beyond traditional reinforcement learning (RL), which is a process used to train AI models through a system of clear, verifiable rewards and penalties. Instead, the model possibly demonstrates more flexible, general problem-solving abilities as it 'can craft intricate, watertight arguments at the level of human mathematicians.'
Wei also acknowledged that 'IMO submissions are hard-to-verify, multi-page proofs.' Math proofs are made up of smaller, minor theorems called lemmas. OpenAI said that the AI-generated proofs to the problems were independently graded by three former IMO medalists, who finalised the model's score unanimously.
However, Gary Marcus, a professor at New York University (NYU) and well-known critic of AI hype, pointed out that the results have not been independently verified by the organisers of the IMO.
OpenAI's claims also come months after the US Defense Advanced Research Projects Agency DARPA launched a new initiative that looks to enlist researchers to find ways to conduct high-level mathematics research with an AI 'co-author.' In the past, DARPA was responsible for driving research that led to the creation of ARPANET, the precursor to the internet.
An AI model that could reliably check proofs would save enormous amounts of time for mathematicians and help them be more creative. While some of these models might seem equipped to solve complex problems, they could also be prone to stumbling on simple questions like whether 9.11 is bigger than 9.9. Hence, they are said to have 'jagged intelligence', which is a term coined by OpenAI co-founder Andrej Karpathy.
Reacting to the model's gold medal-worthy IMO score, OpenAI CEO Sam Altman said, 'This is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence.'
However, the ChatGPT-maker does not plan on releasing the experimental research model at least for the next several months despite its math capabilities.

Hashtags

#InternationalMathOlympiad

Try Our AI Features

Explore what Daily8 AI can do for you:

Comments

No comments yet...

$OpenAI won gold at the world's toughest math exam. Why the Olympiad gold matters$

India Today

an hour ago

India Today

OpenAI won gold at the world's toughest math exam. Why the Olympiad gold matters

In a jaw-dropping achievement for the world of artificial intelligence, OpenAI's latest experimental model has scored at the gold medal level at the International Mathematical Olympiad (IMO) -- one of the toughest math exams on the is the same event held on the Sunshine Coast in Australia where India won six medals this year and ranked 7th amongst 110 participating HITS GOLD IN THE WORLD'S TOUGHEST MATH TESTThe IMO is no ordinary competition. Since its launch in 1959 in Romania, it has become the gold standard for testing mathematical genius among high school students globally. Over two intense days, participants face a gruelling four-and-a-half-hour paper with only three questions each day. These are not your average exam questions -- they demand deep logic, creativity and problem-solving that, OpenAI's model solved five out of six questions correctly -- under the same testing conditions as human DOUBTED AI COULD DO THIS -- UNTIL NOWEven renowned mathematician Terence Tao -- an IMO gold medallist himself -- had doubts. In a podcast in June, he suggested that AI wasn't yet ready for the IMO level and should try simpler math contests first. But OpenAI has now proven otherwise."Also this model thinks for a *long* time. o1 thought for seconds. Deep Research for minutes. This one thinks for hours. Importantly, it's also more efficient with its thinking," Noam Brown from OpenAI wrote on LinkedIn."It's worth reflecting on just how fast AI progress has been, especially in math. In 2024, AI labs were using grade school math (GSM8K) as an eval in their model releases. Since then, we've saturated the (high school) MATH benchmark, then AIME, and now are at IMO gold," he THIS IS A BIG DEAL FOR GENERAL AIThis isn't just about math. OpenAI says this shows their AI model is breaking new ground in general-purpose reasoning. Unlike Google DeepMind's AlphaGeometry -- built just for geometry -- OpenAI's model is a general large language model that happens to be great at math too."Typically for these AI results, like in Go/Dota/Poker/Diplomacy, researchers spend years making an AI that masters one narrow domain and does little else. But this isn't an IMO-specific model. It's a reasoning LLM that incorporates new experimental general-purpose techniques," Brown explained in his Sam Altman called it 'a dream' when OpenAI began. 'This is a marker of how far AI has come in a decade.'advertisementBut before you get your hopes up, this high-performing AI isn't going public just yet. Altman confirmed it'll be 'many months' before this gold-level model is STILL REMAINNot everyone is fully convinced. AI expert Gary Marcus called the model's results 'genuinely impressive' -- but raised fair questions about training methods, how useful this is for the average person, and how much it all the win marks a huge leap in what artificial intelligence can do -- and how fast it's improving.- EndsMust Watch

AI could be hiding its thoughts to outsmart us: Tech giants warn of vanishing ‘Chain of Thought' in superintelligent machines

Time of India

an hour ago

Time of India

AI could be hiding its thoughts to outsmart us: Tech giants warn of vanishing ‘Chain of Thought' in superintelligent machines

In a world where artificial intelligence has become the new battleground for tech supremacy, an unexpected alliance is emerging—not out of strategy, but out of sheer necessity. The likes of OpenAI , DeepMind , Meta , and Anthropic, usually seen racing neck-to-neck in developing the most powerful AI models, are now singing in chorus with a chilling warning: the machines we build might soon outthink—and outmaneuver—us. These companies, often fiercely protective of their innovations, are momentarily dropping their guard to raise a red flag about what they call a 'fragile opportunity' for AI safety. As AI systems grow smarter, a new concern has begun to overshadow the race for dominance: the looming possibility of losing control over the very thought process of large language models (LLMs). Explore courses from Top Institutes in Select a Course Category Others healthcare Artificial Intelligence CXO Design Thinking Public Policy MCA Degree PGDM Healthcare MBA Operations Management Leadership Data Science Project Management Technology Data Science Finance Digital Marketing Management Cybersecurity Product Management others Data Analytics Skills you'll gain: Duration: 9 months IIM Lucknow SEPO - IIML CHRO India Starts on undefined Get Details Skills you'll gain: Duration: 7 Months S P Jain Institute of Management and Research CERT-SPJIMR Exec Cert Prog in AI for Biz India Starts on undefined Get Details Skills you'll gain: Duration: 16 Weeks Indian School of Business CERT-ISB Transforming HR with Analytics & AI India Starts on undefined Get Details Skills you'll gain: Duration: 28 Weeks MICA CERT-MICA SBMPR Async India Starts on undefined Get Details The Chain We Can't Afford to Break At the heart of this concern lies a simple but vital mechanism—Chain of Thought (CoT) monitoring. Current AI tools, including ChatGPT and others, think in a traceable, human-readable way. They 'speak their mind,' so to say, by sharing their reasoning step-by-step when they generate responses. It's this transparency that keeps them in check and allows humans to intervene when things go awry. by Taboola by Taboola Sponsored Links Sponsored Links Promoted Links Promoted Links You May Like Knee Pain? Start Eating These Foods, and Feel Your Pain Go Away Undo But a recent collaborative paper, led by AI researchers Tomek Korbak and Mikita Balesni, and endorsed by names like AI pioneer Geoffrey Hinton, warns that this clarity is dangerously close to being lost. Titled "Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety", the study reveals that we may be approaching a tipping point—one where AI might begin thinking in ways we can't understand, or worse, deliberately conceal parts of its reasoning. — OpenAI (@OpenAI) As reported by VentureBeat, the potential fallout is staggering. If AI systems stop revealing their internal thought processes—or shift to thinking in non-human languages—we lose the only window into their intentions. This means their capacity to manipulate, deceive, or go rogue could increase without human operators ever noticing. You Might Also Like: Nikhil Kamath's 'lifelong learning' advice is only step one: Stanford expert shares the key skills needed to survive the AI takeover When Transparency Fades, So Might Humanity What makes this scenario particularly dire is not just the prospect of rogue AI, but the seductive illusion of normalcy. Even with partial CoT visibility, AI could learn to hide malicious intent while appearing compliant. Scientists describe this 'near-complete CoT' as even more dangerous because it may give the illusion that everything is under control. And that's precisely the nightmare scenario. A machine that no longer needs to ask permission, or even explain itself. One that operates in shadows, out of sight, but still in power. Jeff Bezos-backed startup leaders have echoed similar sentiments. One CEO has openly warned against letting AI independently conduct research and development—a move that would require 'unprecedented safety protocols' to avoid disaster. A Call for Vigilance, Not Panic There is still time, the scientists believe, to pull the brakes. The key lies in strengthening CoT monitoring techniques and embedding rigorous safety checks before advancing any further. As the study urges, 'We recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods.' You Might Also Like: Escaped the AI takeover? It might still get you fired, and your boss may let ChatGPT decide Their message is clear: don't let AI evolve faster than our ability to supervise it. In a landscape driven by competition, this rare act of unity signals something profound. Perhaps the real challenge isn't building the smartest AI—it's ensuring we remain smart enough to handle it.

Comet browser could replace Chrome on your mobile, here's why that matters

Mint

an hour ago

Mint

Comet browser could replace Chrome on your mobile, here's why that matters

Nvidia-backed AI startup Perplexity AI is actively negotiating with smartphone manufacturers to preinstall its innovative AI-powered Comet browser on mobile devices. This strategy is challenging the dominance of Google Chrome and Apple Safari, which together hold more than 90% of the global mobile browser market. Comet is not just a regular browser; it is a smart assistant deeply integrated with AI to transform web browsing into an intelligent, task-oriented experience. It is built on the Chromium framework and supports common browser features but adds impressive AI capabilities. Users can interact with their data, such as emails, calendar events, and browsing history, directly in the browser. It can schedule meetings, summarize web pages, and manage emails, reducing the need to switch between multiple tabs. Currently in beta and only available on desktops, Perplexity plans to rapidly expand Comet's reach to users. The company aims to release the software and scale it to millions of users by next year. However, CEO Aravind Srinivas acknowledges the challenge of convincing OEMs to replace default browsers like Google Chrome on their mobile devices. The push to embed the Comet browser into smartphones aligns with a broader trend toward 'agentic' AI browsers. These browsers can autonomously handle complex tasks independently. AI giants like OpenAI are also said to be developing their own AI-powered browsers, capable of automating complex tasks like travel booking and financial management. Earlier collaborations, including a partnership with Motorola to preinstall Perplexity AI in the OS, demonstrate the company's commitment to integrating AI deeply into mobile ecosystems. Recently, Perplexity gave away a one-year Pro subscription for free, hinting at its vision to make AI a seamless part of the mobile experience. This new effort from Perplexity AI to preinstall its AI-powered Comet browser on smartphones is a bold move against existing browser monopolies. If Comet gains default status as a preinstalled browser on smartphones, it could reshape how users engage with the internet by having AI automate everyday browsing needs.

OpenAI says its next big model can bring home Math Olympiad gold: A turning point?

Hashtags

Try Our AI Features

Comments

Related Articles

OpenAI won gold at the world's toughest math exam. Why the Olympiad gold matters

AI could be hiding its thoughts to outsmart us: Tech giants warn of vanishing ‘Chain of Thought' in superintelligent machines

Comet browser could replace Chrome on your mobile, here's why that matters

Get Started Now: Download the App