
Explained: Why this mathematician thinks OpenAI isn't acing the International Mathematical Olympiad, and might be 'cheating' to win gold
AI isn't taking the real test: GPT models 'solving' International Mathematical Olympiad (IMO) problems often operate under very different conditions, with rewrites, retries, and human edits.
Tao's warning: Fields Medalist Terence Tao says comparing these AI outputs to real IMO scores is misleading because the rules are entirely different.
Behind the curtain: Teams often cherry-pick successes, rewrite problems, and discard failures before showing the best output.
It's not cheating, but it's not fair play: The AI isn't sitting in silence under timed pressure; it's basically Iron Man in a school exam hall.
Main takeaway: Don't mistake polished AI outputs under ideal lab conditions for human-level reasoning under Olympiad pressure.
Led Zeppelin once sang, 'There's a lady who's sure all that glitters is gold.'
But in the age of artificial intelligence, even the shimmer of mathematical brilliance needs closer scrutiny. These days, social media lights up every time a language model like GPT-4 is said to have solved a problem from the International Mathematical Olympiad (IMO) — a competition so elite it makes Ivy League entrance exams look like warm-up puzzles.
'This AI solved an IMO question!'
'Superintelligence is here!'
'We're witnessing the birth of a digital Newton!'
Or so the chorus goes.
But one of the greatest living mathematicians isn't singing along. Terence Tao, a Fields Medal–winning professor at UCLA, has waded into the hype with a calm, clinical reminder: AI models aren't playing by the same rules. And if the rules aren't the same, the gold medal doesn't mean the same thing.
The Setup: What the IMO Actually Demands
The International Mathematical Olympiad is the Olympics of high school math. Students from around the world train for years to face six unspeakably hard problems over two days. They get 4.5 hours per day, no calculators, no internet, no collaboration — just a pen, a problem, and their own mind.
Solving even one problem in full is an achievement. A high enough total, typically full marks on four or five problems, earns you gold. Solve all six and you enter the realm of myth, which, incidentally, is where Tao himself resides. He won an IMO gold medal at age 13.
So when an AI is said to 'solve' an IMO question, it's important to ask: under what conditions?
Enter Tao: The IMO, Rewritten (Literally)
In a detailed Mastodon post, Tao explains that many AI demonstrations that showcase Olympiad-level problem solving do so under dramatically altered conditions. He outlines a scenario that mirrors what's actually happening behind the scenes:
'The team leader… gives them days instead of hours to solve a question, lets them rewrite the question in a more convenient formulation, allows calculators and internet searches, gives hints, lets all six team members work together, and then only submits the best of the six solutions… quietly withdrawing from problems that none of the team members manage to solve.'
In other words: cherry-picking, rewording, retries, collaboration, and silence around failure.
It's not quite cheating — but it's not the IMO either. It's an AI-friendly reconstruction of the Olympiad, where the scoreboard is controlled by the people training the system.
From Bronze to Gold (If You Rewrite the Test)
Tao's criticism isn't just about fairness — it's about what we're really evaluating.
He writes,
'A student who might not even earn a bronze medal under the standard IMO rules could earn a 'gold medal' under these alternate rules, not because their intrinsic ability has improved, but because the rules have changed.'
This is the crux. AI isn't solving problems like a student. It's performing in a lab, with handlers, retries, and tools. What looks like genius is often a heavily scaffolded pipeline of failed attempts, reruns, and prompt rewrites. The only thing the public sees is the polished output.
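The scaffolded pipeline Tao describes amounts to a best-of-k selection loop. Here is a toy sketch of that setup, with `ask_model` and `score` invented as stand-ins for this illustration (not any lab's actual code or API): each problem gets many attempts, only the best attempt is graded, and problems with no successful attempt simply vanish from the report.

```python
# Toy sketch of a best-of-k evaluation harness, as Tao characterises it:
# many retries per problem, only the best attempt kept, and unsolved
# problems quietly dropped. `ask_model` and `score` are hypothetical
# stand-ins, deterministic so the example is reproducible.

def ask_model(problem: str, seed: int) -> str:
    """Stand-in for a model call: most attempts fail by design."""
    if (31 * seed + len(problem)) % 11 == 0:
        return f"full solution to {problem}"
    return "incomplete"

def score(solution: str) -> int:
    """Stand-in grader: 7/7 for a 'full' solution, 0 otherwise."""
    return 7 if solution.startswith("full") else 0

def best_of_k(problems, k=50):
    """Keep only each problem's best attempt; omit unsolved problems."""
    report = {}
    for p in problems:
        # k retries per problem; a contestant gets exactly one sitting.
        best = max((ask_model(p, s) for s in range(k)), key=score)
        if score(best) > 0:  # 'quietly withdrawing' from failures
            report[p] = (best, score(best))
    return report

results = best_of_k(["P1", "P2", "P3"])
```

Only successes ever reach `results`, so the published scoreboard reflects the selection procedure as much as the model's ability.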
Tao doesn't deny that AI has made remarkable progress. But he warns against blurring the lines between performance under ideal conditions and human-level problem-solving in strict, unforgiving settings.
Apples to Oranges — and Cyborg Oranges
Tao is careful not to throw cold water on AI research. But he urges a reality check.
'One should be wary of making apples-to-apples comparisons between the performance of various AI models (or between such models and the human contestants) unless one is confident that they were subject to the same set of rules.'
A tweet that says 'GPT-4 solved this problem' often omits what really happened:
– Was the prompt rewritten ten times?
– Did the model try and fail repeatedly?
– Were the failures silently discarded?
– Was the answer chosen and edited by a human?
Compare that to a teenager in an exam hall, sweating out one solution in 4.5 hours with no safety net. The playing field isn't level — it's two entirely different games.
The Bottom Line
Terence Tao doesn't claim that AI is incapable of mathematical insight. What he insists on is clarity of conditions. If AI wants to claim a gold medal, it should sit the same exam, with the same constraints, and the same risks of failure.
Right now, it's as if Iron Man entered a sprint race, flew across the finish line, and people started asking if he's the next Usain Bolt.
The AI didn't cheat. But someone forgot to mention it wasn't really racing.
And so we return to that Led Zeppelin lyric: 'There's a lady who's sure all that glitters is gold.' In 2025, that lady might be your algorithmic feed. And that gold? It's probably just polished scaffolding.
FAQ: AI, the IMO, and Terence Tao's Critique
Q1: What is the International Mathematical Olympiad (IMO)?
It's the world's toughest math competition for high schoolers, with six extremely challenging problems solved over two 4.5-hour sessions—no internet, no calculators, no teamwork.
Q2: What's the controversy with AI and IMO questions?
AI models like GPT-4 are shown to 'solve' IMO problems, but they do so with major help: problem rewrites, unlimited retries, internet access, collaboration, and selective publishing of only successful attempts.
Q3: Who raised concerns about this?
Terence Tao, one of the greatest mathematicians alive and an IMO gold medalist himself, called out this discrepancy in a Mastodon post.
Q4: Is this AI cheating?
Not exactly. But Tao argues that changing the rules makes it a different contest altogether—comparing lab-optimised AI to real students is unfair and misleading.
Q5: What's Tao's main point?
He urges clarity. If we're going to say AI 'solved' a problem, we must also disclose the conditions—otherwise, it's like comparing a cyborg sprinter to a high school track star and pretending they're equals.
Q6: Does Tao oppose AI?
No. He recognises AI's impressive progress in math, but wants honesty about what it means—and doesn't mean—for genuine problem-solving ability.
Q7: What should change?
If AI is to be judged against human benchmarks like the IMO, it must be subjected to the same constraints: time limits, no edits, no retries, no external tools.
Tao's verdict? If you want to claim gold, don't fly across the finish line in an Iron Man suit and pretend you ran.