AI breakthrough: AlphaGeometry 2 and Symbolic AI outperform Maths Olympiad gold medallists

29-04-2025

The world was astonished a year ago; now it is shocked. Last year, AlphaGeometry, an AI problem solver developed by Google DeepMind, astonished the world by performing at the level of a silver medallist on International Mathematical Olympiad (IMO) geometry problems. The DeepMind team now reports that its improved system, AlphaGeometry 2, has surpassed the level of the average gold medallist. The findings are detailed in a preprint on the arXiv server.
International Mathematical Olympiad
The International Mathematical Olympiad (IMO) is the world's most prominent mathematics competition. The first competition was held in Romania in 1959 among seven Soviet Bloc nations. It then grew swiftly, reaching 50 nations in 1989 and passing 100 countries for the first time in 2009. The competition has always aimed to help school-age mathematicians improve their problem-solving abilities.
In India, the Homi Bhabha Centre for Science Education (HBCSE) organises the Mathematical Olympiad Programme on behalf of the National Board for Higher Mathematics (NBHM) of the Department of Atomic Energy (DAE), Government of India. The Indian team for the international competition is selected through the broad-based Indian Olympiad Qualifier in Mathematics (IOQM).
Questions are drawn from four topic areas: algebra, combinatorics, geometry, and number theory. Students are neither required nor expected to use calculus.
The competition consists of six problems set over two days. On each day, participants get four and a half hours to solve three problems. Each problem is worth 7 points, for a maximum of 42 points.
AI in the race
In 2024, the IMO was hosted in Bath, United Kingdom, with 609 high school students from 108 countries participating. Chinese student Haojia Shi finished first in the individual rankings with a perfect score of 42 points.
In the country rankings, the United States team came out on top, and China came in second. The human problem-solvers won 58 gold medals, 123 silver and 145 bronze. One of the event's highlights was the presence of two unofficial contestants: AlphaGeometry 2 and AlphaProof, both artificial intelligence algorithms built by Google DeepMind.
Between them, the two programs solved four of the six problems. Mathematician and Fields Medallist Timothy Gowers, a past IMO gold medallist, and mathematician Joseph K. Myers, another past IMO gold medallist, evaluated the two AI systems' solutions using the same criteria as for the human competitors. By these standards, the programs scored 28 points out of a possible 42, equivalent to a silver medal.
This means that the AI came within one point of a gold medal, which was awarded for a score of 29 points or higher. Just 60 students scored higher. AlphaGeometry 2 solved the geometry problem correctly in just 19 seconds. Meanwhile, AlphaProof solved one number theory problem and two algebra problems, including one that only five human participants could solve.
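The scoring arithmetic above can be checked in a few lines. This is only a sketch: it assumes each of the four solved problems earned the full 7 points, which is consistent with the reported total of 28, and the constant names are illustrative.

```python
POINTS_PER_PROBLEM = 7   # each IMO problem is worth 7 points
NUM_PROBLEMS = 6         # three problems on each of two days
GOLD_CUTOFF_2024 = 29    # gold-medal threshold quoted in the article

def total_score(problems_fully_solved):
    """Score for a contestant who fully solves the given number of problems."""
    return problems_fully_solved * POINTS_PER_PROBLEM

max_score = total_score(NUM_PROBLEMS)               # perfect score
ai_score = total_score(4)                           # four of six problems solved
points_short_of_gold = GOLD_CUTOFF_2024 - ai_score  # gap to the gold cutoff

print(max_score, ai_score, points_short_of_gold)
```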
Training the tools
Training an AI requires a large quantity of data. AlphaProof's training was limited by the small amount of mathematical material available in a formal mathematical language. To get around this, the DeepMind researchers used the Gemini AI tool to translate millions of problems on the Internet, which people had solved step by step in natural language, into the formal language of the Lean proof assistant, so that the system could learn from them.
Using this large dataset, AlphaProof was trained with reinforcement learning, the approach earlier used to teach AI systems to master chess, shogi, and Go. Reinforcement learning (RL) is similar to training a dog. The dog (the agent) learns tricks by performing actions (such as sitting). If it sits when asked, it gets a treat (the reward); otherwise, no treat. Over time, it learns which actions result in rewards.
Similarly, RL systems learn by trial and error to maximise rewards in tasks such as gaming or robotics. AlphaProof repeatedly competes with itself and improves step by step; if an attempt does not result in a win, it is penalised and learns to explore alternative techniques.
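The dog-and-treat picture can be sketched as a tiny trial-and-error learner. This is a toy illustration, not AlphaProof's training procedure: the action names, the reward rule, and the epsilon-greedy strategy (mostly repeat the best-known action, occasionally try a random one) are all assumptions chosen for brevity.

```python
import random

ACTIONS = ["bark", "roll over", "sit"]  # hypothetical tricks the agent can try

def reward_for(action):
    """The trainer's rule: a treat (1) only for sitting, otherwise nothing (0)."""
    return 1 if action == "sit" else 0

def train(episodes=500, epsilon=0.2, seed=0):
    """Learn the average reward of each action by trial and error."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}  # running average reward per action
    count = {a: 0 for a in ACTIONS}    # how often each action was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)          # explore: try something random
        else:
            action = max(ACTIONS, key=value.get)  # exploit: best action so far
        reward = reward_for(action)
        count[action] += 1
        # Incremental update of the running average for this action.
        value[action] += (reward - value[action]) / count[action]
    return value

learned = train()
best_action = max(learned, key=learned.get)
print(best_action, learned)
```

The agent is never told that sitting is the right answer; it only sees which attempts earned a treat, and its estimates drift towards the rewarded action.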
What works for number theory or algebra does not work for geometry, which needs a different methodology. DeepMind therefore created AlphaGeometry, a separate AI system designed to solve geometry problems.
The researchers first compiled an exhaustive list of geometric 'premises', the basic building blocks of geometry, such as the fact that a triangle has three sides. Just as an architect studies a design, AlphaGeometry's deduction engine examines the problem. It then picks the appropriate blocks (premises) and assembles them step by step to build the house (the proof).
The AI could also manipulate geometric objects in a 2D plane, for example by adding a fourth point to turn a triangle into a quadrilateral, or by moving a point to change a triangle's height. The proof is complete when all the components fit together correctly and the house is sturdy. Unlike trial-and-error learning (RL), this is like following an instruction manual with an unlimited supply of LEGO parts.
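The step-by-step assembly of premises can be sketched as forward chaining: keep applying rules whose premises are already known until nothing new appears. This is a minimal illustration of the idea only; the facts and rules below are hypothetical strings, and AlphaGeometry's actual deduction engine is far more elaborate.

```python
def deduce(facts, rules):
    """Forward chaining: apply rules until no new fact can be derived."""
    known = set(facts)
    steps = []  # the proof: each entry is (premises used, conclusion reached)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                steps.append((premises, conclusion))
                changed = True
    return known, steps

# Hypothetical rules, each written as (premises, conclusion).
RULES = [
    (("AB = AC",), "triangle ABC is isosceles"),
    (("triangle ABC is isosceles",), "angle ABC = angle ACB"),
    (("angle ABC = angle ACB", "angle ABC = 70 degrees"), "angle ACB = 70 degrees"),
]

facts = {"AB = AC", "angle ABC = 70 degrees"}  # what the problem gives us
goal = "angle ACB = 70 degrees"                # what we want to prove

known, proof = deduce(facts, RULES)
for premises, conclusion in proof:
    print(" and ".join(premises), "=>", conclusion)
print("proved" if goal in known else "not proved")
```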
Going for gold
The DeepMind team has now produced an improved version, AlphaGeometry 2, which is trained on more data and runs faster. The AI system can now also solve linear equations.
With the upgrade, the AI was recently shown to solve 84% of all geometry problems set in IMOs during the last 25 years, compared with 54% for the previous version of AlphaGeometry. Future versions of AlphaGeometry will need to handle problems containing inequalities and nonlinear equations, which will be necessary to solve geometry completely.
A team of researchers, including IIIT Hyderabad's Ponnurangam Kumaraguru, has made a breakthrough with their 'Symbolic AI', outperforming AlphaGeometry's capabilities. Furthermore, their hybrid system, which combines Wu's method with AlphaGeometry, outperformed human gold medallists on IMO geometry problems.
This Symbolic AI system solves geometry problems by combining algebraic methods—primarily Wu's Method—with synthetic approaches, such as the deduction engine algorithm. The heart of this technique is 'Wu's method,' which is analogous to systematically completing a gigantic jigsaw puzzle.
Consider solving a jigsaw puzzle; it is challenging to complete if some elements (such as variables and equations in geometry) are concealed behind clutter. Thus, initial decluttering is valuable, such as sorting the puzzle pieces by colour/edge.
Wu's method rearranges the geometric equations into an ordered hierarchy. We can then solve one equation at a time, just as we would place the corner pieces of a jigsaw puzzle first, and use each result to simplify the next. In this way, Wu's method turns complex geometry into a step-by-step assembly line.
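The 'one equation at a time' idea can be sketched with the midpoint theorem. This is an illustration under stated assumptions, not Wu's method itself: the real method works symbolically on polynomials, whereas this sketch only checks one numeric choice of the free coordinates. The coordinate names (u1, u2, u3 free; x1 to x5 dependent) and the helper `solve_triangular` are hypothetical.

```python
from fractions import Fraction

def solve_triangular(equations, free_values):
    """Solve equations in order; each one introduces exactly one new variable
    and may use only the variables that have already been determined."""
    values = dict(free_values)
    for name, rule in equations:
        values[name] = rule(values)
    return values

# Triangle A=(0,0), B=(u1,0), C=(u2,u3).
# M=(x1,x2) is the midpoint of AC, N=(x3,x4) the midpoint of BC.
EQUATIONS = [
    ("x1", lambda v: v["u2"] / 2),              # 2*x1 - u2 = 0
    ("x2", lambda v: v["u3"] / 2),              # 2*x2 - u3 = 0
    ("x3", lambda v: (v["u1"] + v["u2"]) / 2),  # 2*x3 - u1 - u2 = 0
    ("x4", lambda v: v["u3"] / 2),              # 2*x4 - u3 = 0
    ("x5", lambda v: v["x3"] - v["x1"]),        # x5 - x3 + x1 = 0 (uses earlier results)
]

# One sample choice of the free coordinates (exact arithmetic via Fraction).
free = {"u1": Fraction(6), "u2": Fraction(2), "u3": Fraction(5)}
v = solve_triangular(EQUATIONS, free)

# Conclusions to check: MN is parallel to AB, and MN is half as long as AB.
parallel_residual = v["x4"] - v["x2"]  # zero when M and N are at the same height
length_residual = 2 * v["x5"] - v["u1"]  # zero when |MN| = |AB| / 2

print(parallel_residual, length_residual)
```

Both residuals come out as zero for this sample, which is what the theorem predicts. A symbolic version would establish the same for every choice of u1, u2, u3.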
'With very low computational requirements, this performs comparably to an IMO silver medalist. When combined with AlphaGeometry, the hybrid system successfully solves 27 out of 30 IMO problems,' according to Mr. Kumaraguru. 'This system is remarkably efficient — on most consumer laptops, with no access to a GPU, it can solve these problems within a few seconds. It also requires no learning or training phase.'
China's efforts are not far behind. TongGeometry, a system for proposing and solving Euclidean geometry problems that bridges numerical and spatial reasoning, was developed by researchers at the Beijing Institute for General Artificial Intelligence (BIGAI) and the Institute for Artificial Intelligence, Peking University, Beijing. It has solved all International Mathematical Olympiad geometry problems from the last 30 years, outperforming gold medallists for the first time. 'Their work, including their analysis on novel problem generation, is quite interesting,' says Mr. Kumaraguru. However, he declined to express 'informed opinions' on their work due to the lack of publicly available material.
Are mathematicians redundant?
Are we nearing the point when mathematicians are obsolete? Gowers thinks not yet: 'I would guess that we are still a breakthrough or two short of that.'
While AlphaProof outperformed most of the human contestants, it took almost 60 hours to solve a problem, whereas the humans were given only 4½ hours for three problems. Had the human competitors been given that much time for each problem, they would surely have scored higher.
Another caveat is that the problems were manually translated into the language of the proof assistant Lean. In other words, humans did the formalisation, while the AI program did the mathematics.
Autoformalisation converts ambiguous human language into precise logical or mathematical statements that computers can reason about. For example, the plain-English sentence 'If it rains, then the ground gets wet' becomes Rain → WetGround in propositional logic, or ∀x (rain(x) → wetground(x)) in first-order logic. Human contestants, by contrast, do this formalisation themselves.
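For a flavour of what such a formal statement looks like, the rain example could be written in Lean 4 roughly as follows. This is an illustrative sketch, not taken from DeepMind's work; the names Rain, WetGround, Place, rain and wetground are hypothetical.

```lean
-- Propositional logic: Rain and WetGround are atomic propositions.
-- Given the rule "Rain → WetGround" and the fact that it rains,
-- we can conclude that the ground is wet.
example (Rain WetGround : Prop) (rule : Rain → WetGround) (itRains : Rain) :
    WetGround :=
  rule itRains

-- First-order logic: the rule holds for every place x.
-- Applying it to a particular place p where it rains gives a wet ground at p.
example (Place : Type) (rain wetground : Place → Prop)
    (rule : ∀ x, rain x → wetground x) (p : Place) (itRains : rain p) :
    wetground p :=
  rule p itRains
```

Once a statement is in this form, the proof assistant can check every step mechanically, which is what lets a system be rewarded only for proofs that are actually correct.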
Furthermore, at the last IMO, DeepMind's systems did not even attempt the combinatorics problems, since these were difficult to translate into formal languages such as Lean. AlphaGeometry could work only with Euclidean plane geometry problems in 2D. Another significant inherent difficulty with AI is 'hallucinations': nonsensical or erroneous assertions that can occur, especially in intricate reasoning.
(T.V. Venkateswaran is a science communicator and visiting faculty member at the Indian Institute of Science Education and Research, Mohali.)

OpenAI Sends Surprise Message To Employees, Announces Million-Dollar Bonuses To Retain AI Talent

India.com

ChatGPT maker OpenAI is giving large bonus payouts to around 1,000 employees, about one-third of its full-time staff, to retain AI talent. Just before the launch of GPT-5, CEO Sam Altman surprised employees with the news on Slack. According to The Verge, the bonuses, paid quarterly over two years, are for researchers and software engineers working in applied engineering, scaling, and safety teams. The bonus amount depends on role and seniority. Top researchers will get millions, while engineers will receive hundreds of thousands. Payments will be made every quarter for two years and can be taken in cash, stock, or both.
Why OpenAI CEO Announced Million-Dollar Bonuses
Sam Altman said the rise in compensation was a result of market dynamics, likely driven by the demand for AI talent. 'As we mentioned a few weeks ago, we have been looking at comp for our technical teams given the movement in the market,' The Verge cited Altman's message to employees as saying. 'We very much intend to keep increasing comp as we keep doing better and better as a company,' he wrote. 'But we wanted to be transparent about this one since it's a new thing for us,' he added. Tech giants and well-funded startups in Silicon Valley are intensifying competition for AI expertise, announcing bonuses to attract talent. Altman has recently lost several key researchers to Meta, while Elon Musk's xAI is also seeking to attract talent.
India Is OpenAI's Second-Largest Market
India is OpenAI's second-largest market in the world after the US, and it may well become its biggest market in the near future, according to its CEO Sam Altman.
ChatGPT-5 Available To All Customers
ChatGPT-5 is available to all users, with Plus subscribers getting more usage, and Pro subscribers getting access to GPT-5 pro, a version with extended reasoning for even more comprehensive and accurate answers.
'GPT-5 is a unified system with a smart, efficient model that answers most questions, a deeper reasoning model (GPT-5 thinking) for harder problems, and a real-time router that quickly decides which to use based on conversation type, complexity, tool needs, and your explicit intent,' the company noted. (With IANS Inputs)

OpenAI Announces Quarterly Bonuses To 1,000 Employees Amid GPT-5 Rollout

NDTV

ChatGPT maker OpenAI has announced massive bonus payouts for about 1,000 employees, approximately one-third of its full-time workforce. On the eve of GPT-5's launch, OpenAI CEO Sam Altman sent a surprise message to employees via the communication platform Slack. A quarterly bonus for two years was awarded to researchers and software engineers in the firm's applied engineering, scaling, and safety domains, according to The Verge. The payouts vary by role and seniority. Top researchers will receive mid-single-digit millions as bonus, while engineers will get hundreds of thousands. Bonuses will be distributed quarterly for two years and can be received in stock, cash, or a combination of both.
Altman said the rise in compensation was a result of market dynamics, likely driven by the demand for AI talent. "As we mentioned a few weeks ago, we have been looking at comp for our technical teams given the movement in the market," The Verge cited Altman's message to employees as saying. "We very much intend to keep increasing comp as we keep doing better and better as a company," he wrote. "But we wanted to be transparent about this one since it's a new thing for us," he added.
Tech giants and well-funded startups in Silicon Valley are intensifying competition for AI expertise, announcing bonuses to attract talent. Altman has recently lost several key researchers to Meta, while Elon Musk's xAI is also seeking to attract talent. India is OpenAI's second-largest market in the world after the US, and it may well become its biggest market in the near future, according to its CEO Sam Altman.
GPT-5 is available to all users, with Plus subscribers getting more usage, and Pro subscribers getting access to GPT-5 pro, a version with extended reasoning for even more comprehensive and accurate answers.
"GPT-5 is a unified system with a smart, efficient model that answers most questions, a deeper reasoning model (GPT-5 thinking) for harder problems, and a real-time router that quickly decides which to use based on conversation type, complexity, tool needs, and your explicit intent," the company noted. (Except for the headline, this story has not been edited by NDTV staff and is published from a syndicated feed.)

Are you in a mid-career to senior job? Don't fear AI – you could have this important advantage

Indian Express

Have you ever sat in a meeting where someone half your age casually mentions 'prompting ChatGPT' or 'running this through AI', and felt a familiar knot in your stomach? You're not alone. There's a growing narrative that artificial intelligence (AI) is inherently ageist, that older workers will be disproportionately hit by job displacement and are more reluctant to adopt AI tools. But such assumptions – especially that youth is a built-in advantage when it comes to AI – might not actually hold. While ageism in hiring is a real concern, if you have decades of work experience, your skills, knowledge and judgement could be exactly what's needed to harness AI's power – without falling into its traps.
The research on who benefits most from AI at work is surprisingly murky, partly because it's still early days for systematic studies on AI and work. Some research suggests lower-skilled workers might have more to gain than high-skilled workers on certain straightforward tasks. The picture becomes much less clear under real-world conditions, especially for complex work that relies heavily on judgement and experience.
Through our Skills Horizon research project, where we've been talking to Australian and global senior leaders across different industries, we're hearing a more nuanced story. Many older workers do experience AI as deeply unsettling. As one US-based CEO of a large multinational corporation told us: 'AI can be a form of existential challenge, not only to what you're doing, but how you view yourself.'
But leaders are also observing an important and unexpected distinction: experienced workers are often much better at judging the quality of AI outputs. This might become one of the most important skills, given that AI occasionally hallucinates or gets things wrong. The CEO of a South American creative agency put it bluntly: 'Senior colleagues are using multiple AIs. If they don't have the right solution, they re-prompt, iterate, but the juniors are satisfied with the first answer, they copy, paste and think they're finished. They don't yet know what they are looking for, and the danger is that they will not learn what to look for if they keep working that way.'
Experienced workers have a crucial advantage when it comes to prompting AI: they understand context and usually know how to express it clearly. While a junior advertising creative might ask an AI to 'Write copy for a sustainability campaign', a seasoned account director knows to specify 'Write conversational social media copy for a sustainable fashion brand targeting eco-conscious millennials, emphasising our client's zero-waste manufacturing process and keeping the tone authentic but not preachy'. This skill mirrors what experienced professionals do when briefing junior colleagues or freelancers: providing detailed instructions, accounting for audience, objectives, and constraints. It's a competency developed through years of managing teams and projects.
Younger workers, despite their comfort with technology, may actually be at a disadvantage here. There's a crucial difference between using technology frequently and using it well. Many young people may become too accustomed to AI assistance. A survey of US teens this year found 72 per cent had used an AI companion app. Some children and teens are turning to chatbots for everyday decisions. Without the professional experience to recognise when something doesn't quite fit, younger workers risk accepting AI responses that feel right – effectively 'vibing' their work – rather than developing the analytical skills to evaluate AI usefulness.
First, everyone benefits from learning more about AI. In our time educating everyone from students to senior leaders and CEOs, we find that misunderstandings about how AI works have little to do with age. A good place to start is reading up on what AI is and what it can do for you: What is AI? Where does AI come from? How does AI learn? What can AI do? What makes a good AI prompt? If you're not even sure which AI platform to try, we would recommend testing the most prominent ones, OpenAI's ChatGPT, Anthropic's Claude, and Google's Gemini.
If you're an experienced worker feeling threatened by AI, lean into your strengths. Your decades of experience with delegation, context-setting, and critical evaluation are exactly what AI tools need. Start small. Pick one regular work task and experiment with AI assistance, using your judgement to evaluate and refine outputs. Practice prompting like you're briefing a junior colleague: be specific about context, constraints, and desired outcomes, and repeat the process as needed.
Most importantly, don't feel threatened. In a workplace increasingly filled with AI-generated content, your ability to spot what doesn't quite fit, and to know what questions to ask, has never been more valuable.