Latest news with #Wei


Hans India
a day ago
- Science
- Hans India
OpenAI's Experimental AI Matches Gold Medal Math Olympiad Performance, GPT-5 Launch Soon
In a remarkable step forward for artificial intelligence, an experimental large language model (LLM) from OpenAI has demonstrated gold medal-level performance at the 2025 International Math Olympiad (IMO). This milestone highlights how far AI's reasoning abilities have progressed, with OpenAI's CEO, Sam Altman, confirming that the highly anticipated GPT-5 will arrive soon.

OpenAI researcher Alexander Wei announced the breakthrough on X, revealing that the experimental AI tackled five out of six problems from this year's IMO under authentic exam conditions. Scoring 35 out of 42 points, the model reached a level that would earn a human contestant a gold medal at the world's most challenging high school math competition. 'We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5-hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs,' Wei stated.

The IMO is widely recognised for its notoriously demanding problems that test the deepest levels of mathematical creativity and reasoning. Wei noted that the AI's performance marks a significant leap forward compared to previous benchmarks. 'We've now progressed from GSM8K (~0.1 min for top humans) → MATH benchmark (~1 min) → AIME (~10 mins) → IMO (~100 mins),' he added, illustrating the scale of advancement.

Independent grading by three former IMO medallists confirmed the model's solutions. The AI successfully solved problems P1 through P5 but did not complete P6. Wei made the model's detailed solutions public, pointing out its 'distinct style,' which reflects its experimental framework.

What sets this achievement apart is the AI's ability to generate complex, human-like proofs. 'By going beyond the reinforcement learning paradigm of clear-cut, verifiable rewards we've obtained a model that can craft intricate, watertight arguments at the level of human mathematicians,' Wei explained.

Despite this progress, OpenAI does not plan to release this specific IMO-level AI model to the public anytime soon. Wei clarified that while the company is gearing up for GPT-5's rollout, the Math Olympiad project remains separate and will continue behind closed doors. 'We don't plan to release a model with IMO gold level of capability for many months,' he added.

Echoing Wei's excitement, OpenAI CEO Sam Altman called the accomplishment 'a significant marker of how far AI has come over the past decade.' He emphasised that the IMO-level AI is not a narrowly trained math tool but part of broader research pushing general-purpose reasoning forward. 'We are releasing GPT-5 soon, but want to set accurate expectations: this is an experimental model that incorporates new research techniques. We don't plan to release a model with IMO gold level of capability for many months,' Altman reiterated.

Wei also looked back on his early forecasts, reflecting on how AI has surpassed expectations. 'In 2021, my PhD advisor, Jacob Steinhardt, had me forecast AI math progress by July 2025. I predicted 30 per cent on the MATH benchmark. Instead, we have IMO gold,' he wrote. He credited team members like Sheryl Hsu and Noam Brown for their contributions and congratulated this year's IMO competitors, noting that several OpenAI researchers were once IMO medallists themselves.

With GPT-5 on the horizon and experimental AI solving Olympiad-level math, OpenAI's latest strides are set to reshape what's possible for machine reasoning in the years ahead.
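For readers checking the arithmetic behind the headline figures: each of the six IMO problems is marked out of 7 points, so a perfect score is 42, and fully solving five problems yields 35. A minimal sketch of that bookkeeping follows; the gold-medal cutoff value is an assumption inferred from the reported result, used purely for illustration.

```python
# Sanity-checking the reported score: six problems, each marked out of 7.
PROBLEMS, MAX_PER_PROBLEM = 6, 7
max_score = PROBLEMS * MAX_PER_PROBLEM   # 42
model_score = 5 * MAX_PER_PROBLEM        # P1-P5 solved fully: 35

GOLD_CUTOFF_2025 = 35  # assumed from the reported result; illustrative only
print(f"{model_score}/{max_score}, gold: {model_score >= GOLD_CUTOFF_2025}")
```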


India Today
2 days ago
- Science
- India Today
Sam Altman says OpenAI LLM achieved IMO gold-level Math skills, GPT-5 launch coming soon
- OpenAI model scored 35/42 in a 2025 IMO mock test
- Evaluated under the same conditions as human participants
- GPT-5 coming soon, but won't match the IMO model's capabilities

An experimental large language model (LLM) developed by OpenAI has achieved gold medal-level performance at the 2025 International Math Olympiad (IMO), setting a new benchmark in mathematical reasoning for AI systems. Announcing the milestone, OpenAI researcher Alexander Wei posted on X that the model solved five out of six problems from the latest IMO under human exam conditions.

The model earned 35 out of 42 possible points, a score that would qualify for a gold medal at the real competition. 'We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5-hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs,' Wei explained.

The IMO is regarded as the most prestigious high school maths competition globally, known for its notoriously complex problems. Wei pointed out that such problems demand extended creative reasoning and that achieving gold-level performance represents a leap from earlier benchmarks. 'We've now progressed from GSM8K (~0.1 min for top humans) → MATH benchmark (~1 min) → AIME (~10 mins) → IMO (~100 mins),' he said.

Submissions were graded independently by three former IMO medallists, who unanimously validated the model's solutions. According to Wei, 'the model solved P1 through P5; it did not produce a solution for P6.' He shared the model's answers publicly, noting its 'distinct style,' owing to its experimental nature.

Wei said what makes the result even more impressive is that IMO proofs are long, complex and hard to verify. 'By going beyond the reinforcement learning paradigm of clear-cut, verifiable rewards we've obtained a model that can craft intricate, watertight arguments at the level of human mathematicians.'

The LLM that achieved this result will not be released publicly any time soon. Wei clarified that while OpenAI is preparing to launch GPT-5, this IMO-level model is part of a different research track. 'We don't plan to release a model with IMO gold level of capability for many months.'

OpenAI CEO Sam Altman echoed this in a follow-up post, calling the achievement 'a significant marker of how far AI has come over the past decade.' He clarified that this model is not a specialised maths system, but a general-purpose reasoning model. 'We are releasing GPT-5 soon but want to set accurate expectations: this is an experimental model that incorporates new research techniques. We don't plan to release a model with IMO gold level of capability for many months,' Altman added.

Looking back, Wei also reflected on how far AI progress has exceeded expectations. 'In 2021, my PhD advisor Jacob Steinhardt had me forecast AI math progress by July 2025. I predicted 30 per cent on the MATH benchmark. Instead, we have IMO gold.' He credited collaborators including Sheryl Hsu and Noam Brown, and concluded by congratulating all 2025 IMO participants, noting that many OpenAI researchers are former IMO medallists themselves.


Indian Express
2 days ago
- Science
- Indian Express
OpenAI says its next big model can bring home Math Olympiad gold: A turning point?
The value of AI for most users today lies in its ability to generate coherent, conversational language by applying probability theory to massive datasets. However, a future where AI models drive advances in fields like cryptography and space exploration by solving complex, multi-step mathematical problems is now one step closer to reality.

OpenAI on Saturday, July 19, announced that its experimental AI reasoning model earned enough points on this year's International Math Olympiad (IMO) to win a gold medal. Started in 1959 in Romania, the IMO is widely considered to be one of the hardest, most prestigious math competitions in the world for high-school students. It is held over two days: participants take two exams, and in each four-and-a-half-hour session they are expected to solve three math problems.

OpenAI's unreleased AI model took the IMO 2025 under these same conditions, with no access to the internet or external tools. It read the official math problem statements and generated natural language proofs. The model solved five out of a total of six problems, achieving a gold medal-worthy score of 35/42, according to Alexander Wei, a member of OpenAI's technical staff. 'This underscores how fast AI has advanced in recent years. In 2021, my PhD advisor @JacobSteinhardt had me forecast AI math progress by July 2025. I predicted 30% on the MATH benchmark (and thought everyone else was too optimistic). Instead, we have IMO gold,' Wei wrote in a post on X.

This isn't the first time a company has claimed that its AI model can match the performance of IMO gold medallists. Earlier this year, Google DeepMind introduced AlphaGeometry 2, a model specifically designed to solve complex geometry problems at a level comparable to a human Olympiad gold medallist. However, the performance of OpenAI's experimental model is seen as a step forward for general intelligence, not just task-specific AI systems. 'We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling,' Wei said.

The model's success marks progress beyond traditional reinforcement learning (RL), a process used to train AI models through a system of clear, verifiable rewards and penalties. Instead, the model possibly demonstrates more flexible, general problem-solving abilities, as it 'can craft intricate, watertight arguments at the level of human mathematicians.' Wei also acknowledged that 'IMO submissions are hard-to-verify, multi-page proofs.' Math proofs are made up of smaller, minor theorems called lemmas.

OpenAI said that the AI-generated proofs were independently graded by three former IMO medalists, who finalised the model's score unanimously. However, Gary Marcus, a professor at New York University (NYU) and well-known critic of AI hype, pointed out that the results have not been independently verified by the organisers of the IMO.

OpenAI's claims also come months after the US Defense Advanced Research Projects Agency (DARPA) launched a new initiative that looks to enlist researchers to find ways to conduct high-level mathematics research with an AI 'co-author.' In the past, DARPA was responsible for driving research that led to the creation of ARPANET, the precursor to the internet. An AI model that could reliably check proofs would save enormous amounts of time for mathematicians and help them be more creative.
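To make the contrast above concrete, here is a minimal, hypothetical sketch of the 'verifiable rewards' RL loop the article describes: the reward fires only when a checker can mechanically confirm the output. Everything here (the `ToyPolicy` class, the task, the update rule) is illustrative and not OpenAI's actual training setup.

```python
import random

def check_answer(output: str, expected: str) -> bool:
    """Verifiable reward: exact string match against a known final answer."""
    return output.strip() == expected

class ToyPolicy:
    """Stand-in for an LLM policy; a real system would update network weights."""
    def __init__(self):
        self.candidates = ["41", "42", "43"]
        self.weights = [1.0, 1.0, 1.0]

    def generate(self, prompt: str) -> str:
        # Sample an answer in proportion to learned weights.
        return random.choices(self.candidates, weights=self.weights)[0]

    def update(self, output: str, reward: float) -> None:
        # Reinforce outputs that earned a reward.
        self.weights[self.candidates.index(output)] += reward

policy = ToyPolicy()
task = {"prompt": "What is 6 x 7?", "answer": "42"}

for _ in range(200):
    out = policy.generate(task["prompt"])
    reward = 1.0 if check_answer(out, task["answer"]) else 0.0
    policy.update(out, reward)

print(policy.weights)  # the verifiably correct answer accumulates weight
# An IMO proof has no short `task["answer"]` to match, so this reward
# signal is unavailable; that is the paradigm Wei says they went beyond.
```

This is exactly why multi-page proofs are a harder training target: grading them needs expert judgement rather than an exact-match check.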
While some of these models might seem equipped to solve complex problems, they can also stumble on simple questions, such as whether 9.11 is bigger than 9.9. Hence, they are said to have 'jagged intelligence,' a term coined by OpenAI co-founder Andrej Karpathy.

Reacting to the model's gold medal-worthy IMO score, OpenAI CEO Sam Altman said, 'This is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence.' However, the ChatGPT-maker does not plan to release the experimental research model for at least the next several months, despite its math capabilities.
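The 9.11-versus-9.9 stumble mentioned above is easy to reproduce in code. One commonly offered explanation (an assumption, not something the article states) is that two valid readings of the same tokens disagree: as decimal numbers 9.11 < 9.9, but as software version strings, compared component by component, 9.11 > 9.9.

```python
# Reading 1: decimal numbers. 9.11 is less than 9.9.
print(9.11 > 9.9)  # False

# Reading 2: version strings, compared component by component.
def version_tuple(s: str) -> tuple:
    return tuple(int(part) for part in s.split("."))

print(version_tuple("9.11") > version_tuple("9.9"))  # True: (9, 11) > (9, 9)
```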

Business Insider
3 days ago
- Science
- Business Insider
OpenAI just won gold at the world's most prestigious math competition. Here's why that's a big deal.
OpenAI's latest experimental model is a math whiz, performing so well on an insanely difficult math exam that everyone's now talking about it. "I'm excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world's most prestigious math competition — the International Math Olympiad (IMO)," Alexander Wei, a member of OpenAI's technical staff, said on X.

The International Math Olympiad is a global competition that began in 1959 in Romania and is now considered one of the hardest in the world. It's held over two days; on each day, participants sit a four-and-a-half-hour exam with three questions. Some famous winners include Grigori Perelman, who helped advance geometry, and Terence Tao, recipient of the Fields Medal, the highest honor in mathematics.

In June, Tao predicted on Lex Fridman's podcast that AI would not score high on the IMO. He suggested researchers shoot a bit lower. "There are smaller competitions. There are competitions where the answer is a number rather than a long-form proof," he said. Yet OpenAI's latest model solved five out of six of the problems correctly, working under the same testing conditions as humans, Wei said.

Wei's colleague Noam Brown said the model displayed a new level of endurance during the exam. "IMO problems demand a new level of sustained creative thinking compared to past benchmarks," he said. "This model thinks for a long time."

Wei said the model is an upgrade in general intelligence. The model's performance is "breaking new ground in general-purpose reinforcement learning," he said. DeepMind's AlphaGeometry, by contrast, is specifically designed just to do math. "This is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence," Altman said on X.

"When we first started openai, this was a dream but not one that felt very realistic to us; it is a significant marker of how far AI has come over the past decade," Altman wrote, referring to the model's performance at the IMO. Altman added that a model with a "gold level of capability" will not be available to the public for "many months."

The achievement is an example of how fast the technology is developing. Just last year, "AI labs were using grade school math" to evaluate models, Brown said. And tech billionaire Peter Thiel said last year it would take at least another three years before AI could solve US Math Olympiad problems.

Still, there are always skeptics. Gary Marcus, a well-known critic of AI hype, called the model's performance "genuinely impressive" on X. But he also posed several questions about how the model was trained, the scope of its "general intelligence," its utility for the general population, and the cost per problem. Marcus also noted that the IMO has not independently verified these results.

Engadget
3 days ago
- Science
- Engadget
OpenAI's experimental model achieved gold at the International Math Olympiad
OpenAI has achieved "gold medal-level performance" at the International Math Olympiad, notching another important milestone for AI's fast-paced growth. Alexander Wei, a research scientist at OpenAI working on LLMs and reasoning, posted on X that an experimental research model delivered on this "longstanding grand challenge in AI." According to Wei, an unreleased model from OpenAI was able to solve five out of six problems at one of the world's longest-standing and most prestigious math competitions, earning 35 out of 42 points in total.

The International Math Olympiad (IMO) sees countries send up to six students to solve extremely difficult problems in algebra, combinatorics, geometry and number theory. These exercises can look deceptively simple but usually require real creativity to score the highest marks on each problem. For this year's competition, only 67 of the 630 total contestants received gold medals, or roughly 10 percent.

AI is often tasked with tackling complex datasets and repetitive actions, but it usually falls short when it comes to solving problems that require more creativity or complex decision-making. However, with the latest IMO competition, OpenAI says its model was able to handle complicated math problems with human-like reasoning. "By doing so, we've obtained a model that can craft intricate, watertight arguments at the level of human mathematicians," Wei wrote on X.

Wei and Sam Altman, CEO of OpenAI, both added that the company doesn't expect to release anything with this level of math capability for several months. That means the upcoming GPT-5 will likely be an improvement on its predecessor, but it won't feature the same impressive capability to compete in the IMO.