Inside the Secret Meeting Where Mathematicians Struggled to Outsmart AI
On a weekend in mid-May, a clandestine mathematical conclave convened. Thirty of the world's most renowned mathematicians traveled to Berkeley, Calif., with some coming from as far away as the U.K. The group's members faced off in a showdown with a 'reasoning' chatbot that was tasked with solving problems they had devised to test its mathematical mettle. After throwing professor-level questions at the bot for two days, the researchers were stunned to discover it was capable of answering some of the world's hardest solvable problems. 'I have colleagues who literally said these models are approaching mathematical genius,' says Ken Ono, a mathematician at the University of Virginia and a leader and judge at the meeting.
The chatbot in question is powered by o4-mini, a so-called reasoning large language model (LLM). It was trained by OpenAI to be capable of making highly intricate deductions. Google's equivalent, Gemini 2.5 Flash, has similar abilities. Like the LLMs that powered earlier versions of ChatGPT, o4-mini learns to predict the next word in a sequence. Compared with those earlier LLMs, however, o4-mini and its equivalents are lighter-weight, more nimble models that train on specialized datasets with stronger reinforcement from humans. The approach leads to a chatbot capable of diving much deeper into complex problems in math than traditional LLMs.
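The next-word objective described above can be illustrated with a minimal sketch. The toy vocabulary and probability table below are hypothetical, purely for illustration; real LLMs score tokens with a learned neural network, not a hard-coded lookup.

```python
# Toy sketch of autoregressive next-token prediction (illustrative only;
# real models use learned neural scoring, not a hard-coded table).
TOY_MODEL = {
    ("the",): {"square": 0.7, "cat": 0.3},
    ("the", "square"): {"root": 0.9, "dance": 0.1},
    ("the", "square", "root"): {"of": 1.0},
}

def generate(prompt, steps):
    """Greedily append the highest-probability next token."""
    tokens = list(prompt)
    for _ in range(steps):
        dist = TOY_MODEL.get(tuple(tokens))
        if dist is None:  # no known continuation for this context
            break
        tokens.append(max(dist, key=dist.get))
    return tokens

print(generate(["the"], 3))  # ['the', 'square', 'root', 'of']
```

Reasoning models like o4-mini layer reinforcement-trained deliberation on top of this same basic next-token loop.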
To track the progress of o4-mini, OpenAI previously commissioned Epoch AI, a nonprofit that benchmarks LLMs, to come up with 300 math questions whose solutions had not yet been published. Even traditional LLMs can correctly answer many complicated math questions. Yet when Epoch AI asked several such models these questions, which were dissimilar to those they had been trained on, the most successful were able to solve less than 2 percent, showing these LLMs lacked the ability to reason. But o4-mini would prove to be very different.
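As a back-of-envelope check on the figures above: with 300 questions, a sub-2-percent solve rate means fewer than six correct answers. A minimal scoring sketch (the grading data here is hypothetical, not Epoch AI's actual harness or results):

```python
def solve_rate(graded):
    """Fraction of benchmark questions answered correctly."""
    return sum(graded) / len(graded)

# Hypothetical grading: 5 correct answers out of 300 unseen questions.
graded = [True] * 5 + [False] * 295
rate = solve_rate(graded)
print(f"{rate:.1%}")  # 1.7%
assert rate < 0.02  # under the 2 percent threshold the article cites
```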
[Sign up for Today in Science, a free daily newsletter]
Epoch AI hired Elliot Glazer, who had recently finished his math Ph.D., to join the new collaboration for the benchmark, dubbed FrontierMath, in September 2024. The project collected novel questions across tiers of increasing difficulty, with the first three covering undergraduate-, graduate- and research-level challenges. By February 2025, Glazer found that o4-mini could solve around 20 percent of the questions. He then moved on to a fourth tier: 100 questions that would be challenging even for an academic mathematician. Only a small group of people in the world would be capable of developing such questions, let alone answering them. The mathematicians who participated had to sign a nondisclosure agreement requiring them to communicate solely via the messaging app Signal. Other forms of contact, such as traditional e-mail, could potentially be scanned by an LLM and inadvertently train it, thereby contaminating the dataset.
The group made slow, steady progress in finding questions. But Glazer wanted to speed things up, so Epoch AI hosted the in-person meeting on Saturday, May 17, and Sunday, May 18. There, the participants would complete the final batch of challenge questions. Ono split the 30 attendees into groups of six. For two days, the academics competed among themselves to devise problems that they could solve but would trip up the AI reasoning bot. Each problem o4-mini couldn't solve would garner the mathematician who came up with it a $7,500 reward.
By the end of that Saturday night, Ono was frustrated with the bot, whose unexpected mathematical prowess was foiling the group's progress. 'I came up with a problem which experts in my field would recognize as an open question in number theory—a good Ph.D.-level problem,' he says. He asked o4-mini to solve the question. Over the next 10 minutes, Ono watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process along the way. The bot spent the first two minutes finding and mastering the related literature in the field. Then it wrote on the screen that it wanted to try solving a simpler 'toy' version of the question first in order to learn. A few minutes later, it wrote that it was finally prepared to solve the more difficult problem. Five minutes after that, o4-mini presented a correct but sassy solution. 'It was starting to get really cheeky,' says Ono, who is also a freelance mathematical consultant for Epoch AI. 'And at the end, it says, 'No citation necessary because the mystery number was computed by me!''
Defeated, Ono jumped onto Signal early that Sunday morning and alerted the rest of the participants. 'I was not prepared to be contending with an LLM like this,' he says. 'I've never seen that kind of reasoning before in models. That's what a scientist does. That's frightening.'
Although the group did eventually succeed in finding 10 questions that stymied the bot, the researchers were astonished by how far AI had progressed in the span of one year. Ono likened it to working with a 'strong collaborator.' Yang-Hui He, a mathematician at the London Institute for Mathematical Sciences and an early pioneer of using AI in math, says, 'This is what a very, very good graduate student would be doing—in fact, more.'
The bot was also much faster than a professional mathematician, taking mere minutes to complete work that would take a human expert weeks or months.
While sparring with o4-mini was thrilling, its progress was also alarming. Ono and He express concern that o4-mini's results might be trusted too much. 'There's proof by induction, proof by contradiction, and then proof by intimidation,' He says. 'If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.'
By the end of the meeting, the group started to consider what the future might look like for mathematicians. Discussions turned to the inevitable 'tier five'—questions that even the best mathematicians couldn't solve. If AI reaches that level, the role of mathematicians would undergo a sharp change. For instance, mathematicians may shift to simply posing questions and interacting with reasoning bots to help them discover new mathematical truths, much as a professor does with graduate students. As such, Ono predicts that nurturing creativity in higher education will be key to keeping mathematics going for future generations.
'I've been telling my colleagues that it's a grave mistake to say that generalized artificial intelligence will never come, [that] it's just a computer,' Ono says. 'I don't want to add to the hysteria, but in many ways these large language models are already outperforming most of our best graduate students in the world.'