AI experts divided over Apple's research on large reasoning model accuracy

Business Standard

15-06-2025

A recent study by tech giant Apple has divided experts in the artificial intelligence (AI) world. The study claims that the accuracy of frontier large reasoning models (LRMs) declines as task complexity increases and eventually collapses altogether. The paper, titled 'The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity', was published by Apple last week.

Apple said in the paper that its experiments across diverse puzzles show that such LRMs face a complete accuracy collapse beyond certain complexities. The models' reasoning effort increases with the complexity of a problem up to a point, but then declines despite an adequate token budget. A token budget for large language models (LLMs) refers to the practice of setting a limit on the number of tokens an LLM can use for a specific task. The paper is co-authored by Samy Bengio, senior director of AI and ML research at Apple, who is also the brother of Yoshua Bengio, often referred to as the godfather of AI.

Meanwhile, Amazon-backed AI company Anthropic countered Apple's claims in a separate paper, saying that the 'findings primarily reflect experimental design limitations rather than fundamental reasoning failures.' 'Their central finding has significant implications for AI reasoning research. However, our analysis reveals that these apparent failures stem from experimental design choices rather than inherent model limitations,' it said.

Mayank Gupta, founder of Swift Anytime, who is currently building an AI product in stealth, told Business Standard that both sides make equally important points. 'What this tells me is that we're still figuring out how to measure reasoning in LRMs the right way. The models are improving rapidly, but our evaluation tools haven't caught up. We need tools that separate how well an LRM reasons from how well it generates output, and that's where the real breakthrough lies,' he said.

Gary Marcus, a US academic who has become a voice of caution on the capabilities of AI models, said that in the best-case scenario these models can write Python code, supplementing their own weaknesses with outside symbolic code, but even this is not reliable. 'What this means for business and society is that you can't simply drop o3 or Claude into some complex problem and expect it to work reliably,' he wrote on his blog, Marcus on AI.

The Apple researchers conducted experiments comparing thinking and non-thinking model pairs across controlled puzzle environments. 'The most interesting regime is the third regime where problem complexity is higher and the performance of both models have collapsed to zero. Results show that while thinking models delay this collapse, they also ultimately encounter the same fundamental limitations as their non-thinking counterparts,' they wrote.

The observations in the paper may help explain why the iPhone maker has been slow to embed AI across its products and operating systems, a point on which it was criticised at its Worldwide Developers Conference (WWDC) last week. This cautious approach contrasts with that of Microsoft-backed OpenAI, Meta and Google, which are spending billions to build more sophisticated frontier models that can solve more complex tasks.

Other voices, however, believe that Apple's paper has its own limitations. Ethan Mollick, associate professor at the Wharton School who studies the effects of AI on work, entrepreneurship and education, said on X that while research on the limits of reasoning models is useful, it is premature to say that LLMs are hitting a wall.