Latest news with #TheIllusionofThinking


The Hindu
2 hours ago
- Science
Apple AI research shows reasoning models collapse when problems are more complex
A research paper from Apple published on June 6 stated that although large reasoning models (LRMs) showed improved performance on benchmarks, they struggled with accuracy when the problems became more complex. Titled 'The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity,' the paper revealed that even the most advanced AI reasoning models collapsed entirely when facing harder problems. 'They exhibit a counter-intuitive scaling limit: their reasoning effort increases with problem complexity up to a point, then declines despite having an adequate token budget,' the paper noted.

To test the AI models, the researchers sorted the problems into low-complexity, medium-complexity and high-complexity tasks built around puzzles such as Checker Jumping, River Crossing, Blocks World and the Tower of Hanoi. They picked Claude 3.7 Sonnet and DeepSeek-V3 from among the large language models, and Claude 3.7 Sonnet with Thinking and DeepSeek-R1 from among the large reasoning models. The research concluded that both types of AI models had a similar level of capability: on low-complexity problems they solved the puzzles, but as the tasks moved into the high-complexity category, both failed.

The hardware giant has been seen as lagging behind in developing AI technology. Notably, Apple's annual Worldwide Developers Conference is also expected to begin later today.
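For a concrete sense of how such puzzles scale, consider the Tower of Hanoi: the shortest solution for n disks takes 2^n - 1 moves, so each extra disk roughly doubles how long a model must reason without slipping. The sketch below is purely illustrative and is not Apple's evaluation code; the function name hanoi_moves and the peg labels are choices made for this example.

```python
# Illustrative sketch (not from the paper): the optimal Tower of Hanoi
# solution grows exponentially with the number of disks, which is how
# puzzle difficulty can be dialled up in controlled steps.

def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the optimal move sequence (from_peg, to_peg) for n disks."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)   # park n-1 disks on the spare peg
        + [(source, target)]                        # move the largest disk
        + hanoi_moves(n - 1, spare, target, source) # stack the n-1 disks back on top
    )

for n in (3, 7, 10, 15):
    print(f"{n} disks -> {len(hanoi_moves(n))} moves (2^{n} - 1 = {2**n - 1})")
```

Running it shows that 3 disks need 7 moves while 15 disks already need 32,767, which is the sense in which the researchers could push the same puzzle from low to high complexity in a controlled way.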


India Today
7 hours ago
Apple researchers say models like ChatGPT o3 look smart but collapse when faced with real complexity
They may talk the talk, but can they truly think it through? A new study by Apple researchers suggests that even the most advanced AI models like ChatGPT o3, Claude, and DeepSeek start to unravel when the going gets tough. These so-called 'reasoning' models may impress with confident answers and detailed explanations, but when faced with genuinely complex problems, they stumble – and sometimes fall flat.

Apple researchers have found that the most advanced large language models today may not be reasoning in the way many believe. In a recently released paper titled The Illusion of Thinking, researchers at Apple show that while these models appear intelligent on the surface, their performance dramatically collapses when they are faced with truly complex problems.

The study looked at a class of models now referred to as Large Reasoning Models (LRMs), which are designed to "think" through complex tasks using a series of internal steps, often called a 'chain of thought.' This includes models like OpenAI's o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking. Apple's researchers tested how these models handle problems of increasing difficulty – not just whether they arrive at the correct answer, but how they reason their way there.

The findings were striking. As problem complexity rose, the models' performance did not merely degrade gracefully – it collapsed completely. 'They think more up to a point,' tweeted tech investor Josh Wolfe, referring to the findings. 'Then they give up early, even when they have plenty of compute left.'

Apple's team built custom puzzle environments such as the Tower of Hanoi, River Crossing, and Blocks World to carefully control complexity levels. These setups allowed them to observe not only whether the models found the right answer, but how they tried to get there. They found that:
- At low complexity, traditional LLMs (without reasoning chains) performed better and were more efficient
- At medium complexity, reasoning models briefly took the lead
- At high complexity, both types failed completely

Even when given a step-by-step algorithm for solving a problem, so that they only needed to follow instructions, models still made critical mistakes. This suggests that they struggle not only with creativity or problem-solving, but with basic logical execution.

The models also showed odd behaviour when it came to how much effort they put in. Initially, they 'thought' more as the problems got harder, using more tokens for reasoning steps. But once a certain threshold was reached, they abruptly started thinking less. This happened even when they hadn't hit any computational limits, highlighting what Apple calls a 'fundamental inference time scaling limitation.'

Cognitive scientist Gary Marcus said the paper supports what he's been arguing for decades: these systems don't generalise well beyond their training data. 'Neural networks can generalise within a training distribution of data they are exposed to, but their generalisation tends to break down outside that distribution,' Marcus wrote on Substack. He also noted that the models' 'reasoning traces' – the steps they take to reach an answer – can look convincing, but often don't reflect what the models actually did to reach a conclusion.

Apple's findings also echo earlier work by Arizona State University's Subbarao (Rao) Kambhampati, who has critiqued so-called reasoning models, Marcus points out. Rao has shown that models often appear to think logically but actually produce answers that don't match their thought process.
Apple's experiments back this up by showing that models generate long reasoning paths that still lead to the wrong answer, particularly as problems get harder. But the most damning evidence came when Apple tested whether models could follow exact instructions. In one test, they were handed the algorithm to solve the Tower of Hanoi puzzle and asked to simply execute it, a purely mechanical task (see the sketch below). The models still failed once the puzzle complexity passed a certain point.

The conclusion is blunt: today's top models are 'super expensive pattern matchers' that can mimic reasoning only within familiar settings. The moment they're faced with novel problems – ones just outside their training data – they break down. The findings have serious implications for claims that AI is becoming capable of human-like reasoning. As the paper puts it, the current approach may be hitting a wall, and overcoming it could require an entirely different way of thinking about how we build intelligent systems. In short, we are still leaps away from AGI.
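To see why the execution test is so telling, note that following the Tower of Hanoi procedure requires no insight at all: every move can be verified mechanically against the puzzle's two rules. The sketch below is a hypothetical illustration of that kind of check, not Apple's actual test harness; the function name check_solution and the list-of-pegs state representation are assumptions made for this example.

```python
# Hypothetical sketch of a mechanical checker for a proposed Tower of Hanoi
# solution: each move must take the top disk of a peg and never place a disk
# on a smaller one, so any slip in pure instruction-following is caught.

def check_solution(n, moves):
    """Return True if `moves` legally transfers n disks from peg A to peg C."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}  # disk n at the bottom
    for src, dst in moves:
        if not pegs[src]:
            return False                        # no disk to move from this peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False                        # cannot place a disk on a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs["C"] == list(range(n, 0, -1))   # all disks must end on peg C, in order

# Example: the optimal 7-move solution for 3 disks passes the check.
print(check_solution(3, [("A", "C"), ("A", "B"), ("C", "B"),
                         ("A", "C"), ("B", "A"), ("B", "C"), ("A", "C")]))
```

Because each step can be validated this way, failures on the execution test point to problems with basic step-by-step execution rather than with creativity or search, which is the distinction the Apple paper draws.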