
Despite Billions In Investment, AI Reasoning Models Are Falling Short
The research community dug in, and the responses were swift: despite the increasing adoption of generative AI, and the presumption that AI will replace tasks and jobs at scale, these Large Reasoning Models are falling short.
By definition, Large Reasoning Models (LRMs) are Large Language Models (LLMs) focused on step-by-step thinking. This approach is called Chain of Thought (CoT), which facilitates problem solving by guiding the model to articulate its reasoning steps.
Jing Hu, writer, researcher and author of 2nd Order Thinkers, who dissected the paper's findings, remarked that 'AI is just sophisticated pattern matching, no thinking, no reasoning' and that 'AI can only do tasks accurately up to a certain degree of complexity.'
As part of the study, researchers created closed puzzle environments for games such as Checker Jumping, River Crossing, and Tower of Hanoi, which simulate varied conditions of complexity. Each puzzle was run across three stages of complexity, ranging from the simplest to the most complex. Across all three stages of the models' performance, the paper concluded:
At 'Low Complexity', the regular models performed better than LRMs. Hu explained, 'The reasoning models were overthinking — wrote thousands of words, exploring paths that weren't needed, second-guessing correct answers and making things more complicated than they should have been.' In the Tower of Hanoi, a human can solve the simplest three-disk puzzle in seven moves, while Claude-3.7 Sonnet Thinking uses '10x more tokens compared to the regular version while achieving the same accuracy... it's like driving a rocket ship to the corner store.'
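For context on the benchmark itself: the Tower of Hanoi has a well-known optimal solution of 2^n − 1 moves for n disks, which is why three disks take exactly seven moves. A minimal recursive solver (a sketch for illustration, not the paper's evaluation harness) looks like this:

```python
def hanoi(n, source, target, spare, moves=None):
    """Append the optimal move sequence for n disks to `moves`."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)  # clear the n-1 smaller disks
    moves.append((source, target))              # move the largest disk
    hanoi(n - 1, spare, target, source, moves)  # restack the smaller disks
    return moves

# Three disks take the seven moves mentioned above: 2**3 - 1 == 7.
print(len(hanoi(3, "A", "C", "B")))  # 7
```

The solution length doubles (plus one) with every extra disk, which is one reason output-length limits bite hard as puzzle complexity grows.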
At 'Medium Complexity', LRMs outperformed LLMs, revealing traces of chain-of-thought reasoning. Hu notes LRMs tended to explore wrong answers first before eventually finding the correct one; however, she argues, 'these thinking models use 10-50x more compute power (15,000-20,000 tokens vs. 1,000-5,000). Imagine paying $500 instead of $50 for a hamburger that tastes 10% better.'
Hu says this isn't an impressive breakthrough; rather, it reveals a band of complexity that is dressed to impress audiences while 'simple enough to avoid total failure.'
At 'High Complexity', both LRMs and standard models collapse, and accuracy drops to zero. As the problems get more complex, the models simply stop trying.
Hu explains, referencing Figure 6 from Apple's paper,
'Accuracy starts high for all models on simple tasks, dips slowly, then crashes to near zero at a 'critical point' of complexity. If this is compared to the row displaying token use, the latter rises as problems become harder ('models think more'), peaks, then drops sharply at the same critical point even if token budget is still available.' The models, Hu explains, aren't scaling up their effort; rather, they abandon real reasoning and output less.
Gary Marcus, a scientist and author of several books including The Algebraic Mind and Rebooting AI, is an authority on AI who continues to scrutinize the releases from these AI companies. In his response to Apple's paper, he states, 'it echoes and amplifies the training distribution argument that I have been making since 1998: neural networks of various kinds can generalize within a training distribution of data they are exposed to, but their generalizations tend to break down outside that distribution.' In other words, the more edge cases these LRMs encounter, the more they will go off-track, especially on problems very different from their training data.
He also notes that LRMs have a scaling problem because 'the outputs would require too many output tokens', meaning the correct answer would be too long for the LRMs to produce.
The implications? As Hu puts it, 'This comparison matters because it debunks hype around LRMs by showing they only shine on medium complexity tasks, not simple or extreme ones.'
Why This Hedge Fund CEO Passes on GenAI
Ryan Pannell is the CEO of Kaiju Worldwide, a technology research and investment firm specializing in predictive artificial intelligence and algorithmic trading. He operates in an industry that demands compliance and a high level of certainty. He uses predictive AI, a type of artificial intelligence that leverages statistical analysis and machine learning to forecast outcomes based on patterns in historical data; unlike generative AI such as LLM and LRM chatbots, it does not create original content.
Sound data is paramount, and the fund leverages only closed datasets. As Pannell explains, 'In our work with price, time, and quantity, the analysis isn't influenced by external factors — the integrity of the data is reliable, as long as proper precautions are taken, such as purchasing quality data sets and putting them through rigorous quality control processes, ensuring only fully sanitized data are used.'
The data they purchase — price, time, and quantity — come from three different vendors, and when they compare the outputs, 99.999% of the time they all match. When there is an error — some data vendors occasionally provide incorrect price, time, or quantity information — the other two usually expose the mistake. Pannell argues, 'This is why we use data from three sources. Predictive systems don't hallucinate because they aren't guessing.'
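The three-vendor cross-check Pannell describes amounts to majority voting over redundant feeds: if two of three sources agree on a tick, the odd one out is treated as the error. A simplified sketch of that reconciliation logic (the record format here is hypothetical, not Kaiju's actual pipeline):

```python
from collections import Counter

def reconcile(records):
    """Given the same tick reported by several vendors, keep the value
    the majority agrees on; quarantine the tick if no majority exists."""
    counts = Counter(records)
    value, votes = counts.most_common(1)[0]
    if votes >= 2:  # at least two of three vendors agree
        outliers = [r for r in records if r != value]
        return value, outliers
    return None, list(records)  # no consensus: flag for manual review

# One vendor reports a bad price; the other two out-vote it.
tick = [(101.25, "09:30:00", 500),
        (101.25, "09:30:00", 500),
        (999.99, "09:30:00", 500)]
clean, outliers = reconcile(tick)
print(clean)  # (101.25, '09:30:00', 500)
```

The design choice mirrors classic triple-modular redundancy: a single faulty source can always be out-voted, while simultaneous disagreement among all three is rare enough to escalate rather than guess.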
For Kaiju, the predictive model uses only what it already knows, plus whatever new data they collect, to spot patterns it can use to predict what comes next. 'In our case, we use it to classify market regimes — bull, bear, neutral, or unknown. We've fed them trillions of transactions and over four terabytes of historical price and quantity data. So, when one of them outputs 'I don't know,' it means it's encountered something genuinely unprecedented.' He claims that if it sees loose patterns and predicts a bear market with 75% certainty, it's likely correct; 'I don't know,' however, signals a unique scenario, something never seen in decades of market data. 'That's rare, but when it happens, it's the most fascinating for us,' says Pannell.
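The 'I don't know' output Pannell describes is, in effect, a classifier that abstains when nothing matches its training distribution with enough confidence to act on. A minimal sketch of that abstention logic (the regimes, scores, and threshold are illustrative, not Kaiju's system):

```python
def classify_regime(scores, threshold=0.6):
    """Return the highest-scoring market regime, or 'unknown' when no
    regime is matched confidently enough to act on."""
    regime = max(scores, key=scores.get)
    if scores[regime] < threshold:
        return "unknown"  # genuinely unprecedented pattern: abstain
    return regime

print(classify_regime({"bull": 0.10, "bear": 0.75, "neutral": 0.15}))  # bear
print(classify_regime({"bull": 0.34, "bear": 0.33, "neutral": 0.33}))  # unknown
```

Abstention is what separates this behavior from a generative chatbot: rather than producing a fluent guess, the system surfaces its own uncertainty as a first-class output.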
In 2017, when Trump policy changes caused major trade disruptions, these systems were not yet in place, so Pannell asserts the gains they made during that period of high uncertainty were mostly luck. The system today, which has seen this level of volatility before, can perform well, and with consistency.
AI Detection and the Anomaly of COVID-19
Just before the dramatic stock market drop of February 2020, the market was still at an all-time high. However, Pannell noted that the system was signaling something was very wrong, and the strange behavior in the market kept intensifying: 'The system estimated a 96% chance of a major drop and none of us knew exactly why at the time. That's the challenge with explainability — AI can't tell you about news events, like a cruise ship full of sick people or how COVID spread across the world. It simply analyzes price, time and quantity patterns and predicts a fall based on changing behavior it is seeing, even though it has no awareness of the underlying reasons. We, on the other hand, were following the news as humans do.'
The news pointed to this 'COVID-19' thing, which at the time seemed isolated. Pannell's team wasn't sure what to expect, but in hindsight he realized the value of the system: it analyzes terabytes of data, runs billions of examinations daily for any recognizable pattern, and sometimes determines that what's happening matches nothing it has seen before. In those cases, he realized, the system acted as an early warning, allowing them to increase their hedges.
For all the billions of dollars these predictive AI systems generate, their efficacy drops off after a week to roughly 17%-21%, and making trades outside this window is extremely risky. Pannell says he hasn't seen any evidence suggesting AI — of any kind — will be able to predict financial markets with accuracy 90 days, six months, or a year in advance. 'There are simply too many unpredictable factors involved. Predictive AI is highly accurate in the immediate future — between today and tomorrow — because the scope of possible changes is limited.'
Pannell remains skeptical of the promises of LLMs and current LRMs for his business. He describes wasting three hours being lied to by ChatGPT-4o while experimenting with using it to architect a new framework. At first he was blown away that the system seemed to have substantially increased its functionality, but after three hours he determined it had been lying to him the entire time. He explains, 'When I asked, "Do you have the capability to do what you just said?" the system responded it did not' and added that its latest update had programmed it to prioritize keeping him engaged over giving an honest answer.
Pannell adds, 'Within a session, an LLM can adjust when I give it feedback, like 'don't do this again,' but as soon as the session goes for too long, it forgets and starts lying again.'
He also points to ChatGPT's memory constraints, noting it performs well for the first hour, but by the second or third hour it starts forgetting earlier context, making mistakes, and dispensing false information. He described it to a colleague this way: 'It's like working with an extremely talented but completely drunk programmer. It does some impressive work, but it also over-estimates its capabilities, lies about what it can and can't do, delivers some well-written code, wrecks a bunch of stuff, apologizes and says it won't do it again, tells me that my ideas are brilliant and that I am 'right for holding it accountable', and then repeats the whole process over and over again. The experience can be chaotic.'
Could Symbolic AI be the Answer?
Catriona Kennedy holds a Ph.D. in Computer Science from the University of Birmingham and is an independent researcher focusing on cognitive systems and ethical automation.
Kennedy explains that automated reasoning has always been a branch of AI, with the inference engine at its core: a component that applies the rules of logic to a set of statements encoded in a formal language. She explains, 'An inference engine is like a calculator, but unlike AI, it operates on symbols and statements instead of numbers. It is designed to be correct.' It is designed to deduce new information, simulating the decision-making of a human expert. Generative AI, by comparison, is a statistical generator and therefore prone to hallucinations, because such models 'do not interpret the logic of the text in the prompt.'
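The inference engine Kennedy describes can be sketched as a forward-chaining loop: it repeatedly applies if-then rules to known facts until nothing new can be deduced. The rules below are toy examples, not from any production system:

```python
def forward_chain(facts, rules):
    """Repeatedly apply rules until no new facts can be derived.
    Each rule is (set_of_premises, conclusion)."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)  # deduction: premises entail conclusion
                changed = True
    return facts

rules = [({"has_feathers", "lays_eggs"}, "is_bird"),
         ({"is_bird", "hunts_prey"}, "is_raptor")]
derived = forward_chain({"has_feathers", "lays_eggs", "hunts_prey"}, rules)
# 'is_bird' is deduced first, which then licenses 'is_raptor'.
```

Every derived fact follows deterministically from the rules and premises, which is what Kennedy means by 'designed to be correct': there is no statistical guessing anywhere in the loop.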
This is the heart of symbolic AI: a system, distinct from generative AI, that uses an inference engine and allows for human experience and authorship. The difference with symbolic AI is the knowledge structure. She explains, 'You have your data and connect it with knowledge, allowing you to classify the data based on what you know. Metadata is an example of knowledge. It describes what data exists and what it means, and this acts as a knowledge base linking data to its context — such as how it was obtained and what it represents.' Kennedy adds that ontologies are becoming popular again. An ontology defines all the things that exist along with their interdependent properties and relationships. For example, animal is a class, bird is a subclass, and eagle or robin are further subclasses. A bird has two feet, has feathers, and flies; however, what an eagle eats may differ from what a robin eats. Ontologies and metadata can connect with logic-based rules to ensure correct reasoning based on defined relationships.
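Kennedy's animal/bird/eagle example maps directly onto a small class hierarchy with inherited and overridden properties. A minimal sketch (the property names and values are illustrative):

```python
# Each class maps to (parent, own_properties); subclasses inherit
# everything from their ancestors and may add or override facts.
ONTOLOGY = {
    "animal": (None, {}),
    "bird":   ("animal", {"feet": 2, "has_feathers": True, "flies": True}),
    "eagle":  ("bird", {"eats": "small mammals"}),
    "robin":  ("bird", {"eats": "insects and worms"}),
}

def properties(cls):
    """Resolve all properties of a class by walking up the hierarchy."""
    parent, own = ONTOLOGY[cls]
    inherited = properties(parent) if parent else {}
    inherited.update(own)  # more specific classes override general ones
    return inherited

# Both inherit feathers and two feet from 'bird', but differ in diet.
print(properties("eagle")["eats"])  # small mammals
print(properties("robin")["feet"])  # 2
```

This is exactly the pattern Kennedy describes: shared properties live once at the general class, while subclass-specific facts (what an eagle eats versus what a robin eats) are attached where they differ.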
The main limitation of pure symbolic AI is that it doesn't easily scale. Kennedy points out that these knowledge structures can become unwieldy. While it excels at special purpose tasks, it becomes brittle at very complex levels and difficult to manage when dealing with large, noisy or unpredictable data sets.
What we have today in current LRMs has not satisfied these researchers that AI models are any closer to thinking like humans. As Marcus points out, 'our argument is not that humans don't have any limits, but LRMs do, and that's why they aren't intelligent... based on what we observe from their thoughts, their process is not logical and intelligent.'
Jing Hu concludes, 'Too much money depends on the illusion of progress — there is a huge financial incentive to keep the hype going even if the underlying technology isn't living up to the promises. Stop the blind worship of GenAI.' (Note: OpenAI recently raised $40 billion at a post-money valuation of $300 billion.)
For hedge fund CEO Ryan Pannell, combining generative AI (which can handle communication and language) with predictive systems (which can accurately process data in closed environments) would be ideal. As he explains, 'The challenge is that predictive AI usually doesn't have a user-friendly interface; it communicates in code and math, not plain English. Most people can't access or use these tools directly.'
He opts for integrating GPT as an intermediary, 'where you ask GPT for information, and it relays that request to a predictive system and then shares the results in natural language—it becomes much more useful. In this role, GPT acts as an effective interlocutor between the user and the predictive model.'
Gary Marcus believes that combining symbolic AI with neural networks — an approach known as neurosymbolic AI — to connect data to knowledge in ways that leverage human thought processes will produce better results. He explains that this will provide a robust AI capable of 'reasoning, learning and cognitive modelling.'
Marcus laments that for four decades, the elites who have driven machine learning, 'closed-minded egotists with too much money and power', have 'tried to keep a good idea, namely neurosymbolic AI, down — only to accidentally vindicate the idea in the end.'
'Huge vindication for what I have been saying all along: we need AI that integrates both neural networks and symbolic algorithms and representations (such as logic, code, knowledge graphs, etc.). But also, we need to do so reliably, and in a general way, and we haven't yet crossed that threshold.'