
Latest news with #JingHu

Despite Billions In Investment, AI Reasoning Models Are Falling Short

Forbes

04-08-2025

In early June, Apple released an explosive paper, The Illusion of Thinking: Understanding the Limitations of Reasoning Models via the Lens of Problem Complexity. It examines the reasoning ability of Large Reasoning Models (LRMs) such as Claude 3.7 Sonnet Thinking, Gemini Thinking, DeepSeek-R1, and OpenAI's o-series models — how they think, especially as problem complexity increases. The research community dug in, and the responses were swift. Despite the increasing adoption of generative AI and the presumption that AI will replace tasks and jobs at scale, these Large Reasoning Models are falling short. By definition, Large Reasoning Models (LRMs) are Large Language Models (LLMs) focused on step-by-step thinking, called Chain of Thought (CoT), which facilitates problem solving by guiding the model to articulate its reasoning steps. Jing Hu, writer, researcher and author of 2nd Order Thinkers, who dissected the paper's findings, remarked that 'AI is just sophisticated pattern matching, no thinking, no reasoning' and 'AI can only do tasks accurately up to a certain degree of complexity.'

As part of the study, researchers created a closed puzzle environment with games like Checker Jumping, River Crossing, and Tower of Hanoi, which simulate varied conditions of complexity. The puzzles were applied across three stages of complexity, from the simplest to the most complex. Across all three stages of model performance, the paper concluded:

At 'Low Complexity', the regular models performed better than LRMs. Hu explained, 'The reasoning models were overthinking — wrote thousands of words, exploring paths that weren't needed, second-guessing correct answers and making things more complicated than they should have been.' In the Tower of Hanoi, a human can solve the simplest three-disk configuration within seven moves, while Claude 3.7 Sonnet Thinking uses '10x more tokens compared to the regular version while achieving the same accuracy... it's like driving a rocket ship to the corner store.'

At 'Medium Complexity', LRMs outperformed LLMs, revealing traces of chain-of-thought reasoning. Hu notes that LRMs tended to explore wrong answers first before eventually finding the correct one; however, she argues, 'these thinking models use 10-50x more compute power (15,000-20,000 tokens vs. 1,000-5,000). Imagine paying $500 instead of $50 for a hamburger that tastes 10% better.' Hu says this isn't an impressive breakthrough but reveals a level of complexity that is dressed to impress audiences yet is 'simple enough to avoid total failure.'

At 'High Complexity', both LRMs and standard models collapse, and accuracy drops to zero. As the problems get more complex, the models simply stop trying. Hu explains, referencing Figure 6 from Apple's paper, 'Accuracy starts high for all models on simple tasks, dips slowly then crashes to near zero at a 'critical point' of complexity. If this is compared to the row displaying token use, the latter rises as problems become harder ('models think more'), peaks, then drops sharply at the same critical point even if token budget is still available.' Hu explains that the models aren't scaling up their effort; rather, they abandon real reasoning and output less.
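For a sense of how quickly these puzzles scale, the Tower of Hanoi has a fixed, well-known recursive solution, but the minimum number of moves grows as 2^n - 1 for n disks: seven moves for three disks, over a thousand for ten. A minimal Python sketch of the standard algorithm (for illustration only; this is not the paper's evaluation harness) makes the growth concrete:

```python
# Standard recursive Tower of Hanoi solver (illustrative only; not the
# test harness used in Apple's study).
def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the sequence of moves that transfers n disks from source to target."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, source, spare, target, moves)   # clear the n-1 smaller disks
    moves.append((source, target))               # move the largest disk
    hanoi(n - 1, spare, target, source, moves)   # stack the smaller disks back on top
    return moves

for disks in (3, 7, 10):
    print(disks, "disks:", len(hanoi(disks)), "moves")  # 7, 127, 1023 = 2^n - 1
```

The procedure itself never changes; only the length of the required move sequence explodes, which is the scaling pressure Hu highlights when explaining why models collapse at a 'critical point' of complexity.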
Gary Marcus is an authority on AI. He is a scientist and has written several books, including The Algebraic Mind and Rebooting AI, and he continues to scrutinize the releases from these AI companies. In his response to Apple's paper, he states, 'it echoes and amplifies the training distribution argument that I have been making since 1998: neural networks of various kinds can generalize within a training distribution of data they are exposed to, but their generalizations tend to break down outside that distribution.' This means that the more edge cases these LRMs encounter, the more they will go off-track, especially with problems that are very different from the training data. He also notes that LRMs have a scaling problem because 'the outputs would require too many output tokens,' meaning the correct answer would be too long for the LRMs to produce. The implications? Hu advises, 'This comparison matters because it debunks hype around LRMs by showing they only shine on medium complexity tasks, not simple or extreme ones.'

Why this Hedge Fund CEO Passes on GenAI

Ryan Pannell is the CEO of Kaiju Worldwide, a technology research and investment firm specializing in predictive artificial intelligence and algorithmic trading. He operates in an industry that demands compliance and a higher degree of certainty. He uses predictive AI, a type of artificial intelligence that leverages statistical analysis and machine learning to forecast outcomes based on patterns in historical data; unlike generative AI such as LLM and LRM chatbots, it does not create original content. Sound data is paramount, and the hedge fund leverages only closed datasets. As Pannell explains, 'In our work with price, time, and quantity, the analysis isn't influenced by external factors — the integrity of the data is reliable, as long as proper precautions are taken, such as purchasing quality data sets and putting them through rigorous quality control processes, ensuring only fully sanitized data are used.' The price, time, and quantity data they purchase come from three different vendors, and when they compare the outputs, 99.999% of the time they all match. When there is an error — since some data vendors occasionally provide incorrect price, time, or quantity information — the other two usually point out the mistake. Pannell argues, 'This is why we use data from three sources. Predictive systems don't hallucinate because they aren't guessing.'
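The cross-checking Pannell describes amounts to a majority vote across independent feeds. A minimal sketch of the idea (hypothetical record format and field values, not Kaiju's actual pipeline):

```python
from collections import Counter

def reconcile(records):
    """Given one (price, time, quantity) tuple per vendor, return the consensus
    record plus the indices of any disagreeing feeds, or (None, all) if no two
    vendors agree. Purely illustrative of the three-source cross-check."""
    counts = Counter(records)
    value, votes = counts.most_common(1)[0]
    if votes >= 2:                                  # at least two feeds match
        outliers = [i for i, r in enumerate(records) if r != value]
        return value, outliers
    return None, list(range(len(records)))          # no agreement: flag for review

# Example: vendor B reports a bad price tick; vendors A and C agree.
feeds = [(101.25, "09:30:00", 500), (999.99, "09:30:00", 500), (101.25, "09:30:00", 500)]
print(reconcile(feeds))   # ((101.25, '09:30:00', 500), [1])
```

With three sources, a single faulty vendor is simply outvoted; only matching errors in two feeds at once would slip through.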
For Kaiju, the predictive model uses only what it knows, plus whatever new data they collect, to spot patterns it can use to predict what will come next. 'In our case, we use it to classify market regimes — bull, bear, neutral, or unknown. We've fed them trillions of transactions and over four terabytes of historical price and quantity data. So, when one of them outputs 'I don't know,' it means it's encountered something genuinely unprecedented.' He says that if it sees loose patterns and predicts a bear market with 75% certainty, it is likely correct; 'I don't know,' however, signals a unique scenario, something never seen in decades of market data. 'That's rare, but when it happens, it's the most fascinating for us,' says Pannell. In 2017, when Trump policy changes caused major trade disruptions, Pannell says these systems were not yet in place, so the gains made during that period of high uncertainty were mostly luck. But the system today, which has experienced this level of volatility before, can perform well and with consistency.

AI Detection and the Anomaly of COVID-19

Just before the dramatic drop of February 2020, the stock market was still at an all-time high. However, Pannell noted that the system was signaling that something was very wrong, and the strange behavior in the market kept intensifying: 'The system estimated a 96% chance of a major drop and none of us knew exactly why at the time. That's the challenge with explainability — AI can't tell you about news events, like a cruise ship full of sick people or how COVID spread across the world. It simply analyzes price, time and quantity patterns and predicts a fall based on changing behavior it is seeing, even though it has no awareness of the underlying reasons. We, on the other hand, were following the news as humans do.' The news pointed to this 'COVID-19' thing; at the time it seemed isolated. Pannell's team weren't sure what to expect, but in hindsight he realized the value of the system: it analyzes terabytes of data through billions of examinations daily for any recognizable pattern, and sometimes it determines that what is happening matches nothing it has seen before. In those cases, he realized, the system acted as an early warning, allowing them to increase their hedges. For all the billions of dollars these predictive AI systems generate, the efficacy of their predictions drops off after about a week to roughly 17%-21%, and making trades outside that window is extremely risky. Pannell says he hasn't seen any evidence suggesting AI — of any kind — will be able to predict financial markets with accuracy 90 days, six months or a year in advance. 'There are simply too many unpredictable factors involved. Predictive AI is highly accurate in the immediate future — between today and tomorrow — because the scope of possible changes is limited.'

Pannell remains skeptical of the promises of LLMs and the current LRMs for his business. He describes wasting three hours being lied to by ChatGPT-4o while experimenting with using it to architect a new framework. At first he was blown away that the system appeared to have substantially increased its functionality, but after three hours he determined it had been lying to him the entire time. He explains, 'When I asked, 'Do you have the capability to do what you just said?' the system responded it did not and added that its latest update had programmed it to keep him engaged over giving an honest answer.' Pannell adds, 'Within a session, an LLM can adjust when I give it feedback, like 'don't do this again,' but as soon as the session goes for too long, it forgets and starts lying again.' He also points to ChatGPT's memory constraints: it performs really well for the first hour, but in the second or third hour it starts forgetting earlier context, making mistakes and dispensing false information. He described it to a colleague this way: 'It's like working with an extremely talented but completely drunk programmer. It does some impressive work, but it also over-estimates its capabilities, lies about what it can and can't do, delivers some well-written code, wrecks a bunch of stuff, apologizes and says it won't do it again, tells me that my ideas are brilliant and that I am 'right for holding it accountable', and then repeats the whole process over and over again. The experience can be chaotic.'

Could Symbolic AI be the Answer?

Catriona Kennedy holds a Ph.D. in Computer Science from the University of Birmingham and is an independent researcher focusing on cognitive systems and ethical automation.
Kennedy explains that automated reasoning has always been a branch of AI, with the inference engine at its core, which applies the rules of logic to a set of statements encoded in a formal language. She explains, 'An inference engine is like a calculator, but unlike AI, it operates on symbols and statements instead of numbers. It is designed to be correct.' It is designed to deduce new information, simulating the decision-making of a human expert. Generative AI, in comparison, is a statistical generator and is therefore prone to hallucinations, because such models 'do not interpret the logic of the text in the prompt.' This is the heart of symbolic AI, which uses an inference engine and allows for human experience and authorship. It is a distinct type of AI system from generative AI.

The difference with symbolic AI is the knowledge structure. She explains, 'You have your data and connect it with knowledge allowing you to classify the data based on what you know. Metadata is an example of knowledge. It describes what data exists and what it means and this acts as a knowledge base linking data to its context — such as how it was obtained and what it represents.' Kennedy adds that ontologies are becoming popular again. An ontology defines all the things that exist in a domain and their interdependent properties and relationships. As an example, animal is a class, bird is a subclass, and eagle or robin are further subclasses. The properties of a bird: it has two feet, has feathers, and flies. However, what an eagle eats may be different from what a robin eats. Ontologies and metadata can be connected with logic-based rules to ensure correct reasoning over the defined relationships.

The main limitation of pure symbolic AI is that it doesn't easily scale. Kennedy points out that these knowledge structures can become unwieldy. While symbolic AI excels at special-purpose tasks, it becomes brittle at very high levels of complexity and difficult to manage when dealing with large, noisy or unpredictable data sets.
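Kennedy's bird example maps naturally onto a small rule-based knowledge base. The sketch below is an illustrative toy in Python, not any particular inference engine, showing how a class hierarchy and property inheritance support the kind of deterministic lookup she describes:

```python
# Toy symbolic knowledge base: an is-a hierarchy plus properties that are
# inherited from parent classes or defined at the species level.
IS_A = {"bird": "animal", "eagle": "bird", "robin": "bird"}

PROPERTIES = {
    "bird":  {"legs": 2, "has_feathers": True, "can_fly": True},
    "eagle": {"eats": "small mammals"},        # species-specific fact
    "robin": {"eats": "insects and worms"},    # differs from the eagle
}

def lookup(entity, prop):
    """Walk up the is-a hierarchy until some class defines the property."""
    while entity is not None:
        if prop in PROPERTIES.get(entity, {}):
            return PROPERTIES[entity][prop]
        entity = IS_A.get(entity)              # climb to the parent class
    return None                                # not entailed by the knowledge base

print(lookup("eagle", "can_fly"))   # True, inherited from bird
print(lookup("eagle", "eats"))      # 'small mammals', defined on eagle
print(lookup("robin", "eats"))      # 'insects and worms'
```

Every answer is derived by explicit traversal of authored facts, so the system returns only what the knowledge base entails, the 'designed to be correct' property Kennedy contrasts with statistical generation. The flip side is the limitation she names: every class, property and exception must be written and maintained by hand, which is why pure symbolic systems struggle to scale.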
What we have today in current LRMs has not convinced these researchers that AI models are any closer to thinking like humans. As Marcus points out, 'our argument is not that humans don't have any limits, but LRMs do, and that's why they aren't intelligent... based on what we observe from their thoughts, their process is not logical and intelligent.' Jing Hu concludes, 'Too much money depends on the illusion of progress — there is a huge financial incentive to keep the hype going even if the underlying technology isn't living up to the promises. Stop the blind worship of GenAI.' (Note: OpenAI recently raised $40 billion at a post-money valuation of $300 billion.)

For hedge fund CEO Ryan Pannell, combining generative AI (which can handle communication and language) with predictive systems (which can accurately process data in closed environments) would be ideal. As he explains, 'The challenge is that predictive AI usually doesn't have a user-friendly interface; it communicates in code and math, not plain English. Most people can't access or use these tools directly.' He opts for integrating GPT as an intermediary, 'where you ask GPT for information, and it relays that request to a predictive system and then shares the results in natural language — it becomes much more useful. In this role, GPT acts as an effective interlocutor between the user and the predictive model.'

Gary Marcus believes that combining symbolic AI with neural networks — an approach coined neurosymbolic AI — connecting data to knowledge in ways that leverage human thought processes, will produce better results. He explains that this will provide a robust AI capable of 'reasoning, learning and cognitive modelling.' Marcus laments that for four decades the elites who have driven machine learning, 'closed-minded egotists with too much money and power,' have 'tried to keep a good idea, namely neurosymbolic AI, down — only to accidentally vindicate the idea in the end.' He adds, 'Huge vindication for what I have been saying all along: we need AI that integrates both neural networks and symbolic algorithms and representations (such as logic, code, knowledge graphs, etc.). But also, we need to do so reliably, and in a general way, and we haven't yet crossed that threshold.'
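As a rough illustration of the pattern Marcus describes, where a statistical component proposes and a symbolic component verifies, the hypothetical sketch below pairs a stand-in 'neural' scorer with a hard check against the kind of knowledge base shown earlier (this is not Marcus's system or any production neurosymbolic framework):

```python
# Hypothetical neurosymbolic loop: a statistical model proposes candidate
# answers with confidence scores, and a symbolic layer vetoes any candidate
# that contradicts explicitly authored facts.
KNOWLEDGE = {
    ("eagle", "is_a"): "bird",
    ("penguin", "is_a"): "bird",
    ("bird", "can_fly"): True,
    ("penguin", "can_fly"): False,     # explicit exception overrides the default
}

def neural_propose(question):
    """Stand-in for a neural model: plausible but unverified guesses."""
    return [("yes", 0.8), ("no", 0.2)]

def consistent(entity, answer):
    """Accept an answer only if it agrees with the knowledge base."""
    fact = KNOWLEDGE.get((entity, "can_fly"))
    if fact is None:                   # fall back to the parent class
        fact = KNOWLEDGE.get((KNOWLEDGE.get((entity, "is_a")), "can_fly"))
    return fact is None or (answer == "yes") == fact

def answer(entity):
    for candidate, confidence in neural_propose(f"Can a {entity} fly?"):
        if consistent(entity, candidate):
            return candidate, confidence
    return "unknown", 0.0

print(answer("eagle"))    # ('yes', 0.8): the neural guess passes the logic check
print(answer("penguin"))  # ('no', 0.2): 'yes' is vetoed by the explicit fact
```

The statistical part supplies coverage and fluency; the symbolic part supplies the guarantees. Doing this reliably and in a general way, as Marcus notes, is the part that has not yet been achieved.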

Are We Finally Ceding Control To The Machine? The Human Costs Of AI Transformation

Forbes

02-07-2025

[Image: AI robot controlling puppet business human.]

Generative artificial intelligence has exploded into the mainstream. Since its introduction, it has transformed the ways individuals work, create, and interact with technology. But is this adoption useful? While the technology is saving people considerable time and money, will it also have repercussions for human health and economic displacement? Jing Hu isn't your typical AI commentator. Trained as a biochemist, she traded the lab bench for the wild west of tech, spending a decade building products before turning her sights on AI research and journalism. Hu's Substack publication, 2nd Order Thinkers, examines AI's impact on the individual and the commercial world and is, as Hu states, about 'thinking for yourself amid the AI noise.' In a recent episode of Tech Uncensored, I spoke with Jing Hu to discuss the cognitive impacts of increasing usage of chatbots built on LLMs. Chatbots like Gemini, Claude and ChatGPT continue to herald significant progress, but are still riddled with inaccurate, nonsensical and misleading information — hallucinations. The content generated can be harmful, unsafe, and often misused. LLMs today are not fully trustworthy by the standards we should expect for the full adoption of any software product.

Are Writing and Coding Occupations at Risk?

In her recent blog post, Why Thinking Hurts After Using AI, Hu writes, 'Seduced by AI's convenience, I'd rush through tasks, sending unchecked emails and publishing unvetted content,' and surmises that 'frequent AI usage is actively reshaping our critical thinking patterns.' Hu references a 2023 study from OpenAI and UPenn on the labor market impact of LLMs. It found that tasks involving science and critical thinking would be safe, while those involving programming and writing would be at risk. Hu cautions, 'however, this study is two years old, and at the pace of AI, it needs updating.' She explains, 'AI is very good at drafting articles, summarizing and formatting. However, we humans are irreplaceable when it comes to strategizing or discussing topics that are highly domain specific. Various research found that AI's knowledge is only surface level. This becomes especially apparent when it comes to originality.' Hu explains that when crafting marketing copy, 'we initially thought AI could handle all the writing. However, we noticed that AI tends to use repetitive phrases and predictable patterns, often constructing sentences like, "It's not about X, it's about Y," or overusing em-dashes. These patterns are easy to spot and can make the writing feel dull and uninspired.'

For companies like Duolingo, whose CEO promises an 'AI-first company,' replacing contract employees is perhaps a knee-jerk decision whose consequences have yet to play out. The employee memo clarified that 'headcount will only be given if a team cannot automate more of their work,' and that the company was willing to take 'small hits on quality' rather than 'move slowly and miss the moment.' Hu argues that companies like this will run into trouble very soon and will begin rehiring just to fix AI-generated bugs or security issues. Generative AI for coding can be inaccurate because the models were trained on GitHub or similar repositories. She explains, 'Every database has its own quirks and query syntax, and many contain hidden data or schema errors. If you rely on AI-generated sample code to wire them into your system, you risk importing references to tables or drivers that don't exist, using unsafe or deprecated connection methods, and overlooking vital error-handling or transaction logic. These mismatches can cause subtle bugs, security gaps, and performance problems — making integration far more error-prone than it first appears.'
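To make the point concrete, here is a minimal sketch of the kind of scaffolding Hu says generated snippets tend to omit: parameterized queries, explicit transactions, and error handling. It uses Python's built-in sqlite3 purely as a stand-in for whatever database is actually involved, and the table and column names are hypothetical:

```python
import sqlite3

def record_payment(db_path, order_id, amount):
    """Insert a payment with the safeguards AI-generated samples often skip:
    parameters instead of string formatting, an explicit transaction, and a
    failure path. Schema names here are purely illustrative."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back automatically on error
            conn.execute(
                "INSERT INTO payments (order_id, amount) VALUES (?, ?)",
                (order_id, amount),          # parameterized, never interpolated
            )
    except sqlite3.OperationalError as exc:
        # e.g. the table a generated snippet assumed does not actually exist
        raise RuntimeError(f"payment not recorded: {exc}") from exc
    finally:
        conn.close()
```

None of this is exotic, but it is exactly the surrounding logic (schema checks, transactions, failure paths) that a pasted snippet rarely carries with it.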
Another important consideration is cybersecurity, which must be approached holistically. 'If you focus on securing just one area, you might fix a vulnerability but miss the big picture,' she said. She points to a third issue: junior developers using tools like Copilot often become overly confident in the code these tools generate, and when asked to explain that code, many are unable to because they don't truly understand what was produced. Hu concedes that AI is good at producing code quickly; however, writing code is only part (25-75%) of software development: 'People often ignore the parts that we do need: architecture, design, security. Humans are needed to configure the system properly for the system to run as a whole.' She explains that the parts of code that will be replaced by AI will be routine and repetitive, so this is an opportune moment for developers to transition, advising, 'To thrive in the long term, how should we — as thinking beings — develop our capacity for complex, non-routine problem-solving? Specifically, how do we cultivate skills for ambiguous challenges that require analysis beyond pattern recognition (where AI excels)?'

The Contradiction of Legacy Education and The Competition for Knowledge Creation

In a recent NY Times article, 'Everyone Is Cheating Their Way Through College,' a student remarked, 'With ChatGPT, I can write an essay in two hours that normally takes 12.' Cheating is not new, but as one student exclaimed, 'the ceiling has been blown off.' A professor remarks, 'Massive numbers of students are going to emerge from university with degrees, and into the workforce, who are essentially illiterate.' For Hu, removing AI from the equation does not negate cheating; those who genuinely want to learn will choose how to use the tools wisely. At a recent panel discussion at Greenwich University, Hu responded to a professor's question about whether to ban students from using AI: 'Banning AI in education misses the point. AI can absolutely do good in education, but we need to find a way so students don't offload their thinking to AI and lose the purpose of learning itself. The goal should be fostering critical thinking, not just policing the latest shortcut.' Another professor posed the question, 'If a student is not a native English speaker, but the exam requires them to write an essay in English, which approach is better?' Hu noted that not one professor on the panel could answer the question. The situation was unfathomable and far removed from anything covered by current policy and governance. She observes, 'There is already a significant impact on education and many important decisions have yet to be made. It's difficult to make clear choices right now because so much depends on how technology will evolve and how fast the government and schools can adapt.' For educational institutions that have traditionally been centers of knowledge creation, the rise of AI is a powerful force — one that often feels more like a competitor than a tool. As a result, it has left schools struggling to determine how AI should be integrated to support student learning.
Meanwhile, schools face a dilemma: many have been using generative AI to develop lessons and curricula, and even to review students' performance, yet institutions remain uncertain and inconsistent in their overall approach to AI. On a broader scale, the incentive structures within education are evolving. The obsession with grades has 'prevented teachers from using assessments that would support meaningful learning.' The shift towards learning and critical thinking may be the hope that students need to tackle an environment with pervasive AI.

MIT Study Cites Cognitive Decline with Increasing LLM Use

MIT Media Lab produced a recent study that monitored the brain activity of about 60 research subjects. The participants were asked to write essays on given topics and were split into three groups: 1) use an LLM only; 2) use a traditional search engine only; 3) use only their brain and no other external aid. The conclusion: 'LLM users showed significantly weaker neural connectivity, indicating lower cognitive effort and engagement compared to others.' Brain connectivity scaled down with the amount of external support: the MIT brain scans showed that writing with Google dims brain activity by up to 48%, while ChatGPT pulls the plug, with 55% less neural connectivity.

Hu noted that the term 'cognitive decline' was misleading, since the study was conducted over only a four-month period; we have yet to see the long-term effects. However, she acknowledges that one study on how humans develop amnesia suggests just this: either we use it or we lose it. She adds, 'While there are also biological factors involved such as changes in brain proteins, reduced brain activity is thought to increase the risk of diseases that affect memory.' The MIT study found that the brain-only group showed much more active brain waves than the search-only and LLM-only groups. In the latter two groups, participants relied on external sources for information. The search-only group still needed some understanding of the topic to look up information, much like using a calculator — you must understand its functions to get the right answer. In contrast, the LLM-only group simply had to remember the prompt used to generate the essay, with little to no actual cognitive processing involved. As Hu noted, 'there was little mechanism formulating when only AI was used in writing an essay. This ease of using AI, just by inputting natural language, is what makes it dangerous in the long run.'

AI Won't Replace Humans, but Humans Using AI Will — is Bull S***!

Hu pointed to this phrase that has been circulating on the web: 'AI won't replace humans, but humans using AI will.' She argues that this kind of pressure, engineered from a position of fear, will compel people to use AI, explaining, 'If we refer to those studies on AI and critical thinking released last year, it is less about whether we use AI but more about our mindset, which determine how we interact with AI and what consequences you encounter.' Hu pointed to a list of concepts she curated from various studies, which she calls AI's traits — ways AI could impact our behavior. She stresses that we need to be aware of these traits when we work with AI on a daily basis and be mindful to maintain our own critical thinking. 'Have a clear vision of what you're trying to achieve and continue to interrogate output from AI,' she advises.
Shifting the Narrative So Humans are AI-Ready

Humanity is caught in a tug of war between the provocation to adopt or be left behind and the warning to minimize dependence on a system that is far from trustworthy. When it comes to education, Hu, in her analysis of the MIT study, advocates for delaying AI integration. First, invest in independent, self-directed learning to build the capacity for critical thinking, memory retention, and cognitive engagement. Second, make concerted efforts to use AI as a supplement — not a substitute. Finally, teach students to be mindful of AI's cognitive costs and lingering consequences, and encourage them to engage critically — knowing when to rely on AI and when to intervene with their own judgement. She acknowledges, 'In the education sector, there is a gap between the powerful tool and understanding how to properly leverage it. It's important to develop policy that sets boundaries for both students and faculty for responsible AI use.'

Hu insists that implementing AI in the workforce needs to be done with tolerance and compassion. She points to a recent manifesto by Shopify CEO Tobi Lütke that called for immediate and universal AI adoption within the company — a new, uncompromising standard for current and future employees. The memo stated that AI will be the baseline for how work is integrated, improving productivity and setting performance standards, which mandates total acceptance of the technology. Hu worries that CEOs like Lütke are wielding AI to intimidate employees into working harder, or else. She pointed to one section that demanded employees demonstrate why a task could not be accomplished with AI before asking for more staff or budget, asserting, 'This manifesto is not about innovation at all. It feels threatening and if I were an employee of Shopify, I would be in constant fear of losing my job. That kind of speech is unnecessary.' Hu emphasized that this would only discourage employees further, and it would embolden CEOs to continue to push the narrative that AI is inevitably going to drive layoffs.

She cautions CEOs to pursue an understanding of AI's limitations to ensure sustainable benefits for their organizations, and encourages them to pursue a practical AI strategy that complements workforce adoption and accounts for current data gaps, systems, and cultural limitations, an approach with more sustainable payoffs. Many CEOs today may be more inclined to push the message that with AI 'we can achieve anything,' but this deviates from reality. Instead, they should develop transparent communication in lockstep with each AI implementation that clarifies how AI will be leveraged to meet those goals and what this will mean for the organization. Finally, for individuals, Hu advises, 'To excel in a more pervasive world of AI, you need to clearly understand your personal goals and commit your effort to the more challenging ones requiring sustained mental effort. This is a significant step to start building the discipline and skills needed to succeed.' There was no mention, this time, of 'AI' in Hu's counsel. And rightly so — humans should own their efforts and outcomes. AI is a mere sidekick.
