
Crossed Wires: Artificial Intelligence reality check – not as smart as it thinks it is
Apple researchers' recent paper, The Illusion of Thinking, challenges the hype around AI, revealing its limitations in solving complex problems.
If one is to believe Sam Altman and other AI boosters and accelerationists, the era of abundance is almost upon us. AI is about to relieve us all of drudgery, ill health, poverty and many other miseries before leading us to some promised land where we will shed our burdens and turn our attention to loftier concerns. Any day now.
And so the publication of a paper by Apple researchers this month arrived as a refreshing dose of realism. It was titled The Illusion of Thinking and it broke the AI internet. It concluded that ChatGPT-style GenAI models (like Claude, Gemini, DeepSeek and others) can only solve a constrained set of problems and tend to collapse spectacularly when complexity is introduced. The implications of the paper are clear – the underlying technologies that have so supercharged the AI narrative and fuelled so much hyperbole have a long way to go before anyone attains the holy grail of Artificial General Intelligence (AGI) and the imagined utopia of techno-optimists.
For anyone with time and grit, here is the paper. One of the examples cited concerns the well-known 'Tower of Hanoi' puzzle, which involves moving a stack of variously sized disks from one rod to another, one disk at a time, without ever placing a larger disk on a smaller one. Any reasonably smart nine-year-old can find a solution, and a very short computer program can describe the general solution, but, left to its own devices, GenAI cannot come up with one. As more and more disks are added, the AI becomes a blithering idiot. It has no idea what it is doing. It is not able to 'generalise' from a few disks to many.
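To see just how short that program is, here is the textbook recursive solution, sketched in Python purely as an illustration (it is not code from the Apple paper): for n disks it spells out every required move, which is exactly the sort of general procedure the article says the models cannot carry through as the number of disks grows.

```python
# Textbook recursive Tower of Hanoi: a handful of lines that solve the puzzle
# for any number of disks by reducing the n-disk case to the (n-1)-disk case.
def hanoi(n, source, target, spare):
    """Print the moves that shift n disks from `source` to `target`."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target)            # park the n-1 smaller disks
    print(f"move disk {n}: {source} -> {target}")  # move the largest disk
    hanoi(n - 1, spare, target, source)            # restack the smaller disks on top

hanoi(3, "A", "C", "B")  # three disks: prints the minimum seven moves
```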
This leads to the inescapable conclusion that, if a child or a very short algorithm can best the most advanced 'reasoning' models behind ChatGPT and Claude, then we are far from AGI. No matter what Sam Altman says.
It is not as if a whole slew of clever researchers are blind to this fact. There are some researchers busy trying to embed ethics and alignment into AI so that humans can survive its evolution without too much pain or possible extinction. There are some researchers who are taking what we have now and applying it to current real-world problems in science, education, healthcare or the sludge of institutional processes. And there are some who are saying: This version of AI, this 'deep learning' machine that has captured everyone's attention – it is simply not good enough. They are looking to invent something that breaks free of the constraints which Apple's paper so brutally highlights.
There are some clever band-aids available to patch over the obvious weaknesses of current AI models, such as the widely used technique of reinforcement learning (RL), which refines a model's behaviour after its initial training by rewarding good outputs. But these partial fixes do not address the basic weakness of the core architecture – they treat the symptoms, not the cause.
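For readers curious what 'learning after training' means in practice, here is a deliberately tiny reinforcement-learning loop, a hedged sketch of the general idea rather than any lab's actual post-training pipeline: a reward signal gradually shifts probability towards the choices that scored well.

```python
import numpy as np

# Toy policy-gradient (REINFORCE) loop on a two-choice problem: actions that
# earn higher rewards have their probabilities nudged upwards over time. Real
# post-training applies the same broad idea to a full language model.
rng = np.random.default_rng(0)
logits = np.zeros(2)                 # the "policy": preferences over two actions
true_reward = np.array([0.2, 0.8])   # hidden average payoff of each action
learning_rate = 0.1

for _ in range(2000):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                           # softmax over preferences
    action = rng.choice(2, p=probs)
    reward = rng.normal(true_reward[action], 0.1)  # noisy feedback
    grad = -probs
    grad[action] += 1.0                            # gradient of log-probability
    logits += learning_rate * reward * grad        # reinforce rewarded choices

print("learned preference for the better action:", round(probs[1], 3))
```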
It doesn't take an expert to know that humans learn in many different ways, all the way back to our warm launchpad in the womb. We have genetic programs gifted by our ancestors, we learn from our senses, we learn by example, we learn by trial and error, we learn by being taught by others, we learn by accident, we learn by intent, and then we also learn to reason, to generalise, to deduce, to infer. It is probably fair to say that we humans are learning machines – running all day, every day, from the moment of conception. Our learning may well be faulty, our memories inaccurate, our lessons sometimes haphazard, our failures manifold – but learn we do, always and forever.
It is in this area that the current crop of AI techniques is exposed as having only a thin veneer of competence. Take ChatGPT, at least in its text version. It has learnt how to predict the next word from a historical store of human-created documents reduced to gigantic matrices of statistically related words. There is not much more to it than that, even though its usefulness has astounded everyone.
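To make 'predict the next word' concrete, here is a toy version of the idea, offered purely for illustration: real systems replace this little count table with a neural network trained on vast amounts of text, but the objective is the same, namely to emit the most plausible next word given the words so far.

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows which in a tiny corpus,
# then always emit the most frequent follower of the current word.
corpus = "the cat sat on the mat and the cat slept on the sofa".split()

follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1          # tally observed word pairs

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))   # -> 'cat', its most frequent follower here
```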
But really, compare this with what our species does as we go about our daily business – learning, learning, learning, both to our benefit and sometimes to our detriment – all the time, unable to stop for even a microsecond. AI models are simply embarrassing next to that. Babies are smarter, primates are smarter. Dogs are smarter. The challenge of 'continuous autonomous learning' has yet to be met in current AI models.
Before I go overboard about the absurdity of the AGI-is-nearly-here claim, I should throw some light on what has been achieved, especially via the GenAI technologies. These are sometimes confusingly called Large Language Models (they now go way beyond mere language). What they can do is truly unprecedented. I use them all day, every day. They are novel and brilliant assistants. They are much, much smarter or faster than I am at doing a whole slew of important things. But I am much, much smarter than they are when it comes to a huge number of other things.
AGI, as now commonly defined, means the point at which AI equals (or betters) humans at all cognitive (as opposed to physical) tasks. I spend a large part of my day reading about the advances at the edge of this fabulous field, which is probably the most important technological development in human history. There is astonishing stuff coming down the line. A cure for cancer, perhaps. Infinite cheap energy. Healthy and long lives.
But will it be better than humans at all cognitive tasks? Not today. Not this year. Not next year. Not until AI is spawned as we are and learns as we do.
Like the witches' riddle in Shakespeare's Macbeth, perhaps only when AI is of woman born. DM
