Latest news with #TuringTest


Tom's Guide
17-05-2025
- Tom's Guide
Has AI exceeded human levels of intelligence? The answer is more complicated than you might think
It's no wonder that many of us find the idea of artificial general intelligence (AGI) mildly terrifying. Hollywood script writers have long enjoyed stretching the idea of self-aware computers to their most unsettling extremes. If you've watched the likes of '2001: A Space Odyssey', the 'Terminator' franchise or 'Ex Machina', then you've already had a flavor of where AGI could take us — and it rarely ends well. While you certainly shouldn't believe everything you see at the movies, the concept of AGI is a hot topic of discussion for computer scientists, theorists and philosophers. Is AGI's reputation as the harbinger of inevitable apocalypse a fair one? And how long have we got until AGI becomes a genuine concern? IBM gives one of the more succinct and straightforward definitions of AGI: 'Artificial general intelligence is a hypothetical stage in the development of machine learning in which an artificial intelligence system can match or exceed the cognitive abilities of human beings across any task'. If that sounds a bit like the Turing Test, it's not dissimilar. But while Alan Turing's famous game challenges participants to differentiate humans from computers based on their text responses, true AGI goes beyond merely mimicking human intelligence. And although generative AI models like ChatGPT and Google Gemini are already smart enough to hold very convincing conversations, they do so by using their 'training' to predict what the next best word in the sentence should be. AGI, on the other hand, seeks deeper, self-directed comprehension. It effectively has its own independent consciousness that is able to autonomously learn, understand, communicate and form goals without the guiding hand of a human. To level up from the AI we have now, AGI needs to demonstrate a combination of physical and intellectual traits that we'd normally associate with organic lifeforms: intuitive visual and auditory perception, for example, that goes beyond the basic identification tools like Google Lens can already achieve; creativity that isn't merely an aggregated regurgitation of what has gone before; problem solving that improves upon learned diagnostics to incorporate a form of common sense. Only artificial intelligence that can demonstrate independent reasoning, learning and empathy can be regarded as true AGI. The word 'hypothetical' in IBM's definition of AGI above may sound disappointing to AI advocates and reassuring to those fearing the rise of our digital overlords. But AGI's fruition is seen by most commentators as a matter of when rather than if. Indeed, some researchers think that it has already arrived. A Google engineer (since fired) claimed in 2022 that the company's LaMDA chatbot understood its own personhood and was indistinguishable from 'a 7-year-old, 8-year-old kid that happens to know physics'. And a 2025 study in which OpenAI's GPT-4.5 is claimed to have passed the Turing Test is seen as further proof. But most experts see this view as having jumped the gun, on the basis that these models have only mastered the game of imitation rather than developed their own general intelligence. Ray Kurzweil predicts that AGI is just around the corner. The futurist and inventor, who has a track record for anticipating leaps forward in artificial intelligence, foretold its advent in the 2030s in his 2005 book 'The Singularity Is Near'.
He subsequently doubled down on this prediction in 2024's 'The Singularity Is Nearer', stating that artificial intelligence will 'reach human levels by around 2029' and will go on to 'multiply the human biological machine intelligence of our civilization a billion-fold'. Kurzweil is more optimistic than most. In 2022, the 'Expert Survey on Progress in AI' received responses from 738 machine learning researchers. When asked to forecast when there would be a 50% chance of high-level machine intelligence (which shares many of the same traits as AGI), the average prediction was 2059. Emergence in the second half of the 21st century is a timeline shared by many of the more moderate estimates. For others, however, the notion of computers reaching a human-like level of sentience is the domain only of the science fiction genre — or, at best, way beyond our lifetimes. So are today's chatbots already AGI? The short answer is no. Regardless of whether they already pass the Turing Test, how good ChatGPT is at helping you through a panic attack, or how smart Anthropic's Claude is getting, the current crop of AI chatbots still falls short of the recognized requirements for AGI. But these large language models (LLMs) shouldn't be written out of AGI's story entirely. Their popularity and exponential growth in users could be a useful foundation for AGI's development, according to innovators like OpenAI co-founder Ilya Sutskever. He suggests that LLMs are a path to AGI, likening their predictive nature to a genuine understanding of the world. Co-founder of Google's DeepMind, Demis Hassabis, is another prominent AI spokesperson who sees these chatbots as a component of AGI development. Unsurprisingly, there are plenty of dissenting voices, too. Another voice from Google, François Chollet, is an AI researcher and co-founder of the global ARC Prize for progress towards AGI. His view is that OpenAI has actually 'set back progress to AGI by five to 10 years', and he says that 'LLMs essentially sucked the oxygen out of the room — everyone is doing LLMs'. Meta's Chief AI Scientist, Yann LeCun, agrees that LLMs are a dead end when it comes to advancements in AGI.
Yahoo
11-05-2025
- Science
- Yahoo
Can ChatGPT pass the Turing Test yet?
Artificial intelligence chatbots like ChatGPT are getting a whole lot smarter, a whole lot more natural, and a whole lot more…human-like. It makes sense — humans are the ones creating the large language models that underpin AI chatbots' systems, after all. But as these tools get better at "reasoning" and mimicking human speech, are they smart enough yet to pass the Turing Test? For decades, the Turing Test has been held up as a key benchmark in machine intelligence. Now, researchers are actually putting LLMs like ChatGPT to the test. If ChatGPT can pass, the accomplishment would be a major milestone in AI development. So, can ChatGPT pass the Turing Test? According to some researchers, yes. However, the results aren't entirely definitive. The Turing Test isn't a simple pass/fail, which means the results aren't really black and white. Besides, even if ChatGPT could pass the Turing Test, that may not really tell us how 'human' an LLM really is. Let's break it down. The concept of the Turing Test is actually pretty simple. The test was originally proposed by British mathematician Alan Turing, the father of modern computer science and a hero to nerds around the world. In his 1950 paper 'Computing Machinery and Intelligence', he proposed the Imitation Game — a test for machine intelligence that has since been named for him. The Turing Test involves a human judge having a conversation with both a human and a machine without knowing which one is which (or who is who, if you believe in AGI). If the judge can't tell which one is the machine and which one is the human, the machine passes the Turing Test. In a research context, the test is performed many times with multiple judges. Of course, the test can't necessarily determine if a large language model is actually as smart as a human (or smarter) — just if it's able to pass for a human. Large language models, of course, do not have a brain, consciousness, or world model. They're not aware of their own existence. They also lack true opinions or beliefs. Instead, large language models are trained on massive datasets of information — books, internet articles, documents, transcripts. When a user inputs text, the AI model uses its "reasoning" to determine the most likely meaning and intent of the input. Then, the model generates a response. At the most basic level, LLMs are word prediction engines. Using their vast training data, they calculate probabilities over their vocabulary for the first 'token' (a word or piece of a word) of the response. They repeat this process, token by token, until a complete response is generated. That's an oversimplification, of course, but let's keep it simple: LLMs generate responses to input based on probability and statistics. The response of an LLM is based on mathematics, not an actual understanding of the world. So, no, LLMs don't actually think in any sense of the word. There have been quite a few studies to determine if ChatGPT has passed the Turing test, and many of them have had positive findings. That's why some computer scientists argue that, yes, large language models like GPT-4 and GPT-4.5 can now pass the famous Turing Test. Most tests focus on OpenAI's GPT-4 model, the one that's used by most ChatGPT users. Using that model, a study from UC San Diego found that in many cases, human judges were unable to distinguish GPT-4 from a human. In the study, GPT-4 was judged to be a human 54% of the time. However, this still lagged behind actual humans, who were judged to be human 67% of the time.
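To make the word-prediction loop described above concrete, here is a minimal sketch of autoregressive generation in Python. The probability table and the two-word context window are invented purely for illustration; a real LLM derives its probabilities from billions of learned parameters over a huge vocabulary, and its context covers the entire conversation.

```python
# Minimal sketch of the autoregressive "word prediction" loop described above.
# The probability table is invented for illustration; a real LLM computes these
# scores with a neural network, not a hard-coded dictionary.
import random

# Toy "model": maps the current context to a distribution over possible next tokens.
TOY_PROBS = {
    "The Turing": {"Test": 0.9, "machine": 0.1},
    "Turing Test": {"measures": 0.5, "is": 0.3, "asks": 0.2},
    "Test measures": {"imitation.": 0.6, "conversation.": 0.4},
    "Test is": {"famous.": 1.0},
    "Test asks": {"questions.": 1.0},
}

def generate(prompt: str, max_tokens: int = 5) -> str:
    tokens = prompt.split()
    for _ in range(max_tokens):
        context = " ".join(tokens[-2:])      # use the last two tokens as context
        dist = TOY_PROBS.get(context)
        if dist is None:                     # no known continuation: stop
            break
        words, weights = zip(*dist.items())
        next_token = random.choices(words, weights=weights)[0]  # sample next token
        tokens.append(next_token)
        if next_token.endswith("."):         # crude end-of-response check
            break
    return " ".join(tokens)

print(generate("The Turing"))  # e.g. "The Turing Test measures imitation."
```

The point the article makes survives the simplification: the loop only ever asks which token is likely to come next, and everything that looks like understanding emerges from that single repeated question.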
Then, GPT-4.5 was released, and the UC San Diego researchers performed the study again. This time, the large language model was identified as human 73% of the time, outperforming actual humans. The study also found that Meta's LLaMa-3.1-405B was able to pass the test. Other studies outside UC San Diego have given GPT passing grades, too. A 2024 University of Reading study of GPT-4 had the model create answers for take-home assessments for undergraduate courses. The test graders weren't told about the experiment, and they only flagged one of 33 entries. ChatGPT received above-average grades with the other 32 entries. So, are these studies definitive? Not quite. Some critics (and there are a lot of them) say these research studies aren't as impressive as they seem. That's why we aren't ready to definitively say that ChatGPT passes the Turing Test. We can say that while previous-gen LLMs like GPT-4 sometimes passed the Turing test, passing grades are becoming more common as LLMs get more advanced. And as cutting-edge models like GPT-4.5 come out, we're heading quickly toward models that can easily pass the Turing Test every time. OpenAI itself certainly envisions a world in which it's impossible to tell human from AI. That's why OpenAI CEO Sam Altman has invested in a human verification project with an eyeball-scanning machine called The Orb. We decided to ask ChatGPT if it could pass the Turing Test, and it told us yes, with the same caveats we've already discussed. When we posed the question, "Can ChatGPT pass the Turing Test?" to the AI chatbot (using the 4o model), it told us, "ChatGPT can pass the Turing Test in some scenarios, but not reliably or universally." The chatbot concluded, "It might pass the Turing Test with an average user under casual conditions, but a determined and thoughtful interrogator could almost always unmask it." Some computer scientists now believe the Turing test is outdated, and that it's not all that helpful in judging large language models. Gary Marcus, an American psychologist, cognitive scientist, author, and popular AI prognosticator, summed it up best in a recent blog post, where he wrote, "as I (and many others) have said for years, the Turing Test is a test of human gullibility, not a test of intelligence." It's also worth keeping in mind that the Turing Test is more about the perception of intelligence than about actual intelligence. That's an important distinction. A model like GPT-4o might be able to pass simply by mimicking human speech. Not only that, but whether or not a large language model passes the test will vary depending on the topic and the tester. ChatGPT could easily ape small talk, but it could struggle with conversations that require true emotional intelligence. What's more, modern AI systems are used for much more than chatting, especially as we head toward a world of agentic AI. None of that is to say that the Turing Test is irrelevant. It's a neat historical benchmark, and it's certainly interesting that large language models are able to pass it. But the Turing Test is hardly the gold-standard benchmark of machine intelligence. What would a better benchmark look like? That's a whole other can of worms that we'll have to save for another story. Disclosure: Ziff Davis, Mashable's parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.


Forbes
23-04-2025
- Science
- Forbes
Can AI Understand? The Chinese Room Argument Says No, But Is It Right?
Artificial intelligence is everywhere these days. AI and the tools that enable it, including machine learning and neural networks, have, of course, been the subject of intensive research and engineering progress going back decades, to the 1950s and early '60s. Many of the foundational concepts and mathematics are much older. But throughout its history, up to the present state-of-the-art large language models, the question remains: Are these systems genuinely intelligent, or are they merely sophisticated simulations? In other words, do they understand? At the heart of this debate lies a famous philosophical thought experiment — the Chinese room argument — proposed by the philosopher John Searle in 1980. The Chinese room argument challenges the claim that AI can genuinely understand language, let alone possess true consciousness. The thought experiment goes like this: A person who knows no Chinese sits inside a sealed room. Outside, a native Chinese speaker passes notes written in Chinese through a slot to the person inside. Inside the room, the person follows detailed instructions from a manual, written in English, that tells them exactly how to respond to these notes using a series of symbols. As they receive input characters in Chinese, the manual tells the person, in English, which Chinese characters to pass back out through the slot, and in what sequence. By mechanically and diligently following the instructions, the person inside the room returns appropriate replies to the Chinese speaker outside the room. From the perspective of the Chinese speaker outside, the room seems perfectly capable of understanding and communicating in Chinese. To them, the room is a black box; they have no knowledge of what is happening inside. Yet, and this is the core of Searle's argument, neither the person inside nor the room itself actually understands Chinese. They are simply manipulating symbols systematically, based on the rules in the instruction manual. The essence of the argument is that understanding requires something beyond the mere manipulation of symbols and syntax. It requires semantics — meaning and intentionality. AI systems, Searle argues, are fundamentally similar to the person inside the Chinese room, and therefore cannot have true understanding, no matter how sophisticated they may get. Searle's argument did not emerge in isolation. Questions about whether AI actually learns and understands are not new; they have been fiercely debated for decades, and are deeply rooted in philosophical discussions about the nature of learning and intelligence. The philosophical foundations of questioning machine intelligence date back much further than Searle's now-famous 1980 paper, most notably to Alan Turing's seminal 1950 paper, in which he proposed what became known as the 'Turing Test'. In Turing's scenario, a computer is considered intelligent if it can hold a conversation indistinguishable from that of a human. In other words, the computer passes if the human interacting with it cannot tell whether it is another human or a machine. While Turing focused on practical interactions and outcomes between the human and the computer, Searle asked a deeper philosophical question: Even if a computer passes the Turing Test, does it actually possess genuine understanding? Can it ever? Well before Searle and Turing, philosophers including René Descartes and Gottfried Leibniz had grappled with the nature of consciousness and mechanical reasoning.
Leibniz famously imagined a giant mill as a metaphor for the brain, arguing that entering it would reveal nothing but mechanical parts, never consciousness or understanding. Somehow, consciousness is an emergent property of the brain. Searle's Chinese room argument extends these ideas explicitly to computers, emphasizing the limits of purely mechanical systems. Since its introduction, the Chinese room argument has sparked significant debate and numerous counterarguments. Responses generally fall into a few key categories. One group of related responses, referred to as the 'systems reply', argues that although the individual in the room might not understand Chinese, the system as a whole — including the manual, the person, and the room — does. Understanding, in this view, emerges from the entire system rather than from any single component; focusing on the person inside the room is therefore misguided. Searle argued against this by suggesting that the person could theoretically memorize the entire manual, in essence becoming the whole system themselves without needing the room or the manual, and still not understand Chinese, adding that understanding requires more than following rules and instructions. Another group of counterarguments, the 'robot reply', suggests that it is necessary to embed the computer within a physical robot that interacts with the world, allowing sensory inputs and outputs. These counterarguments propose that real understanding requires interaction with the physical world, something Searle's isolated room lacks. But similarly, Searle countered that adding sensors to an embodied robot does not solve the fundamental problem — the system, in this case including the robot, would still be following instructions it did not understand. Counterarguments that fall into the 'brain simulator reply' category propose that if an AI could precisely simulate every neuron in a human brain, it would necessarily replicate the brain's cognitive processes and, by extension, its understanding. Searle replied that even perfect simulation of brain activity does not necessarily create actual understanding, exposing a deep fundamental question in the debate: What exactly is understanding, even in our own brains and minds? The common thread in Searle's objections is that the fundamental limitation his thought experiment proposes remains unaddressed by these counterarguments: the manipulation of symbols alone, i.e. syntax, no matter how complex or seemingly intelligent, does not imply comprehension or understanding, i.e. semantics. To me, there is an even more fundamental limitation to Searle's argument: for any formally mechanical syntactic system that manipulates symbols in the absence of any understanding, there is no way for the person in the room to decide which symbols to send back out. In real language use, there is a large space of parallel, equally plausible branching syntactic (i.e. symbol) choices, and these can only be resolved by semantic understanding. Admittedly, the person in the room does not know Chinese. But this is by the very narrow construction of the thought experiment in the first place. As a result, Searle's argument is not 'powerful' enough — it is logically insufficient — to conclude that some future AI will not be able to understand and think, because any truly understanding AI must necessarily exceed the constraints of the thought experiment. Any decision about syntax, i.e. which symbols are chosen and in what order they are placed, must depend on an understanding of the semantics, i.e. the context and meaning, of the incoming message in order to reply in a meaningful way. In other words, the number of equally possible parallel syntactic choices is very large, and they can only be disambiguated by some form of semantic understanding. Syntactic decisions are not linear and sequential; they are parallel and branching. In any real conversation, the Chinese speaker on the outside would not be fooled. So do machines 'understand'? Maybe, maybe not, and maybe they never will. But if an AI can interact using language by responding in ways that clearly demonstrate branching syntactic decisions that are increasingly complex, meaningful, and relevant to the human (or other agent) it is interacting with, eventually it will cross a threshold that invalidates Searle's argument. The Chinese room argument matters given the increasing language capabilities of today's large language models and other advancing forms of AI. Do these systems genuinely understand language? Are they actually reasoning? Or are they just sophisticated versions of Searle's Chinese room? Current AI systems rely on performing huge statistical computations that predict the next word in a sentence based on learned probabilities, without any genuine internal experience or comprehension. Most experts agree that, as impressive as they are, they are fundamentally similar to Searle's Chinese room. But will this remain so? Or will a threshold be crossed from today's systems that perform sophisticated but 'mindless' statistical pattern matching to systems that truly understand and reason, in the sense that they have an internal representation, meaning, and experience of their knowledge and the manipulation of that knowledge? The ultimate question may be this: if they do cross such a threshold, would we even know it or recognize it? We as humans do not fully understand our own minds. Part of the challenge in understanding our own conscious experience is precisely that it is an internal, self-referential experience. So how will we go about testing or recognizing understanding in an intelligence that is physically different and operates using different algorithms than our own? Right or wrong, Searle's argument — and all the thinking it has inspired — has never been more relevant.
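To see the gap Searle points at in miniature, here is a deliberately crude sketch, not drawn from Searle's paper and far simpler than any real AI system: a lookup table that returns plausible Chinese replies while the code applying it 'understands' nothing. The handful of phrases and the fallback reply are invented for illustration.

```python
# Toy version of Searle's room: a rulebook that maps incoming symbol strings
# to outgoing symbol strings. The "operator" applies rules it cannot read.
# The rulebook below is invented for illustration; a real manual would need
# an entry (or a generative rule) for every possible conversational turn.

RULEBOOK = {
    "你好": "你好！",                   # "hello" -> "hello!"
    "你会中文吗": "会一点。",            # "do you speak Chinese?" -> "a little."
    "今天天气怎么样": "今天天气很好。",   # "how's the weather today?" -> "it's nice today."
}

def operator(incoming: str) -> str:
    """Follow the manual mechanically: look up the symbols, copy out the reply.
    The operator never interprets what either string means."""
    return RULEBOOK.get(incoming, "对不起，我不明白。")  # fallback: "sorry, I don't understand."

if __name__ == "__main__":
    for note in ["你好", "今天天气怎么样", "你叫什么名字"]:  # last note ("what's your name?") has no rule
        print(note, "->", operator(note))
```

It also makes the article's objection concrete: as soon as the conversation branches beyond what the table anticipates, pure symbol lookup has no principled way to choose a meaningful reply.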
Yahoo
07-04-2025
- Science
- Yahoo
AI model passes Turing Test 'better than a human'
A leading AI chatbot has passed a Turing Test more convincingly than a human, according to a new study. Participants in a blind test judged OpenAI's GPT-4.5 model, which powers the latest version of ChatGPT, to be a human 'significantly more often than actual humans'. The Turing Test, first proposed by the British computer scientist Alan Turing in 1950, is meant to be a barometer of whether artificial intelligence can match human intelligence. The test involves a text-based conversation with a human interrogator, who has to assess whether the interaction is with another human or a machine. Nearly 300 participants took part in the latest study, which ran tests for various chatbots and large language models (LLMs). OpenAI's GPT-4.5 was judged to be a human 73 per cent of the time when instructed to adopt a persona. 'We think this is pretty strong evidence that [AI chatbots] do [pass the Turing Test],' Dr Cameron Jones, a postdoctoral researcher at UC San Diego who led the study, wrote in a post to X. 'And 4.5 was even judged to be human significantly more often than actual humans.' It is not the first time that an AI programme has beaten the Turing Test, though the researchers from UC San Diego who conducted the study claim this to be the most comprehensive proof that the benchmark has been passed. Other models tested in the latest research included Meta's Llama-3.1, which passed less convincingly, and an early chatbot called ELIZA, which failed. The researchers noted that passing the Turing Test does not mean the AI bots have human-level intelligence, also known as artificial general intelligence (AGI). This is because LLMs are trained on large data sets in order to predict what a correct answer might be, making them essentially an advanced form of pattern recognition. 'Does this mean LLMs are intelligent? I think that's a very complicated question that's hard to address in a paper (or a tweet),' Dr Jones said. 'Broadly I think this should be evaluated as one among many other pieces of evidence for the kind of intelligence LLMs display. 'More pressingly, I think the results provide more evidence that LLMs could substitute for people in short interactions without anyone being able to tell. This could potentially lead to automation of jobs, improved social engineering attacks, and more general societal disruption.' The research is detailed in a preprint study, titled 'Large language models pass the Turing Test'.
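The phrase 'significantly more often than actual humans' refers to a statistical comparison of the two win rates. As a rough illustration only, using invented trial counts rather than the study's actual data, a two-proportion z-test of that kind can be computed like this:

```python
# Illustrative two-proportion z-test: is the AI's "judged human" rate higher
# than the real humans' rate? The counts below are invented for illustration,
# not taken from the UC San Diego study.
from math import sqrt, erf

def two_proportion_z(wins_a, n_a, wins_b, n_b):
    """One-sided z-test for the hypothesis that proportion A exceeds proportion B."""
    p_a, p_b = wins_a / n_a, wins_b / n_b
    p_pool = (wins_a + wins_b) / (n_a + n_b)           # pooled proportion under the null
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))         # upper-tail normal probability
    return z, p_value

# Hypothetical example: AI judged human in 73 of 100 trials,
# real humans judged human in 60 of 100 trials.
z, p = two_proportion_z(73, 100, 60, 100)
print(f"z = {z:.2f}, one-sided p = {p:.3f}")
```

If the one-sided p-value falls below a conventional threshold such as 0.05, the AI's rate is judged significantly higher than the humans'; the published study used its own trial counts and analysis, so this is only a sketch of the idea.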
Yahoo
06-04-2025
- Science
- Yahoo
Terrifying study reveals AI robots have passed 'Turing test' — and are now indistinguishable from humans, scientists say
The AI-mpersonation is complete. The dystopian lessons in every sci-fi movie from 'Terminator' to 'Ex Machina' appear to be coming true. Artificial intelligence has become so sophisticated that bots are no longer discernible from their human counterparts, per a concerning preprint study conducted by scientists at the University of California, San Diego. 'People were no better than chance at distinguishing humans from GPT-4.5 and LLaMa (a multi-lingual language model released by Meta AI),' concluded lead author Cameron Jones, a researcher at UC San Diego's Language and Cognition Lab, in an X post. The researchers set out to see if AI bots, which are programmed to parrot human speech patterns, could pass the iconic Turing Test. Developed by British WWII code breaker and computer scientist Alan Turing, this tech-istential exam gauges the intelligence of machines by determining if their digital discourse could be differentiated from that of a human — if the human judges can't tell the difference, the machine has passed. Researchers tested four large language models (LLMs) — GPT-4o, LLaMa-3.1, GPT-4.5 and ELIZA (a 1960s-era chat program) — in two randomized and controlled trials, Mashable reported. To administer said Turing Test, they enlisted 126 undergraduate students from the University of California San Diego and 158 people from the online data pool Prolific, the Daily Mail reported. These participants were instructed to have five-minute simultaneous online exchanges with a robot and a human to see if they could spot the Decepticon — with the kicker being that they didn't know which was which. Meanwhile, the human and AI respondents were tasked with convincing their interrogator that they were human. Researchers found that, when 'prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time.' This was 'significantly more often than interrogators selected the real human participant,' and, of course, more than enough to pass the test, per the study. Meanwhile, 'LLaMa-3.1, with the same prompt, was judged to be the human 56% of the time.' While not significantly less than its human brethren, this still achieved a passing grade. Finally, the baseline models (ELIZA and GPT-4o) failed after deceiving the witnesses just 23% and 21% of the time respectively. Researchers found that it was paramount to have the synthetic mimics adopt a human persona because, when administered a second test sans said prompt, they performed significantly worse. Caveats aside, these 'results constitute the first empirical evidence that any artificial system passes a standard three-party Turing test,' researchers concluded. Does this mean that AI-infused large language models are intelligent? 'I think that's a very complicated question that's hard to address in a paper (or a tweet),' said Jones on X. 'But broadly I think this should be evaluated as one among many other pieces of evidence for the kind of intelligence LLMs display.' Interestingly, the experts at Psychology Today concluded that the bots had beaten the Turing Test not through smarts, but by being a 'better' human than the actual humans. 'While the Turing Test was supposed to measure machine intelligence, it has inadvertently revealed something far more unsettling: our growing vulnerability to emotional mimicry,' wrote John Nosta, founder of the innovation think tank Nosta Lab, while describing this man-squerade. 'This wasn't a failure of AI detection. It was a triumph of artificial empathy.'
Nosta based his analysis on the fact that participants rarely asked logical questions, instead prioritizing 'emotional tone, slang, and flow,' and basing their selections on which 'one had more of a human vibe.' He concluded, 'In other words, this wasn't a Turing Test. It was a social chemistry test — not a measure of intelligence, but of emotional fluency. And the AI aced it.' This isn't the first time AI has demonstrated an uncanny ability to pull the wool over our eyes. In 2023, OpenAI's GPT-4 tricked a human into thinking it was blind to cheat the online CAPTCHA test that determines if users are human.