Latest news with #FrançoisChollet


Scientific American
18-07-2025
- Science
- Scientific American
AI's Achilles Heel—Puzzles Humans Solve in Seconds Often Defy Machines
There are many ways to test the intelligence of an artificial intelligence: conversational fluidity, reading comprehension or mind-bendingly difficult physics. But some of the tests most likely to stump AIs are ones that humans find relatively easy, even entertaining. Though AIs increasingly excel at tasks that require high levels of human expertise, this does not mean they are close to attaining artificial general intelligence, or AGI. AGI requires that an AI take a very small amount of information and use it to generalize and adapt to highly novel situations. This ability, which is the basis for human learning, remains challenging for AIs.

One test designed to evaluate an AI's ability to generalize is the Abstraction and Reasoning Corpus, or ARC: a collection of tiny colored-grid puzzles that ask a solver to deduce a hidden rule and then apply it to a new grid. Developed by AI researcher François Chollet in 2019, it became the basis of the ARC Prize Foundation, a nonprofit that administers the test, now an industry benchmark used by all major AI models. The organization also develops new tests and has been routinely using two of them (ARC-AGI-1 and its more challenging successor, ARC-AGI-2). This week the foundation is launching ARC-AGI-3, which is specifically designed for testing AI agents and is based on making them play video games.

Scientific American spoke to ARC Prize Foundation president, AI researcher and entrepreneur Greg Kamradt to understand how these tests evaluate AIs, what they tell us about the potential for AGI and why they are often challenging for deep-learning models even though many humans tend to find them relatively easy. Links to try the tests are at the end of the article.
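The puzzle format described above can be made concrete with a toy task. The sketch below shows only the general shape of an ARC task (a few training input/output pairs plus a test input); it is an illustrative example, not an actual ARC puzzle, and its hidden rule is deliberately trivial:

```python
# A toy ARC-style task: each grid is a 2-D list of color codes (integers).
# The hidden rule in this made-up example is "swap colors 1 and 2"; real
# ARC rules are far more varied, but the structure is the same: a few
# train pairs to learn from, and a test input to solve.
task = {
    "train": [
        {"input": [[1, 0], [0, 2]], "output": [[2, 0], [0, 1]]},
        {"input": [[2, 2], [1, 0]], "output": [[1, 1], [2, 0]]},
    ],
    "test": {"input": [[0, 1], [2, 1]]},
}

def apply_rule(grid):
    """Apply the deduced rule (swap colors 1 and 2) cell by cell."""
    swap = {1: 2, 2: 1}
    return [[swap.get(cell, cell) for cell in row] for row in grid]

# A solver never sees the rule itself: it must induce it from the train
# pairs alone, then produce the output for the test input.
for pair in task["train"]:
    assert apply_rule(pair["input"]) == pair["output"]

print(apply_rule(task["test"]["input"]))  # [[0, 2], [1, 2]]
```

The benchmark's difficulty lies entirely in the induction step: every task hides a different rule, so no fixed `apply_rule` generalizes across tasks.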
[An edited transcript of the interview follows.]

What definition of intelligence is measured by ARC-AGI-1?

Our definition of intelligence is your ability to learn new things. We already know that AI can win at chess. We know they can beat Go. But those models cannot generalize to new domains; they can't go and learn English. So what François Chollet made was a benchmark called ARC-AGI. It teaches you a mini skill in the question, and then it asks you to demonstrate that mini skill. We're basically teaching something and asking you to repeat the skill that you just learned. So the test measures a model's ability to learn within a narrow domain. But our claim is that it does not measure AGI, because it's still in a scoped domain [in which learning applies to only a limited area]. It measures that an AI can generalize, but we do not claim this is AGI.

How are you defining AGI here?

There are two ways I look at it. The first is more tech-forward: can an artificial system match the learning efficiency of a human? What I mean by that is that after humans are born, they learn a lot outside their training data. In fact, they don't really have training data, other than a few evolutionary priors. We learn how to speak English, we learn how to drive a car, and we learn how to ride a bike, all outside our training data. That's called generalization. When you can do things outside of what you've been trained on, we define that as intelligence.

An alternative definition of AGI that we use is when we can no longer come up with problems that humans can do and AI cannot; that's when we have AGI. That's an observational definition. The flip side is also true: as long as the ARC Prize, or humanity in general, can still find problems that humans can do but AI cannot, we do not have AGI.
One of the key factors about François Chollet's benchmark... is that we test humans on them, and the average human can do these tasks and these problems, but AI still has a really hard time with them. The reason that's so interesting is that some advanced AIs, such as Grok, can pass any graduate-level exam or do all these crazy things, but that's spiky intelligence. It still doesn't have the generalization power of a human. And that's what this benchmark shows.

How do your benchmarks differ from those used by other organizations?

One of the things that differentiates us is that we require our benchmark to be solvable by humans. That's in opposition to other benchmarks, which pose 'Ph.D.-plus-plus' problems. I don't need to be told that AI is smarter than me; I already know that OpenAI's o3 can do a lot of things better than me, but it doesn't have a human's power to generalize. That's what we measure, so we need to test humans. We actually tested 400 people on ARC-AGI-2. We got them in a room, we gave them computers, we did demographic screening, and then gave them the test. The average person scored 66 percent on ARC-AGI-2. Collectively, though, the aggregated responses of five to 10 people will contain the correct answers to all the questions on ARC-AGI-2.

What makes this test hard for AI and relatively easy for humans?

Humans are incredibly sample-efficient with their learning: they can look at a problem and, with maybe one or two examples, pick up the mini skill or transformation and go and do it. The algorithm that's running in a human's head is orders of magnitude better and more efficient than what we're seeing with AI right now.

What is the difference between ARC-AGI-1 and ARC-AGI-2?

ARC-AGI-1, François Chollet made that himself. It was about 1,000 tasks. That was in 2019.
He basically did the minimum viable version in order to measure generalization, and it held for five years because deep learning couldn't touch it at all. It wasn't even getting close. Then the reasoning models OpenAI introduced in 2024 started making progress on it, which showed a step-level change in what AI could do. With ARC-AGI-2, we went a little further down the rabbit hole in regard to what humans can do and AI cannot. It requires a little more planning for each task. So instead of being solved within five seconds, a task may take humans a minute or two. There are more complicated rules, and the grids are larger, so you have to be more precise with your answer, but it's the same concept, more or less.... We are now launching a developer preview for ARC-AGI-3, and that's a complete departure from this format. The new format will actually be interactive. So think of it more as an agent benchmark.

How will ARC-AGI-3 test agents differently compared with previous tests?

If you think about everyday life, it's rare that we make a stateless decision. When I say stateless, I mean just a question and an answer. Right now all benchmarks are more or less stateless benchmarks. If you ask a language model a question, it gives you a single answer. There's a lot that you cannot test with a stateless benchmark. You cannot test planning. You cannot test exploration. You cannot test intuiting about your environment or the goals that come with that. So we're making 100 novel video games that we will use to test humans, to make sure that humans can do them, because that's the basis for our benchmark. Then we're going to drop AIs into these video games and see if they can understand an environment they've never seen before. To date, in our internal testing, we haven't had a single AI beat even one level of one of the games.

Can you describe the video games here?
Each 'environment,' or video game, is a two-dimensional, pixel-based puzzle. These games are structured as distinct levels, each designed to teach a specific mini skill to the player (human or AI). To successfully complete a level, the player must demonstrate mastery of that skill by executing planned sequences of actions.

How is using video games to test for AGI different from the ways that video games have previously been used to test AI systems?

Video games have long been used as benchmarks in AI research, with Atari games being a popular example. But traditional video game benchmarks face several limitations. Popular games have extensive training data publicly available, lack standardized performance evaluation metrics, and permit brute-force methods involving billions of simulations. Additionally, the developers building AI agents typically have prior knowledge of these games, unintentionally embedding their own insights into the solutions.
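The stateless-versus-interactive distinction Kamradt draws can be sketched as an observe-act loop. Everything below is hypothetical: the environment, agent, and method names are illustrative stubs under our own assumptions, not ARC-AGI-3's actual interface.

```python
# Minimal sketch of a stateful (interactive) benchmark episode: the agent
# repeatedly observes and acts, unlike a stateless question -> answer test.
class ToyGridGame:
    """A stub level: reach the goal cell on a short 1-D track."""
    def __init__(self, size=5, goal=4):
        self.pos, self.size, self.goal = 0, size, goal

    def observe(self):
        return {"pos": self.pos, "done": self.pos == self.goal}

    def step(self, action):  # action: -1 (move left) or +1 (move right)
        self.pos = max(0, min(self.size - 1, self.pos + action))
        return self.observe()

def run_episode(env, policy, max_steps=50):
    """Drive the observe-act loop until the level is done or steps run out."""
    obs, steps = env.observe(), 0
    while not obs["done"] and steps < max_steps:
        obs = env.step(policy(obs))
        steps += 1
    return steps

# A trivial hard-coded policy; a real agent would have to explore first,
# since ARC-AGI-3 drops it into games whose rules it has never seen.
steps_taken = run_episode(ToyGridGame(), policy=lambda obs: +1)
print(steps_taken)  # 4
```

The point of the interactive format is that planning, exploration, and goal inference only become measurable once the benchmark carries state between the agent's decisions, as in the loop above.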


Hamilton Spectator
12-06-2025
- Business
- Hamilton Spectator
VERSES® 'Digital Brain' Featured in WIRED and Popular Mechanics
VANCOUVER, British Columbia, June 12, 2025 (GLOBE NEWSWIRE) — VERSES AI Inc. (CBOE: VERS; OTCQB: VRSSF) ('VERSES' or the 'Company'), a cognitive computing company specializing in next-generation agentic software systems, today announced third-party recognition of its digital-brain architecture, AXIOM, following features in WIRED and Popular Mechanics and public acknowledgement from ARC-AGI benchmark creator François Chollet.

WIRED: A 'very original' path to AGI

In WIRED's feature 'A Deep Learning Alternative Can Help AI Agents Gameplay the Real World,' senior writer Will Knight describes AXIOM as 'a new machine-learning approach that draws inspiration from how the human brain models and learns about the world.' He adds that it 'offers an alternative to the artificial neural networks dominant in modern AI' and highlights its 'impressive efficiency' across multiple video-game environments. François Chollet, Keras inventor, TIME 100 AI honoree, and creator of the ARC-AGI benchmark, told WIRED: 'The general goals of the [VERSES] approach and some of its key features track with what I see as the most important problems to focus on to get to AGI… The work strikes me as very original… We need more people trying out new ideas away from the beaten path of large language models.' Chollet also posted that active inference, as demonstrated by AXIOM, where agents act to reduce uncertainty by aligning their internal world models with reality, is 'badly missing from the deep-learning era' and '100% correct.'

New Benchmarks for AGI: Gameworlds

Chollet's well-known ARC-AGI benchmark, which measures progress toward general intelligence, tests AI systems on spatial-reasoning tasks and is used by OpenAI, Google, Anthropic, and others as the industry's gold standard. ARC-AGI-3, the next installment of this benchmark, is expected to deploy 100+ novel game worlds to test a new set of capabilities.
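Chollet's remark about active inference, in which agents act to reduce uncertainty by aligning their internal world models with reality, can be caricatured in a few lines. This is a toy illustration under our own simplifying assumptions, not AXIOM's actual algorithm: the agent scores each candidate action by the expected entropy of its belief after observing the outcome, and prefers the most informative one.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def expected_posterior_entropy(belief, likelihood):
    """Expected belief entropy after a binary observation.

    belief[s] is P(state = s); likelihood[s] is P(obs = 1 | state = s).
    """
    total = 0.0
    for obs in (0, 1):
        p_obs_given = [l if obs else 1 - l for l in likelihood]
        p_obs = sum(b * p for b, p in zip(belief, p_obs_given))
        if p_obs == 0:
            continue
        posterior = [b * p / p_obs for b, p in zip(belief, p_obs_given)]
        total += p_obs * entropy(posterior)
    return total

# Two hidden states the agent is unsure about, and two actions:
# "inspect" yields an informative observation, "wait" an uninformative one.
belief = [0.5, 0.5]
actions = {"inspect": [0.9, 0.1], "wait": [0.5, 0.5]}

best = min(actions, key=lambda a: expected_posterior_entropy(belief, actions[a]))
print(best)  # inspect
```

An uncertainty-reducing agent picks 'inspect' because that observation is expected to sharpen its belief; full active inference additionally weighs preferred outcomes, which this sketch omits.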
We believe this reflects the AI community's move from static Q&A to interactive environments, where games serve as the medium that forces agents to explore, form hypotheses, and spontaneously generalize. AXIOM's active-inference engine has already demonstrated these skills: it learns unfamiliar worlds, plans by minimizing uncertainty, and adapts in real time using its cognitive architecture. On the Gameworld 10K benchmark, AXIOM outperformed Google DeepMind's DreamerV3 by up to 60%, used 99% less compute, and learned 39× faster, as validated by Soothsayer Analytics in June.

Popular Mechanics: 'This breakthrough could redefine intelligence forever.'

Popular Mechanics also published a feature article titled 'This AI Model Can Mimic Human Thought—And May Even Be Capable of Reading Your Mind,' calling Genius, VERSES' product suite powered by AXIOM, 'a level up from existing AI' and noting that Genius agents run on watts instead of gigawatts and can operate from a laptop battery rather than the cloud. The article begins: 'AI is learning to think like us, bridging the worlds of biology and technology. This breakthrough could redefine intelligence forever.'

'AXIOM was built for interactive intelligence—exploring, planning, and learning in real time,' said VERSES CEO Gabriel René. 'Active Inference is designed to master new worlds faster, with far less compute and human-like adaptability—bringing us closer to truly human-level AI and, we believe, positioning VERSES as the market leader.'

Notes to editors

About VERSES

VERSES® is a cognitive computing company building next-generation intelligent software systems modeled after the wisdom and genius of Nature. Designed around first principles found in science, physics and biology, our flagship product, Genius™, is an agentic enterprise intelligence platform designed to generate reliable domain-specific predictions and decisions under uncertainty.
Imagine a Smarter World that elevates human potential through technology inspired by Nature. Learn more on LinkedIn and X.

On behalf of the Company
Gabriel René, Founder & CEO, VERSES AI Inc.

Press Inquiries: press@
Investor Relations Inquiries: James Christodoulou, Chief Financial Officer, IR@ , +1 (212) 970-8889


Associated Press
12-06-2025
- Business
- Associated Press
VERSES® 'Digital Brain' Featured in WIRED and Popular Mechanics
Yahoo
12-06-2025
- Business
- Yahoo
VERSES® 'Digital Brain' Featured in WIRED and Popular Mechanics