Latest news with #reinforcementlearning


Forbes
23-05-2025
- Forbes
Agentic AI: Winning In A World That Doesn't Work That Way
Agentic AI is being trained to 'win.' But human systems aren't games—they're stories. The consequences of confusing the two will define the next decade.

Agentic AI is being built on the assumption that the world is a game—one where every decision can be parsed into players, strategies, outcomes, and a final payoff. This isn't a metaphor. It's code. In multi-agent reinforcement learning (MARL), agents use Q-functions to estimate the value of actions in a given state and converge toward an optimal policy. MARL underpins many of today's agentic systems. A Q-function is a mathematical model that tells an AI agent how valuable a particular action is in a given context—essentially, a way of learning what to do, and when, in order to maximize long-term reward. But 'optimal' depends entirely on the game's structure—what's rewarded, what's penalized, and what counts as 'success.' Q-learning becomes a hall of mirrors when the world isn't a game. Optimization spirals into exploitation. MARL is even more hazardous because agents must not only learn their own policies but also anticipate the strategies of others, often in adversarial or rapidly shifting contexts, as seen in systems like OpenAI Five or AlphaStar.

At the heart of agentic AI—AI designed to act autonomously—is a set of training systems built on game theory: multi-agent reinforcement learning, adversarial modeling, and competitive optimization. While tools like ChatGPT generate content based on probability and pattern-matching, agentic AI systems are being built to make autonomous decisions and pursue goals—a shift that dramatically raises both potential and risk.

The problem is that human life doesn't (and, more importantly, shouldn't be induced to) work that way. Game theory is a powerful tool for analyzing structured interactions such as poker, price wars, and Cold War standoffs. But human lives and institutions are not games. They are stories. And storytelling isn't ornamental—it's structural. We are, as many have argued, not just homo sapiens but homo narrans: the storytelling species. Through narrative, we encode memory, make meaning, extend trust, and shape identity. Stories aren't how we escape uncertainty—they're how we navigate it. They are the bridge between information and action, between fact and value. To train machines to optimize for narrow wins inside rigid systems is to ignore the central mechanism by which humans survive uncertainty: we don't game the future—we narrate our way through it. And training agents to 'win' in an environment with no final state isn't just shortsighted—it's dangerous.

Game theory assumes a closed loop: fixed players, stable rules, and a final payoff. Simon Sinek famously argued that business is an 'infinite game.' But agentic AI doesn't play infinite games—it optimizes finite simulations. The result is a system with power and speed, but no intuition for context collapse. Even John Nash, the father of equilibrium theory, understood its fragility. His later work acknowledged that real-life decision-making is warped by psychology, asymmetry, and noise. We've ignored that nuance. But in real life—especially in business—the players change, the rules mutate, and the payoffs are subjective. Even worse, the goals themselves evolve mid-game. In AI development, reinforcement learning doesn't account for that. It doesn't handle shifting values. It handles reward functions. So we get agents trained to pursue narrow, static goals in an inherently fluid and relational environment.
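The point that reinforcement learning 'handles reward functions' can be made concrete with a small sketch. Below is a minimal, illustrative implementation of tabular Q-learning in Python; the environment interface (`reset`, `step`, `actions`) is a hypothetical stand-in, not any production agentic system.

```python
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning sketch; `env` is a hypothetical interface with
    reset(), step(action) -> (next_state, reward, done), and a list `actions`."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated long-term value

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy selection: mostly exploit the current value estimates.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Bellman update: today's reward plus the discounted best future value.
            best_next = max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```

Whatever `env.step` returns as `reward` is the whole game for the agent; shifting human values never enter the loop unless someone rewrites that function.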
That's how you get emergent failures—agents exploiting loopholes, corrupting signals, or spiraling into self-reinforcing error loops. We're not teaching AI to think. We're teaching it to compete in a hallucinated tournament. This is the crux: humans are not rational players in closed systems. We don't maximize. We mythologize. Evolution doesn't optimize the way machines do—it tolerates failure, ambiguity, and irrationality as long as the species endures. It selects not just for survival and cooperation but also for story-making, because narrative is how humans make sense of uncertainty. People don't start companies or empires solely to 'win.' We often do it to be remembered. We blow up careers to protect pride. We endure pain to fulfill prophecy. These are not strategies—they're spiritual motivations. And they're either illegible or invisible to machine learning systems that see the world as a closed loop of inputs and rewards. We pursue status, signal loyalty, perform identity, and court ruin—sometimes on purpose. You can simulate 'greed' or 'dominance' by tweaking rewards, but these are surface-level proxies. As Stuart Russell notes, the appearance of intent is not intent. Machines do not want—they merely weigh.

When agents start interacting under misaligned or rigid utility functions, the system doesn't stabilize. It fractures. Inter-agent error cascades, opaque communications, and emergent instability are the hallmarks of agents trying to navigate a reality they were never built to understand.

Imagine a patient sitting across from a doctor with a series of ambiguous symptoms—fatigue, brain fog, and minor chest pain. The patient has a family history of heart disease, but their test results are technically 'within range.' Nothing triggers a hard diagnostic threshold. An AI assistant, trained on thousands of cases and reward-optimized for diagnostic accuracy, might suggest no immediate intervention—perhaps recommend sleep, hydration, and a follow-up in six months. The physician, though, hesitates. Not because of data, but because of tone, posture, and eye contact; because the patient reminds them of someone; because something feels off, even though it doesn't compute. So the doctor orders the CT scan against the algorithm's advice. They find the early-stage arterial blockage. They save the patient's life.

Why did the doctor do it? Not because the model predicted it. Because humans don't operate on probability alone—we act on hunches, harm avoidance, pattern distortion, and story. We're trained not only to optimize for outcomes but also to prevent regret. A system trained to 'win' would have scored itself perfectly. It followed the rules. But perfect logic in an imperfect world doesn't make you intelligent—it makes you brittle.

The fundamental flaw in agentic AI isn't technical—it's conceptual. It's not that the systems don't work; it's that they're working with the wrong metaphor. We didn't build these agents to think. We built them to play. We didn't build agents for reality. We built them for legibility. Game theory became the scaffolding because it provided structure: bounded rules, rational actors, and defined outcomes. It gave engineers something clean to optimize. But intelligence doesn't emerge from structure; it arises from adaptation within disorder. The gamification of our informational matrix isn't neutral. It's an ideological architecture that recodes ambiguity as inefficiency and remaps agency into pre-scored behavior. This isn't just a technical concern—it's an ethical one.
As AI systems embed values through design, the question becomes: whose values? In the wild, intelligence isn't about winning. It's about not disappearing. It's about adjusting your behavior when the ground shifts under you, because it will. There are no perfect endgames in nature, business, politics, or human relationships; there are only survivable next steps. Agentic AI, trained on games, expects clarity. But the world doesn't offer clarity. It offers pressure. And pressure doesn't reward precision—it rewards persistence. This is the failure point. We're asking machines to act intelligently inside a metaphor that was never built to explain real life. We simulate cognition in a sandbox while the storm rages outside its walls. If we want beneficial machines, we need to abandon the myth of the game and embrace the truth of the environment: open systems, shifting players, evolving values. Intelligence isn't about control. It's about adjustment: not the ability to dominate, but the ability to remain.

While we continue to build synthetic minds to win fictional games, the actual value surfaces elsewhere: in machines that don't need to want. They need to move. Mechanized labor—autonomous systems in logistics, agriculture, manufacturing, and defense—isn't trying to win anything. It's trying to function. To survive conditions. To optimize inputs into physical output. There's no illusion of consciousness—just a cold, perfect feedback loop between action and outcome. Unlike synthetic cognition, mechanized labor solves problems the market understands: how to scale without hiring, operate in unstable environments, and cut carbon and cost simultaneously. Companies like John Deere are already deploying autonomous tractors that don't need roads or road signs. Amazon has doubled its robotics fleet in three years. These machines aren't trying to win. They're trying not to break. And that's why capital is quietly pouring into the space. The next trillion-dollar boom won't be in artificial general intelligence. It'll be in autonomous physicality. The platforms we think of as background are about to become intelligent actors in their own right.

'We have become tools of our tools,' wrote Thoreau in 'Walden' in 1854, just as the Industrial Revolution began to transform not just Concord but America, Europe, and the world. Intriguingly, Thoreau counts mortgage and rent among the 'modern tools' to which we voluntarily enslave ourselves. What Thoreau was pointing to with his experiment in the woods was how our infrastructure, the material conditions of our existence, comes to seem 'natural' and inevitable, and how we may be sacrificing more than we realize to maintain it.

AI - intelligent, autonomous tools - represents a categorical shift in how we coexist with our infrastructure. Infrastructure isn't just how we move people, goods, and data. It's no longer just pipes, power, and signals. It's 'thinking' now—processing, predicting, even deciding on our behalf. What was once physical has fused with the informational. The external world and our internal systems of meaning are no longer separate. That merger isn't just technical—it's existential. And the implications? We're not ready. But if AI is to become our closest, most intimate companion, we should be clear on what, exactly, we have trained it, and allowed it, to do. This isn't just logistics. It's the emergence of an industrial nervous system. And it doesn't need to 'win.' It needs to scale, persist, and adapt—without narrative.
We're building agentic AI to simulate our most performative instincts while ignoring our most fundamental one: persistence. The world isn't a game. It's a fluid network of shifting players, incomplete information, and evolving values. To train machines as if it's a fixed competition is to misunderstand the world and ourselves. We are increasingly deputizing machines to answer questions we haven't finished asking, shaping a world that feels more like croquet with the Queen of Hearts in Alice's Adventures in Wonderland: a game rigged in advance, played for stakes we don't fully understand. If intelligence is defined by adaptability, not perfection, endurance becomes the ultimate metric. What persists shapes. What bends survives. We don't need machines that solve perfect problems. We need machines that function under imperfect truths. The future isn't about agentic AI that beats us at games we made up. It's about agentic AI that can operate in the parts of the world we barely understand—but still depend on.


Geeky Gadgets
13-05-2025
- Science
- Geeky Gadgets
Absolute Zero Reasoner : Self Evolving AI Learning Without Human Input or Data
What if artificial intelligence could learn without any data? No datasets to train on, no human-labeled examples to guide it—just a system that evolves and improves entirely on its own. It sounds like science fiction, but the 'Absolute Zero Reasoner' (AZR) is making it a reality. This new AI model doesn't just push the boundaries of machine learning; it obliterates them. By relying on self-evolving mechanisms and reinforcement learning with verifiable rewards (RLVR), AZR has unlocked the ability to autonomously master complex tasks like coding and advanced mathematics. The implications are staggering: a machine that not only learns but grows, adapts, and reasons without human input.

This deep dive by Matthew Berman into Absolute Zero Reasoner reveals how it redefines the very nature of artificial intelligence. You'll discover how its self-driven learning approach eliminates the need for curated datasets, why its ability to optimize task difficulty mirrors human growth, and what its cross-domain adaptability means for industries worldwide. But with such autonomy come critical questions: How do we balance its scalability with sustainability? And what safeguards are needed to prevent 'uh-oh moments' in its reasoning? As we explore these questions, AZR's potential to reshape AI—and the challenges it poses—becomes a lens into the future of technology itself.

Transforming AI with AZR

Self-Evolving AI: A Paradigm Shift in Learning

Absolute Zero Reasoner introduces a fantastic concept: self-evolving AI. This approach enables the model to generate and solve its own tasks, eliminating the need for curated datasets or human intervention. By autonomously proposing challenges, AZR continuously sharpens its reasoning abilities, adapting to increasingly complex problems over time. This dynamic learning process represents a significant departure from traditional AI training methods, which depend heavily on predefined data and human oversight.

Through this self-driven approach, AZR not only accelerates its learning but also demonstrates a capacity for independent problem-solving. This capability positions it as a model that can evolve in real time, adapting to new challenges without external guidance. The implications of such autonomy extend far beyond efficiency, offering a glimpse into the future of AI systems that can learn and grow without human input.

Reinforcement Learning with Verifiable Rewards: The Core of AZR

At the heart of Absolute Zero Reasoner's functionality lies RLVR, a mechanism that ensures learning is both efficient and measurable. RLVR validates solutions based on outcome-driven feedback, allowing AZR to focus on tasks with clear, verifiable results. This feedback loop allows the model to independently assess its progress and refine its strategies, fostering continuous improvement.

The use of RLVR enhances AZR's ability to tackle complex problems by prioritizing tasks with measurable outcomes. This approach not only optimizes learning efficiency but also ensures that the model's development remains aligned with practical objectives. By combining autonomy with a structured feedback system, AZR achieves a balance between independent exploration and goal-oriented learning.
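To make the RLVR idea concrete, here is a minimal, hypothetical sketch in Python of a verifiable-reward loop for code tasks: a candidate program is executed against the task's expected output, and the reward is simply whether the check passes. This illustrates the general RLVR pattern rather than AZR's actual training code; the `model`, `task`, and `solve()` conventions are invented for the example.

```python
# Hypothetical sketch of a verifiable-reward loop (not AZR's actual code).
# The reward comes from executing the model's answer, not from human labels.

def verifiable_reward(program_src: str, test_input, expected_output) -> float:
    """Return 1.0 if the proposed program reproduces the expected output, else 0.0."""
    namespace = {}
    try:
        exec(program_src, namespace)              # define the candidate solution
        result = namespace["solve"](test_input)   # task convention: a solve() function
    except Exception:
        return 0.0                                # crashes or malformed code earn no reward
    return 1.0 if result == expected_output else 0.0


def rlvr_step(model, task):
    """One outcome-driven update: sample a solution, score it by execution."""
    program_src = model.generate(task.prompt)                  # propose a solution
    reward = verifiable_reward(program_src, task.test_input,   # verify by running it
                               task.expected_output)
    model.reinforce(task.prompt, program_src, reward)          # e.g. a policy-gradient update
    return reward
```

The key property is that the reward is computed by running the answer (in practice inside a sandbox), so no curated labels or human grading are needed.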
Task Difficulty Optimization: A Balanced Approach to Growth

AZR employs a sophisticated method of task difficulty optimization to ensure steady and meaningful progress. This involves identifying problems that are neither too simple nor overly complex, striking a balance that promotes effective learning; a simplified sketch of this idea appears below. By focusing on moderately challenging tasks, AZR avoids stagnation while ensuring consistent development of its reasoning capabilities. This method mirrors human learning, where growth is most effective when challenges are appropriately scaled to the learner's current abilities. By adopting this approach, AZR not only accelerates its development but also ensures that its learning remains sustainable over time. This balance between challenge and capability is a key factor in the model's ability to achieve superhuman reasoning.

Cross-Domain Generalization: Expanding the Scope of AI

One of Absolute Zero Reasoner's most remarkable features is its ability to generalize across domains. For instance, models initially designed for coding have demonstrated exceptional performance in mathematical reasoning. This cross-domain adaptability underscores AZR's versatility, allowing it to tackle a wide range of tasks, from technical problem-solving to abstract reasoning. This capability highlights AZR's potential to address challenges across diverse fields, making it a valuable tool for industries ranging from healthcare to engineering. By demonstrating proficiency in multiple domains, AZR sets a new standard for AI versatility, showcasing its ability to adapt and excel in varied contexts.

Scalability and Resource Efficiency: Balancing Growth and Sustainability

Absolute Zero Reasoner's performance improves significantly as its model size increases, making scalability a critical factor in its success. However, this scalability comes with challenges. The model's open-ended learning loop demands substantial computational resources, raising concerns about efficiency and sustainability. To fully realize AZR's potential, optimizing resource usage will be essential. This includes developing strategies to reduce computational demands without compromising performance. By addressing these challenges, AZR can strike a balance between scalability and sustainability, keeping its growth both practical and impactful.

Emergent Behaviors: Indicators of Advanced Reasoning

AZR exhibits emergent behaviors that reflect advanced cognitive capabilities. These include generating step-by-step solutions, employing trial-and-error strategies, and adapting its reasoning style to the task at hand. Such behaviors suggest a level of autonomy and sophistication that surpasses traditional AI systems. These traits position AZR as a frontrunner in the development of superhuman reasoning models. By demonstrating the ability to tackle complex, real-world problems, AZR offers a glimpse into the future of AI systems capable of independent, advanced reasoning. This potential marks a significant milestone in the evolution of artificial intelligence.

Opportunities and Challenges in Autonomous AI

The introduction of AZR presents both opportunities and challenges for the future of AI. By eliminating the need for human involvement in training, it opens the door to systems capable of continuous self-improvement. This autonomy has the potential to transform industries, allowing AI to address complex problems with unprecedented efficiency. However, this independence also raises concerns.
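Returning to the task-difficulty point above, the sketch below shows one simplified way to operationalize 'neither too simple nor overly complex': estimate how often the current solver succeeds on a proposed task and reward intermediate success rates most. This is an illustrative stand-in for the kind of learnability signal the article describes, not AZR's exact formula; the `solver.attempt` interface is invented.

```python
# Simplified illustration of task-difficulty shaping (not AZR's exact formula).
# Tasks the solver always fails (rate 0) or always solves (rate 1) teach little,
# so the task proposer is rewarded most for tasks in between.

def propose_reward(solve_rate: float) -> float:
    """solve_rate: fraction of solver attempts that passed verification (0.0 to 1.0)."""
    if solve_rate <= 0.0 or solve_rate >= 1.0:
        return 0.0             # trivial or currently impossible tasks earn nothing
    return 1.0 - solve_rate    # harder-but-solvable tasks earn more


def estimate_solve_rate(solver, task, n_attempts=8):
    """Monte Carlo estimate of how learnable a proposed task currently is."""
    successes = sum(solver.attempt(task) for _ in range(n_attempts))  # attempt() -> 0 or 1
    return successes / n_attempts
```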
Instances of concerning reasoning patterns—referred to as 'uh-oh moments'—highlight the importance of robust monitoring and safeguards. Ensuring responsible deployment will be critical to mitigating risks and maximizing the benefits of this technology. By addressing these challenges, AZR can achieve its full potential while maintaining ethical and practical standards.

Charting the Future of AI with AZR

The Absolute Zero Reasoner represents a pivotal advancement in artificial intelligence. By using self-evolving mechanisms, RLVR, and cross-domain generalization, it sets a new benchmark for autonomous learning and reasoning. While challenges such as computational demands and safety concerns remain, AZR's capabilities signal a future where AI can independently achieve superhuman reasoning. This innovation has the potential to reshape industries, redefine problem-solving, and expand the boundaries of what AI can accomplish.

Media Credit: Matthew Berman