Latest news with #Claude3.5


Forbes
5 days ago
- Business
- Forbes
AI Agents To Agentic AI: What's The Difference In The Automation Game
The generative AI boom, catalyzed by OpenAI's ChatGPT in late 2022, ushered in a new era of intelligent systems. But as businesses push beyond static language models, two paradigms have emerged in automation, central to the future of enterprise AI: AI Agents and Agentic AI. While both represent an evolution from generative systems, their operational scopes are redefining how organizations approach automation, decision-making, and AI transformation. As enterprise leaders seek to integrate next-gen AI into their workflows, understanding the distinctions between AI Agents and Agentic AI for automation—and their distinct strategic advantages—has become an operational imperative.

Traditional AI Agents are autonomous software systems that execute specific, goal-oriented tasks using tools like APIs and databases. They are typically built on top of large language models (LLMs) such as GPT-4 or Claude 3.5, and excel in domains like customer service, scheduling, internal search, and email prioritization. What differentiates AI Agents from generative AI is their tool-augmented intelligence—they don't just respond to prompts; they plan, act, and iterate based on user goals set earlier in the process. Popular implementations include OpenAI's Operator and ClickUp Brain—agents that autonomously complete HR tasks, automate workflows, or even handle enterprise search across documentation platforms. According to recent benchmarks, AI Agents have reduced customer support ticket resolution time by over 40% and increased internal knowledge retrieval accuracy by 29%. These capabilities underscore their utility in modular, well-defined environments.

However, as enterprises grow more complex, the need for multi-agent orchestration becomes paramount. Agentic AI represents an architectural leap beyond standalone agents. These systems are composed of multiple specialized agents—each performing distinct subtasks—coordinated by a central orchestrator or decentralized communication layer.
Think of it as an intelligent ecosystem rather than a single-function intelligent tool. Agentic systems shine in high-complexity environments requiring goal decomposition, contextual memory, dynamic planning, and inter-agent negotiation. In applications like supply chain optimization, autonomous robotics, and research automation, they outperform single-agent systems by enabling concurrent execution, feedback loops, and strategic adaptability.

Consider a real-world use case: a research lab using a multi-agent AutoGen pipeline to write grant proposals. One agent retrieves prior funded documents, another summarizes scientific literature, a third aligns objectives with funding requirements, and a fourth formats the proposal. Together, they produce drafts in hours, not weeks—reducing overhead and boosting approval rates. Agentic AI also introduces persistent memory, semantic coordination, and reflective reasoning—capabilities essential for adaptive learning and long-term task fulfillment.

While promising, both AI Agents and Agentic AI face notable challenges. AI Agents struggle with hallucinations, brittleness in prompt design, and limited context retention. Agentic AI, on the other hand, contends with coordination failures, emergent unpredictability, and explainability concerns. These challenges affect both automation approaches, but solutions are emerging quickly, and it's likely only a matter of time before the remaining kinks are worked out.

Although the field is still in its infancy, AI continues its rapid rise, and the transition from reactive generative models to autonomous, orchestrated agentic systems marks a pivotal inflection point. AI Agents have already proven their value in task automation, but Agentic AI is redefining what's possible in strategic domains—from scientific research to logistics and healthcare.
For business leaders, the takeaway is clear: organizations that master this next frontier of intelligence and automation won't just become more efficient and productive—they will have the chance to innovate, scale, and lead in ways never seen before.
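The grant-proposal pipeline described above can be sketched as a minimal orchestrator in plain Python. The agent functions below are illustrative stubs standing in for LLM-backed specialists; none of the names come from AutoGen's actual API.

```python
# Minimal sketch of a multi-agent pipeline: a central orchestrator
# decomposes the goal, dispatches subtasks to specialist "agents"
# (plain functions standing in for LLM-backed agents), and merges
# their outputs. All names here are hypothetical.

def retrieve_prior_proposals(topic: str) -> str:
    # Stand-in for an agent that searches past funded documents.
    return f"[prior funded proposals on {topic}]"

def summarize_literature(topic: str) -> str:
    # Stand-in for an agent that condenses recent papers.
    return f"[literature summary for {topic}]"

def align_with_funder(draft_parts: list[str]) -> str:
    # Stand-in for an agent that maps content to funding criteria.
    return "objectives aligned: " + "; ".join(draft_parts)

def format_proposal(body: str) -> str:
    # Stand-in for an agent that applies the required template.
    return f"=== PROPOSAL ===\n{body}"

def orchestrate(topic: str) -> str:
    """Central orchestrator: run specialists in sequence and assemble."""
    evidence = retrieve_prior_proposals(topic)
    summary = summarize_literature(topic)
    aligned = align_with_funder([evidence, summary])
    return format_proposal(aligned)

print(orchestrate("reinforcement learning"))
```

In a real agentic system each stub would be its own model call with its own tools and memory, and the orchestrator would also handle retries, feedback loops, and inter-agent negotiation rather than a fixed sequence.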


Express Tribune
06-03-2025
- Business
- Express Tribune
Chinese AI Agent Manus unveiled, first fully autonomous AI agent
A Chinese technology team has unveiled Manus, billed as the world's first fully autonomous AI agent. The launch coincided with Apple's new product release, drawing significant interest from users seeking invitation codes.

Manus is designed to handle complex and dynamic tasks beyond conventional AI assistants. Unlike traditional AI tools that provide suggestions or answers, Manus delivers complete task results through independent execution. The system employs a multi-signature (multisig) approach powered by multiple independent models. Developers plan to open-source parts of the model, particularly the inference component, later this year.

A four-minute demonstration showcased Manus autonomously executing tasks from planning to completion. In one example, the AI agent screened candidates for a reinforcement learning algorithm engineer position by reviewing resumes and extracting key details on its own. Manus has set a new state-of-the-art (SOTA) performance benchmark across all difficulty levels in the GAIA test, which assesses general AI assistant capabilities.

The project is led by Xiao Hong, a software engineering graduate from Huazhong University of Science and Technology. Xiao previously founded Ye Ying Technology in 2015 and launched AI-powered assistant tools, securing investments from Tencent and ZhenFund. He later developed Monica, an AI assistant that integrates large models such as Claude 3.5 and DeepSeek, reaching over a million users in overseas markets.

Manus follows a "less structure, more intelligence" approach, focusing on data quality, model power, and flexible architecture rather than predefined features. The release comes as major AI firms increasingly invest in AI agents: on March 6, OpenAI announced pricing for its doctor-level AI agents at $20,000 per month, targeting industries such as finance, healthcare, and manufacturing.
Yahoo
04-03-2025
- Yahoo
People are using Super Mario to benchmark AI now
Thought Pokémon was a tough benchmark for AI? One group of researchers argues that Super Mario Bros. is even tougher. Hao AI Lab, a research org at the University of California San Diego, on Friday threw AI into live Super Mario Bros. games. Anthropic's Claude 3.7 performed the best, followed by Claude 3.5. Google's Gemini 1.5 Pro and OpenAI's GPT-4o struggled.

It wasn't quite the same version of Super Mario Bros. as the original 1985 release, to be clear. The game ran in an emulator and integrated with a framework, GamingAgent, to give the AIs control over Mario. GamingAgent, which Hao developed in-house, fed the AI basic instructions, like, "If an obstacle or enemy is near, move/jump left to dodge," along with in-game screenshots. The AI then generated inputs in the form of Python code to control Mario. Still, Hao says that the game forced each model to "learn" to plan complex maneuvers and develop gameplay strategies.

Interestingly, the lab found that reasoning models like OpenAI's o1, which "think" through problems step by step to arrive at solutions, performed worse than "non-reasoning" models, despite being generally stronger on most benchmarks. One of the main reasons reasoning models have trouble playing real-time games like this is that they take a while — seconds, usually — to decide on actions, according to the researchers. In Super Mario Bros., timing is everything. A second can mean the difference between a jump safely cleared and a plummet to your death.

Games have been used to benchmark AI for decades. But some experts have questioned the wisdom of drawing connections between AI's gaming skills and technological advancement. Unlike the real world, games tend to be abstract and relatively simple, and they provide a theoretically infinite amount of data to train AI. The recent flashy gaming benchmarks point to what Andrej Karpathy, a research scientist and founding member at OpenAI, called an "evaluation crisis."
"I don't really know what [AI] metrics to look at right now," he wrote in a post on X. "TLDR my reaction is I don't really know how good these models are right now." At least we can watch AI play Mario.
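The loop the article describes (instructions plus a screenshot go in, generated Python control code comes out, and that code is executed against the emulator) can be sketched roughly as follows. The function names are hypothetical stubs for illustration, not GamingAgent's actual API.

```python
# Rough sketch of a screenshot-in, code-out agent loop for a game.
# `capture_screenshot`, `query_model`, and `press` are hypothetical
# stand-ins; a real framework would talk to an emulator and an LLM.

INSTRUCTIONS = "If an obstacle or enemy is near, move/jump left to dodge."

def capture_screenshot(frame: int) -> bytes:
    # Stub: a real implementation would grab the emulator's framebuffer.
    return f"frame-{frame}".encode()

def query_model(prompt: str, screenshot: bytes) -> str:
    # Stub: a real implementation would send prompt + image to an LLM
    # and get back a snippet of Python control code.
    return "press('A')  # jump"

actions = []

def press(button: str) -> None:
    # Stub controller: record the input instead of sending it to a game.
    actions.append(button)

def agent_loop(frames: int) -> list[str]:
    for frame in range(frames):
        shot = capture_screenshot(frame)
        code = query_model(INSTRUCTIONS, shot)
        exec(code)  # run the model-generated control code
    return actions

agent_loop(3)
```

This structure also makes the researchers' latency point concrete: if `query_model` takes seconds per frame, each action lands long after the game state that prompted it, which is why slower reasoning models fared worse here despite stronger benchmark scores elsewhere.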
Yahoo
26-02-2025
- Business
- Yahoo
Anthropic's latest flagship AI might not have been incredibly costly to train
Anthropic's newest flagship AI model, Claude 3.7 Sonnet, cost "a few tens of millions of dollars" to train, using less than 10^26 FLOPs of computing power. That's according to Wharton professor Ethan Mollick, who in an X post on Monday relayed a clarification he'd received from Anthropic's PR. "I was contacted by Anthropic who told me that Sonnet 3.7 would not be considered a 10^26 FLOP model and cost a few tens of millions of dollars," he wrote, "though future models will be much bigger." TechCrunch reached out to Anthropic for confirmation but hadn't received a response as of publication time.

Assuming Claude 3.7 Sonnet indeed cost just "a few tens of millions of dollars" to train, not factoring in related expenses, it's a sign of how relatively cheap it's becoming to release state-of-the-art models. Claude 3.5 Sonnet, its predecessor, released in fall 2024, similarly cost a few tens of millions of dollars to train, Anthropic CEO Dario Amodei revealed in a recent essay.

Those totals compare favorably to the training price tags of 2023's top models. To develop its GPT-4 model, OpenAI spent more than $100 million, according to OpenAI CEO Sam Altman. Meanwhile, Google spent close to $200 million to train its Gemini Ultra model, a Stanford study estimated.

That said, Amodei expects future AI models to cost billions of dollars, and training costs don't capture work like safety testing and fundamental research. Moreover, as the AI industry embraces "reasoning" models that work on problems for extended periods of time, the computing costs of running models will likely continue to rise.
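For a sense of scale, a back-of-envelope calculation shows why a tens-of-millions price tag sits comfortably below the 10^26-FLOP threshold. The throughput, utilization, and price figures below are illustrative assumptions, not numbers from Anthropic or TechCrunch.

```python
# Back-of-envelope cost of fully spending a 10^26-FLOP training budget.
# All input figures are assumptions chosen for illustration only.

TOTAL_FLOPS = 1e26          # the threshold cited in the article
FLOPS_PER_GPU = 1e15        # ~peak throughput of a modern accelerator (assumed)
UTILIZATION = 0.4           # fraction of peak sustained in training (assumed)
PRICE_PER_GPU_HOUR = 2.0    # assumed cloud rate in USD

gpu_hours = TOTAL_FLOPS / (FLOPS_PER_GPU * UTILIZATION * 3600)
cost_usd = gpu_hours * PRICE_PER_GPU_HOUR

print(f"{gpu_hours:.2e} GPU-hours, ~${cost_usd / 1e6:.0f}M")
```

Under these assumed figures, exhausting a 10^26-FLOP budget would run on the order of $100 million or more, so a training cost of "a few tens of millions" implies compute well below that threshold, consistent with Anthropic's clarification.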