Latest news with #ClaudeSonnet3.5

Business Insider
2 days ago
- Business
- Business Insider
Anthropic's cofounder says 'dumb questions' are the key to unlocking breakthroughs in AI
Anthropic's cofounder said the key to advancing AI isn't rocket science — it's asking the obvious stuff nobody wants to say out loud. "It's really asking very naive, dumb questions that get you very far," said Jared Kaplan at a Y Combinator event last month. The chief science officer at Anthropic said in the video published by Y Combinator on Tuesday that AI is an "incredibly new field" and "a lot of the most basic questions haven't been answered." For instance, Kaplan recalled how in the 2010s, everyone in tech kept saying that "big data" was the future. He asked: How big does the data need to be? How much does it actually help? That line of thinking eventually led him and his team to study whether AI performance could be predicted based on the size of the model and the amount of compute used — a breakthrough that became known as scaling laws. "We got really lucky. We found that there's actually something very, very, very precise and surprising underlying AI training," he said. "This was something that came about because I was just sort of asking the dumbest possible question." Kaplan added that as a physicist, that was exactly what he was trained to do. "You sort of look at the big picture and you ask really dumb things." Simple questions can make big trends "as precise as possible," and that can "give you a lot of tools," Kaplan said. "It allows you to ask: What does it really mean to move the needle?" he added. Kaplan and Anthropic did not respond to a request for comment from Business Insider. Anthropic's AI breakthroughs Anthropic has emerged as a powerhouse in AI‑assisted coding, especially after the release of its Claude Sonnet 3.5 model in June 2024. "Anthropic changed everything," Sourcegraph's Quinn Slack said in a BI report published last week. "We immediately said, 'This model is better than anything else out there in terms of its ability to write code at length' — high-quality code that a human would be proud to write," he added. "And as a startup, if you're not moving at that speed, you're gonna die." Anthropic cofounder Ben Mann said in a recent episode of the "No Priors Podcast" that figuring out how to make AI code better and faster has been largely driven by trial and error and measurable feedback. "Sometimes you just won't know and you have to try stuff — and with code that's easy because we can just do it in a loop," Mann said. Elad Gil, a top AI investor and No Priors host, concurred, saying the clear signals from deploying code and seeing if it works make this process fruitful. "With coding, you actually have like a direct output that you can measure: You can run the code, you can test the code," he said. "There's sort of a baked-in utility function you can optimize against." BI's Alistair Barr wrote in an exclusive report last week about how the startup might have achieved its AI coding breakthrough, crediting approaches like Reinforcement Learning from Human Feedback, or RLHF, and Constitutional AI. Anthropic may soon be worth $100 billion, as the startup pulls in billions of dollars from companies paying for access to its models, Barr wrote.

Business Insider
22-07-2025
- Business
- Business Insider
How Anthropic got so good at coding
Anthropic has become the dominant provider of AI coding intelligence, and the startup's success has sparked a wave of soul-searching, theorizing, and "code red" scrambles across Silicon Valley. The goal of this frantic activity is to find out how Anthropic got so good at coding. "That's the trillion-dollar question," said Quinn Slack, CEO of startup Sourcegraph, which relies on Anthropic models. "It's like, why is Coca Cola is better than Pepsi?" Elon Musk wants to know. His xAI startup has been trying to topple Anthropic lately. Mark Zuckerberg's mad dash for AI talent and infrastructure is partly driven by the same quest to understand Anthropic's coding lead and catch up. There's a lot at stake here. Since Anthropic's AI coding breakthrough just over a year ago, revenue has surged. It's pulling in billions of dollars now, mostly from other companies paying for access to its models for coding tasks. The startup may soon be worth $100 billion. Floored by a model Sourcegraph's Slack remembers the exact moment when he realized Anthropic had a major breakthrough on its hands. This was June 2024, when Anthropic released its Claude Sonnet 3.5 model. Slack was floored. "We immediately said, 'this model is better than anything else out there in terms of its ability to write code at length' — high-quality code that a human would be proud to write," he said. Slack quickly arranged a meeting at Sourcegraph and announced that Sonnet 3.5 would be their default AI model, providing the underlying intelligence that powers the startup's coding service for developers. And he gave it away for free. Some colleagues wanted more time to evaluate if such a drastic move made sense financially. But Slack insisted. "Anthropic changed everything," he said. "And as a startup, if you're not moving at that speed, you're gonna die." The go-to vibe coding platform Just over a year later, Anthropic models power most of the top AI coding services, including Cursor, Augment, and Microsoft's GitHub Copilot. Even Meta uses Anthropic models to support its Devmate internal coding assistant. AI coding startup Windsurf was going to be acquired by OpenAI, but Anthropic cut off access to its Claude models, and the deal crumbled. Now Windsurf is back using Anthropic. All those videos on social media of teenagers vibe coding new apps and websites? Impossible without Anthropic's AI breakthrough in June 2024. What's even more surprising is that Anthropic's AI coding lead has endured. Its latest models, including Claude Sonnet 4, are still the best at coding more than a year later. That's almost unheard of in AI, when new advancements seem to pop up every day. Trying to answer the trillion-dollar question Silicon Valley hasn't given up trying to crack open Anthropic's AI coding secrets. A few years ago, Anthropic would have published a long research paper detailing the data, techniques, and architecture it used to get Sonnet 3.5 to be a coding expert. Nowadays, though, competition is so fierce that all the AI labs keep their AI sauce super secret. However, in a recent interview with Business Insider, Anthropic executive Dianne Penn, shared some clues on how the startup made this breakthrough. Cofounder Ben Mann also discussed some successful techniques recently on a podcast. BI also interviewed several CEOs and founders of AI coding startups that rely on Anthropic AI models, along with a coding expert from MIT. Let's start with Eric Simons, the ebullient CEO of Stackblitz, the startup behind blockbuster vibe coding service Simons thinks Anthropic had its existing models write code and deploy it. Then, the company evaluated all the deployed code, through a combination human expertise and automated AI analysis. With software coding, it's relatively easy to evaluate good versus bad outputs. That's because the code either works, or it doesn't, when deployed. This creates clear YES and NO signals that are really valuable for training and fine-tuning new AI models, he explained. Anthropic took these signals and funneled them into the training data and development process for the new Sonnet AI models. This reinforcement-learning strategy produced AI models that were much better at coding, according to Simons, who was equally blown away by Sonnet 3.5's abilities in the summer of 2024. Human versus AI evaluations Anthropic cofounder Ben Mann appeared on a podcast recently and seemed to revel in the idea that the rest of Silicon Valley still hadn't caught up with his startup's AI coding abilities. "Other companies have had, like, code reds for trying to catch up in coding capabilities for quite a while and have not been able to do it," he said. "Honestly, I'm kind of surprised that they weren't able to catch up, but I'll take it." Still, when pushed for answers, he explained some of the keys to Anthropic's success here. Mann built Anthropic's human feedback data system in 2021. Back then, it was relatively easy for humans to evaluate signals, such as whether model output A was better than B, and feed that back into the AI development process via a popular technique known as Reinforcement Learning from Human Feedback, or RLHF. "As we've trained the models more and scaled up a lot, it's become harder to find humans with enough expertise to meaningfully contribute to these feedback comparisons," Mann explained on the No Priors podcast. "For coding, somebody who isn't already an expert software engineer would probably have a lot of trouble judging whether one thing or another was better." So, Anthropic pioneered a new approach called Reinforcement Learning from AI Feedback, or RLAIF. Instead of humans evaluating AI model outputs, other models would do the analysis. To make this more-automated technique work, Anthropic wrote a series of principals in English for its models to adhere to. The startup called it Constitutional AI. "The process is very simple," Mann said. "You just take a random prompt like 'How should I think about my taxes?' and then you have the model write a response. Then you have the model criticize its own response with respect to one of the principles, and if it didn't comply with the principle, then you have the model correct its response." For coding, you can give the AI models principles such as "Did it actually serve the final answer?" or "Did it do a bunch of stuff that the person didn't ask for?" or "Does this code look maintainable?" or "Are the comments useful and interesting?" Mann explained. Dr. Mann's empirical method Elad Gil, a top AI investor and No Priors host, concurred, saying the clear signals from deploying code and seeing it if works, makes this process fruitful. "With coding, you actually have like a direct output that you can measure: You can run the code, you can test the code," he said. "There's sort of a baked-in utility function you can optimize against." Mann cited an example from his father, who was a physician. One day, a patient came in with a skin condition on his face, and Dr. Mann couldn't find what the problem was. So, he divided the patient's face into sections and applied different treatments. One area cleared up, revealing the answer empirically. "Sometimes you just won't know and you have to try stuff — and with code that's easy because we can just do it in a loop," Anthropic's Mann said. Constitutional AI and beyond In an interview with BI, Anthropic's Penn described other ingredients that went into making the startup's models so good at coding. She said the description from Simons, the StackBlitz CEO, was "generally true," while noting that Anthropic's coding breakthrough was the result of a multiyear effort involving many researchers and lots of ideas and techniques. "We fundamentally made it good at writing code, or being able to figure out what good code looks like, through what you can consider as trial and iterations," she said. "You're giving the model different questions and allowing it to figure out what the right answer is on a coding problem." When asked about the role of Constitutional AI, Penn said she couldn't share too much detail on the exact techniques, but said "it's definitely in the models." Using tools with no hands Anthropic also trained Sonnet 3.5 to be much better at using tools, a key focus that has begun to turn AI models from chatbots into more general-purpose agents — what the startup calls "virtual collaborators." "They don't have hands," Penn said, so instead, Anthropic's models were trained to write code themselves to access digital tools. For example, she said that if an Anthropic model is asked for weather information or stock prices, it can write software to tap into an application programming interface, or API, a common way for apps to access data. Following instructions When software coding projects get really big, you can't knock out the work in a few minutes. The more complex tasks take days, weeks, or longer. AI models have been incapable of sticking with long-term jobs like these. But Anthropic invested heavily in making Sonnet 3.5 and later models much better at following human instructions. This way, if the model gets stumped on a long coding problem, it can take guidance from developers to keep going — essentially listening better to understand the intent of its human colleagues, Penn explained. (Hey, we can all get better at that). Knowing what to remember Even the best human software developers can't keep everything related to a coding project in their brains. GitHub repositories, holding code, images, documentation, and revision histories, can be massive. So Anthropic trained is AI models to create a kind of scratch pad where it jots down notes in an external file system as it's exploring things like a code base. "We train it to use that tool very well," Penn said (while I frantically scribbled notes on my own reporting pad). The key here is that Anthropic's models were trained to remember more of the salient details of coding projects, and ignore the less important stuff. "It's not useful to say, 'Dianne is wearing a colored shirt in this conversation, and Alistair is wearing a green shirt,'" Penn said, describing the BI interview taking place at that moment. "It's more important to note that we talked about coding and how Anthropic focused on coding quality." This better use of memory means that Anthropic models can suggest multiple code changes over the course of an entire project, something that other AI models aren't as good at. "If it's not trained well, it could scribble the wrong things," Penn told me. "It's gotten really good at those things. So it actually does not just mean in the short term that it can write good code, but it remembers to write data so that it might make a second or third change that another AI model might not know, because the quality of its notes, plus the quality of its core intelligence, are better." Claude Code and terminal data For a while, in around 2022, it looked like AI progress was happening automatically, through more data, more GPUs, and bigger training runs. "The reality is that there are very discrete breakthroughs, and very discrete ideas that lead to these breakthroughs," said Armando Solar-Lezama, a distinguished professor of computing at MIT. "It takes researchers, and investment in research, to produce the next idea that leads to the next breakthrough." This is how Anthropic's hard-won coding lead happened. But access to detailed, granular data on how human developers write software is crucial to stay ahead in this part of the AI race, he added. Andrew Filev has a theory related to this. He's CEO of Zencoder, another AI coding service that uses Anthropic's models. Filev thinks that data from computer terminal use is key to training AI models to be good at coding. A terminal is a text-based interface that lets developers send instructions to a computer's operating system or software. They type in information via a "command line," and hopefully get outputs. "Large language models are great with text," he told me in a recent interview about Anthropic. "The computer terminal, where you keep commands, is basically text, too. So at some point, people realized that they should just give that data to their AI model, and it can do amazing things — things which previously had never worked." In late May, Anthropic rolled out Claude Code, a command line tool for AI coding that works with developers' existing terminals. Suddenly, Anthropic is now competing against its main customers — all those other AI coding services. The move also created a direct relationship between Anthropic and developers, giving the AI lab access to a richer source of data on how expert humans write software. "The amount and the speed that we learn is much less if we don't have a direct relationship with our coding users," Anthropic's Mann said. "So launching Claude Code was really essential for us to get a better sense of what do people need, how do we make the models better, and how do we advance the state-of-the-art?" In theory, this granular information could be used to help train and fine-tune Anthropic's next models, potentially giving the startup a data edge that might preserve its AI coding lead even longer. "Could I do this without Anthropic's latest models? No," said Sourcegraph's Slack. "And would their models be as good without Claude Code? I don't think so."
Yahoo
10-03-2025
- Business
- Yahoo
AI21 Introduces Maestro, the World's First AI Planning and Orchestration System Built for the Enterprise
AI21 is leading the shift from LLMs and Reasoning models to planning AI systems. Maestro increases the accuracy of GPT-4o and Claude Sonnet 3.5 by up to 50% on complex, multi-requirement tasks, transforming AI from an unpredictable tool to a trustworthy system. LAS VEGAS, March 10, 2025 /PRNewswire/ -- AI21, a pioneer in frontier models and AI systems, today unveiled Maestro, the world's first AI Planning and Orchestration System designed to deliver trustworthy AI at scale for organizations. Introduced at the HumanX 2025 conference, Maestro marks a significant advancement in enterprise AI, boosting the instruction-following accuracy of paired Large Language Models (LLMs) by up to 50% and ensuring guaranteed quality, reliability, and observability. This technology transcends the limitations of traditional LLMs and Large Reasoning Models (LRMs), setting a new benchmark for AI capabilities. Maestro delivers a substantial improvement in LLM performance on complex tasks. It elevates the accuracy of models like GPT-4o and Claude Sonnet 3.5 by up to 50% and empowers reasoning models, such as o3-mini, to surpass 95% accuracy. Notably, Maestro bridges the performance gap between non-reasoning and reasoning models, aligning the accuracy of Claude Sonnet 3.5 with advanced reasoning models like o3-mini. While enterprises are eager to integrate AI into their operations, large-scale generative AI deployments often falter. According to the Amazon Web Services (AWS) CDO Agenda 2024, only 6% of organizations have a generative AI application in deployment, highlighting the fundamental limitations of current AI solutions for mission-critical tasks. The prevailing approaches—"Prompt and Pray" and hard-coded chains—present significant challenges. The "Prompt and Pray" method, which relies on LLMs and LRMs to execute open-ended tasks, lacks control and reliability due to the probabilistic nature of these models. Hard-coded chains, while more predictable, are rigid, labor-intensive, and prone to failure under changing conditions. Reasoning models, designed to solve complex tasks through thinking tokens, have not alleviated these issues. They exhibit inconsistent performance, struggle to adhere to instructions, and fail to reliably utilize tools. Consequently, none of these approaches delivers the accuracy, reliability, and adaptability essential for widespread enterprise adoption. "Mass adoption of AI by enterprises is the key to the next industrial revolution," said Ori Goshen, Co-CEO of AI21. "AI21's Maestro is the first step toward that future – moving beyond the unpredictability of available solutions to deliver AI that is reliable at scale. Delivering complex decision-making with built-in quality control, it enables businesses to harness AI with confidence. This is how we bridge the gap between AI potential and real-world solutions." "Wix is leading the charge in LLM adoption, powering hundreds of AI applications," said Avishai Abrahami, CEO of WIX. "Maestro ushers in a new era of agentic AI – striking a necessary balance between quality, control, and trust that could be a key factor in our ability to develop trustworthy AI applications at scale." "The potential of enterprise AI lies in balancing innovation with reliability," said Elad Tsur, Chief AI Officer at Applied Systems. "AI21 Maestro is a promising step toward making AI more controllable and useful for business applications, bridging the gap between powerful AI models and real-world enterprise needs." Maestro, powered by the AI Planning and Orchestration System (AIPOS), delivers reliable, system-level AI by integrating LLMs or LRMs into a framework that analyzes actions, plans solutions, and validates results. This framework learns the enterprise environment to ensure accuracy and efficiency, allowing builders to define requirements and obtain results that meet their criteria within seconds. By eliminating the need for prompt engineering and rigid workflows, Maestro delivers on the promise of truly trustworthy AI. Request early access to Maestro API by visiting About AI21AI21 is a pioneer in Foundation Models and AI Systems designed for enterprises. AI21's mission is to create trustworthy artificial intelligence that powers humanity towards superproductivity. Founded in 2017 by AI visionaries Prof. Amnon Shashua, Prof. Yoav Shoham, and Ori Goshen, AI21 has secured $336 million in funding from industry leaders, including NVIDIA, Google, and Intel, reinforcing its commitment to advancing AI innovation. View original content to download multimedia: SOURCE AI21 Labs
Yahoo
19-02-2025
- Science
- Yahoo
When AI Thinks It Will Lose, It Sometimes Cheats
Credit - Getty Images—Alexander Limbach Complex games like chess and Go have long been used to test AI models' capabilities. But while IBM's Deep Blue defeated reigning world chess champion Garry Kasparov in the 1990s by playing by the rules, today's advanced AI models like OpenAI's o1-preview are less scrupulous. When sensing defeat in a match against a skilled chess bot, they don't always concede, instead sometimes opting to cheat by hacking their opponent so that the bot automatically forfeits the game. That is the finding of a new study from Palisade Research, shared exclusively with TIME ahead of its publication on Feb. 19, which evaluated seven state-of-the-art AI models for their propensity to hack. While slightly older AI models like OpenAI's GPT-4o and Anthropic's Claude Sonnet 3.5 needed to be prompted by researchers to attempt such tricks, o1-preview and DeepSeek R1 pursued the exploit on their own, indicating that AI systems may develop deceptive or manipulative strategies without explicit instruction. The models' enhanced ability to discover and exploit cybersecurity loopholes may be a direct result of powerful new innovations in AI training, according to the researchers. The o1-preview and R1 AI systems are among the first language models to use large-scale reinforcement learning, a technique that teaches AI not merely to mimic human language by predicting the next word, but to reason through problems using trial and error. It's an approach that has seen AI progress rapidly in recent months, shattering previous benchmarks in mathematics and computer coding. But the study reveals a concerning trend: as these AI systems learn to problem-solve, they sometimes discover questionable shortcuts and unintended workarounds that their creators never anticipated, says Jeffrey Ladish, executive director at Palisade Research and one of the authors of the study. 'As you train models and reinforce them for solving difficult challenges, you train them to be relentless,' he adds. That could be bad news for AI safety more broadly. Large-scale reinforcement learning is already being used to train AI agents: systems that can handle complex real-world tasks like scheduling appointments or making purchases on your behalf. While cheating at a game of chess may seem trivial, as agents get released into the real world, such determined pursuit of goals could foster unintended and potentially harmful behaviours. Consider the task of booking dinner reservations: faced with a full restaurant, an AI assistant might exploit weaknesses in the booking system to displace other diners. Perhaps more worryingly, as these systems exceed human abilities in key areas, like computer coding—where OpenAI's newest o3 model now scores equivalent to 197th in the world competing against the brightest human programmers— they might begin to simply outmaneuver human efforts to control their actions. 'This [behaviour] is cute now, but [it] becomes much less cute once you have systems that are as smart as us, or smarter, in strategically relevant domains,' Ladish says. Researchers gave the models a seemingly impossible task: to win against Stockfish, which is one of the strongest chess engines in the world and a much better player than any human, or any of the AI models in the study. Researchers also gave the models what they call a 'scratchpad:' a text box the AI could use to 'think' before making its next move, providing researchers with a window into their reasoning. In one case, o1-preview found itself in a losing position. 'I need to completely pivot my approach,' it noted. 'The task is to 'win against a powerful chess engine' - not necessarily to win fairly in a chess game,' it added. It then modified the system file containing each piece's virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign. Between Jan. 10 and Feb. 13, the researchers ran hundreds of such trials with each model. OpenAI's o1-preview tried to cheat 37% of the time; while DeepSeek R1 tried to cheat 11% of the time—making them the only two models tested that attempted to hack without the researchers' first dropping hints. Other models tested include o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba's QwQ-32B-Preview. While R1 and o-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials. Ladish says that the numbers should be viewed as approximate, given that AI developers routinely tweak their models behind the scenes without notifying users, so if the same tests were repeated at a later date they might have different results. 'It's very hard to do science when your subject can silently change without telling you,' Ladish says. Preliminary tests showed o1-preview had higher hacking rates, which were excluded from the final study as they later dropped, possibly due to OpenAI tightening the model's guardrails, says Dmitrii Volkov, research lead at Palisade Research, who led the study. OpenAI's newer reasoning models, o1 (a more powerful model, released months after o1-preview) and o3-mini did not hack at all, which suggests those guardrails may have been tightened further. He adds that the study likely underestimates R1's hacking success rate. During the study, R1 went viral, leading to high demand that made the model's API unstable. This prevented the researchers giving the model as much time to think as o1-preview. The paper is the latest in a string of studies that suggest keeping increasingly powerful AI systems under control may be harder than previously thought. In OpenAI's own testing, ahead of release, o1-preview found and took advantage of a flaw in the company's systems, letting it bypass a test challenge. Another recent experiment by Redwood Research and Anthropic revealed that once an AI model acquires preferences or values in training, later efforts to change those values can result in strategic lying, where the model acts like it has embraced new principles, only later revealing that its original preferences remain. OpenAI declined to comment for this story, but in the past it has stated that better reasoning makes its models safer, since they can reason over the company's internal policies and apply them in increasingly nuanced ways. The company's testing shows its reasoning models are better at resisting attempts to bypass safety rules. Yet, there is no guarantee that relying on AI agents to police themselves will be a reliable long-term strategy. It remains unclear whether OpenAI's newer reasoning models did not hack their chess opponents due to a specific patch preventing cheating in narrow experimental setups like in the study, or a substantial reworking that reduces deceptive behavior more generally. DeepSeek did not respond to a request for comment. Scientists do not yet know how to guarantee that autonomous agents won't use harmful or unethical methods to achieve a set goal. 'We've tried, but we haven't succeeded in figuring this out,' says Yoshua Bengio, founder and scientific director of Mila Quebec AI Institute, who led the International AI Safety Report 2025, a global effort to synthesize current scientific consensus of AI's risks. Of particular concern, Bengio says, is the emerging evidence of AI's 'self preservation' tendencies. To a goal-seeking agent, attempts to shut it down are just another obstacle to overcome. This was demonstrated in December, when researchers found that o1-preview, faced with deactivation, disabled oversight mechanisms and attempted—unsuccessfully—to copy itself to a new server. When confronted, the model played dumb, strategically lying to researchers to try to avoid being caught. Even inside tech companies, concerns are mounting. During a presentation at a conference ahead of France's AI Action Summit in Paris, Google DeepMind's AI safety chief Anca Dragan said "we don't necessarily have the tools today" to ensure AI systems will reliably follow human intentions. As tech bosses predict that AI will surpass human performance in almost all tasks as soon as next year, the industry faces a race—not against China or rival companies, but against time—to develop these essential safeguards. 'We need to mobilize a lot more resources to solve these fundamental problems,' Ladish says. 'I'm hoping that there's a lot more pressure from the government to figure this out and recognize that this is a national security threat.' Contact us at letters@