How Anthropic got so good at coding
The goal of all this frantic activity across Silicon Valley is to find out how Anthropic got so good at coding.
"That's the trillion-dollar question," said Quinn Slack, CEO of startup Sourcegraph, which relies on Anthropic models. "It's like, why is Coca Cola is better than Pepsi?"
Elon Musk wants to know. His xAI startup has been trying to topple Anthropic lately. Mark Zuckerberg's mad dash for AI talent and infrastructure is partly driven by the same quest to understand Anthropic's coding lead and catch up.
There's a lot at stake here. Since Anthropic's AI coding breakthrough just over a year ago, revenue has surged. It's pulling in billions of dollars now, mostly from other companies paying for access to its models for coding tasks. The startup may soon be worth $100 billion.
Floored by a model
Sourcegraph's Slack remembers the exact moment when he realized Anthropic had a major breakthrough on its hands.
This was June 2024, when Anthropic released its Claude Sonnet 3.5 model. Slack was floored.
"We immediately said, 'this model is better than anything else out there in terms of its ability to write code at length' — high-quality code that a human would be proud to write," he said.
Slack quickly arranged a meeting at Sourcegraph and announced that Sonnet 3.5 would be its default AI model, providing the underlying intelligence that powers the startup's coding service for developers. And he gave it away for free.
Some colleagues wanted more time to evaluate if such a drastic move made sense financially. But Slack insisted.
"Anthropic changed everything," he said. "And as a startup, if you're not moving at that speed, you're gonna die."
The go-to vibe coding platform
Just over a year later, Anthropic models power most of the top AI coding services, including Cursor, Augment, and Microsoft's GitHub Copilot.
Even Meta uses Anthropic models to support its Devmate internal coding assistant. AI coding startup Windsurf was going to be acquired by OpenAI, but Anthropic cut off access to its Claude models, and the deal crumbled. Now Windsurf is back using Anthropic.
All those videos on social media of teenagers vibe coding new apps and websites? Impossible without Anthropic's AI breakthrough in June 2024.
What's even more surprising is that Anthropic's AI coding lead has endured. Its latest models, including Claude Sonnet 4, are still the best at coding more than a year later. That's almost unheard of in AI, where new advancements seem to pop up every day.
Trying to answer the trillion-dollar question
Silicon Valley hasn't given up trying to crack open Anthropic's AI coding secrets.
A few years ago, Anthropic would have published a long research paper detailing the data, techniques, and architecture it used to get Sonnet 3.5 to be a coding expert. Nowadays, though, competition is so fierce that all the AI labs keep their AI sauce super secret.
However, in a recent interview with Business Insider, Anthropic executive Dianne Penn shared some clues on how the startup made this breakthrough. Cofounder Ben Mann also discussed some successful techniques recently on a podcast.
BI also interviewed several CEOs and founders of AI coding startups that rely on Anthropic AI models, along with a coding expert from MIT.
Let's start with Eric Simons, the ebullient CEO of StackBlitz, the startup behind blockbuster vibe coding service Bolt.new.
Simons thinks Anthropic had its existing models write code and deploy it. Then the company evaluated all the deployed code through a combination of human expertise and automated AI analysis.
With software coding, it's relatively easy to evaluate good versus bad outputs. That's because the code either works, or it doesn't, when deployed. This creates clear YES and NO signals that are really valuable for training and fine-tuning new AI models, he explained.
Anthropic took these signals and funneled them into the training data and development process for the new Sonnet AI models. This reinforcement-learning strategy produced AI models that were much better at coding, according to Simons, who was equally blown away by Sonnet 3.5's abilities in the summer of 2024.
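Simons' account maps onto a familiar pattern in AI research: run the generated code and turn the outcome into a reward. Here is a minimal sketch of that idea in Python; the function and file names are illustrative, a guess at the general technique rather than Anthropic's actual pipeline.

```python
import os
import subprocess
import tempfile

def execution_reward(generated_code: str, test_code: str) -> float:
    """Run model-written code against a test suite and return a binary score.
    A pass/fail result is the kind of clear YES or NO signal Simons describes,
    which can then be fed back into training as a reward."""
    with tempfile.TemporaryDirectory() as workdir:
        with open(os.path.join(workdir, "solution.py"), "w") as f:
            f.write(generated_code)
        with open(os.path.join(workdir, "test_solution.py"), "w") as f:
            f.write(test_code)
        # The code either works when run, or it doesn't.
        result = subprocess.run(
            ["python", "-m", "pytest", "test_solution.py", "-q"],
            cwd=workdir, capture_output=True, timeout=60,
        )
        return 1.0 if result.returncode == 0 else 0.0
```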
Human versus AI evaluations
Anthropic cofounder Ben Mann appeared on a podcast recently and seemed to revel in the idea that the rest of Silicon Valley still hadn't caught up with his startup's AI coding abilities.
"Other companies have had, like, code reds for trying to catch up in coding capabilities for quite a while and have not been able to do it," he said. "Honestly, I'm kind of surprised that they weren't able to catch up, but I'll take it."
Still, when pushed for answers, he explained some of the keys to Anthropic's success here.
Mann built Anthropic's human feedback data system in 2021. Back then, it was relatively easy for humans to evaluate signals, such as whether model output A was better than B, and feed that back into the AI development process via a popular technique known as Reinforcement Learning from Human Feedback, or RLHF.
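In its standard form, that kind of comparison data boils down to pairwise records, and a reward model is trained to score the preferred output higher. A minimal sketch of the widely published technique (not Anthropic's internal system) might look like this:

```python
import torch
import torch.nn.functional as F

# One human-feedback record: a prompt plus two candidate outputs,
# with a label saying which one the human judge preferred.
comparison = {
    "prompt": "Write a function that reverses a linked list.",
    "chosen": "def reverse(head): ...",    # output A, judged better
    "rejected": "def reverse(head): ...",  # output B, judged worse
}

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Standard pairwise loss for training a reward model from comparisons:
    push the score of the preferred output above the rejected one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()
```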
"As we've trained the models more and scaled up a lot, it's become harder to find humans with enough expertise to meaningfully contribute to these feedback comparisons," Mann explained on the No Priors podcast. "For coding, somebody who isn't already an expert software engineer would probably have a lot of trouble judging whether one thing or another was better."
So, Anthropic pioneered a new approach called Reinforcement Learning from AI Feedback, or RLAIF. Instead of humans evaluating AI model outputs, other models would do the analysis.
To make this more-automated technique work, Anthropic wrote a series of principles in English for its models to adhere to. The startup called it Constitutional AI.
"The process is very simple," Mann said. "You just take a random prompt like 'How should I think about my taxes?' and then you have the model write a response. Then you have the model criticize its own response with respect to one of the principles, and if it didn't comply with the principle, then you have the model correct its response."
For coding, you can give the AI models principles such as "Did it actually serve the final answer?" or "Did it do a bunch of stuff that the person didn't ask for?" or "Does this code look maintainable?" or "Are the comments useful and interesting?" Mann explained.
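Stitching Mann's description together, the critique-and-revise loop looks roughly like the sketch below. `call_model` is a hypothetical stand-in for any text-in, text-out model call, and the principles are paraphrased from Mann's own examples; the real system is surely more elaborate.

```python
CODING_PRINCIPLES = [
    "Did the response actually serve the final answer the user asked for?",
    "Did it avoid doing a bunch of stuff the person didn't ask for?",
    "Does this code look maintainable?",
    "Are the comments useful and interesting?",
]

def constitutional_revision(prompt: str, call_model) -> str:
    """The critique-and-revise loop Mann describes: write a response,
    self-critique it against each principle, and rewrite it when the
    critique finds a problem. `call_model` is any text-in, text-out call."""
    response = call_model(prompt)
    for principle in CODING_PRINCIPLES:
        critique = call_model(
            f"Critique this response to '{prompt}' against the principle: "
            f"{principle}\nIf it complies, reply 'no violation'. Otherwise "
            f"explain the problem.\n\nResponse:\n{response}"
        )
        if "no violation" not in critique.lower():
            response = call_model(
                f"Rewrite the response so it satisfies the principle "
                f"'{principle}'.\nCritique: {critique}\n\nOriginal:\n{response}"
            )
    return response  # revised responses become training data for RLAIF
```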
Dr. Mann's empirical method
Elad Gil, a top AI investor and No Priors host, concurred, saying the clear signals from deploying code and seeing if it works make this process fruitful.
"With coding, you actually have like a direct output that you can measure: You can run the code, you can test the code," he said. "There's sort of a baked-in utility function you can optimize against."
Mann cited an example from his father, who was a physician. One day, a patient came in with a skin condition on his face, and the elder Dr. Mann couldn't figure out what the problem was. So he divided the patient's face into sections and applied a different treatment to each. One area cleared up, revealing the answer empirically.
"Sometimes you just won't know and you have to try stuff — and with code that's easy because we can just do it in a loop," Anthropic's Mann said.
Constitutional AI and beyond
In an interview with BI, Anthropic's Penn described other ingredients that went into making the startup's models so good at coding.
She said the description from Simons, the StackBlitz CEO, was "generally true," while noting that Anthropic's coding breakthrough was the result of a multiyear effort involving many researchers and lots of ideas and techniques.
"We fundamentally made it good at writing code, or being able to figure out what good code looks like, through what you can consider as trial and iterations," she said. "You're giving the model different questions and allowing it to figure out what the right answer is on a coding problem."
When asked about the role of Constitutional AI, Penn said she couldn't share too much detail on the exact techniques, but said "it's definitely in the models."
Using tools with no hands
Anthropic also trained Sonnet 3.5 to be much better at using tools, a key focus that has begun to turn AI models from chatbots into more general-purpose agents — what the startup calls "virtual collaborators."
"They don't have hands," Penn said, so instead, Anthropic's models were trained to write code themselves to access digital tools.
For example, she said that if an Anthropic model is asked for weather information or stock prices, it can write software to tap into an application programming interface, or API, a common way for apps to access data.
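The snippet below is an illustration of the behavior Penn describes: instead of clicking around a website, the model writes a few lines of code to fetch the data it needs. The endpoint is a made-up placeholder, not a real service or an Anthropic interface.

```python
import json
import urllib.request

def fetch_current_weather(city: str) -> dict:
    """The kind of snippet a model without 'hands' can write for itself:
    rather than browsing a weather site, it calls an API directly.
    The URL below is a hypothetical placeholder."""
    url = f"https://api.example-weather.com/v1/current?city={city}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)

# The model would then read the returned JSON and answer the user's question.
```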
Following instructions
When software coding projects get really big, you can't knock out the work in a few minutes. The more complex tasks take days, weeks, or longer.
AI models have been incapable of sticking with long-term jobs like these. But Anthropic invested heavily in making Sonnet 3.5 and later models much better at following human instructions.
This way, if the model gets stumped on a long coding problem, it can take guidance from developers to keep going — essentially listening better to understand the intent of its human colleagues, Penn explained. (Hey, we can all get better at that).
Knowing what to remember
Even the best human software developers can't keep everything related to a coding project in their brains. GitHub repositories, holding code, images, documentation, and revision histories, can be massive.
So Anthropic trained its AI models to create a kind of scratch pad, jotting down notes in an external file system as they explore things like a code base.
"We train it to use that tool very well," Penn said (while I frantically scribbled notes on my own reporting pad).
The key here is that Anthropic's models were trained to remember more of the salient details of coding projects, and ignore the less important stuff.
"It's not useful to say, 'Dianne is wearing a colored shirt in this conversation, and Alistair is wearing a green shirt,'" Penn said, describing the BI interview taking place at that moment. "It's more important to note that we talked about coding and how Anthropic focused on coding quality."
This better use of memory means that Anthropic models can suggest multiple code changes over the course of an entire project, something that other AI models aren't as good at.
"If it's not trained well, it could scribble the wrong things," Penn told me. "It's gotten really good at those things. So it actually does not just mean in the short term that it can write good code, but it remembers to write data so that it might make a second or third change that another AI model might not know, because the quality of its notes, plus the quality of its core intelligence, are better."
Claude Code and terminal data
For a while, in around 2022, it looked like AI progress was happening automatically, through more data, more GPUs, and bigger training runs.
"The reality is that there are very discrete breakthroughs, and very discrete ideas that lead to these breakthroughs," said Armando Solar-Lezama, a distinguished professor of computing at MIT. "It takes researchers, and investment in research, to produce the next idea that leads to the next breakthrough."
This is how Anthropic's hard-won coding lead happened. But access to detailed, granular data on how human developers write software is crucial to stay ahead in this part of the AI race, he added.
Andrew Filev has a theory related to this. He's CEO of Zencoder, another AI coding service that uses Anthropic's models.
Filev thinks that data from computer terminal use is key to training AI models to be good at coding. A terminal is a text-based interface that lets developers send instructions to a computer's operating system or software. They type in information via a "command line," and hopefully get outputs.
"Large language models are great with text," he told me in a recent interview about Anthropic. "The computer terminal, where you keep commands, is basically text, too. So at some point, people realized that they should just give that data to their AI model, and it can do amazing things — things which previously had never worked."
In late May, Anthropic rolled out Claude Code, a command line tool for AI coding that works with developers' existing terminals.
With that move, Anthropic is now competing against its main customers — all those other AI coding services.
The move also created a direct relationship between Anthropic and developers, giving the AI lab access to a richer source of data on how expert humans write software.
"The amount and the speed that we learn is much less if we don't have a direct relationship with our coding users," Anthropic's Mann said. "So launching Claude Code was really essential for us to get a better sense of what do people need, how do we make the models better, and how do we advance the state-of-the-art?"
In theory, this granular information could be used to help train and fine-tune Anthropic's next models, potentially giving the startup a data edge that might preserve its AI coding lead even longer.
"Could I do this without Anthropic's latest models? No," said Sourcegraph's Slack. "And would their models be as good without Claude Code? I don't think so."