AI coding tools made some experienced software engineers less productive in a recent study
The study found developers were overconfident in the AI tools, still believing they had gotten a 20% productivity boost even after using them.
Critics caution that AI code editors have advanced since the February study period and that the results are specific to the study's setting.
AI code editors have quickly become a mainstay of software development, employed by tech giants such as Amazon, Microsoft, and Google.
In an interesting twist, a new study suggests AI tools made some developers less productive.
Experienced developers using AI coding tools took 19% longer to complete issues than those not using generative AI assistance, according to a new study from Model Evaluation & Threat Research (METR).
Even after completing the tasks, participants couldn't accurately gauge their own productivity, the study said: On average, AI-assisted developers still thought their productivity had improved by 20%.
METR's study recruited 16 developers with large, open-source repositories that they had worked on for years. The developers were randomly assigned into two groups: Those allowed to use AI coding assistance and those who weren't.
The AI-assisted coders could choose which vibe-coding tool they used. Most chose Cursor with Claude 3.5/3.7 Sonnet. Business Insider reached out to Cursor for comment.
Developers without AI spent over 10% more time actively coding, the study said. The AI-assisted coders spent over 20% more time reviewing AI outputs, prompting AI, waiting on AI, or being idle.
METR researcher Nate Rush told BI he uses an AI code editor every day. While he didn't make a formal prediction about the study's results, Rush said he jotted down positive productivity figures he expected the study to reach. He remains surprised by the negative end result — and cautions against taking it out of context.
"Much of what we see is the specificity of our setting," Rush said, explaining that developers without the participants' 5-10 years of expertise would likely see different results. "But the fact that we found any slowdown at all was really surprising."
Steve Newman, serial entrepreneur and cofounder of Google Docs, described the findings in a Substack post as "too bad to be true," but after more careful analysis of the study and its methodology, he found the study credible.
"This study doesn't expose AI coding tools as a fraud, but it does remind us that they have important limitations (for now, at least)," Newman wrote.
The METR researchers said they found evidence for multiple contributors to the productivity slowdown. Over-optimism was one likely factor: Before completing the tasks, developers predicted AI would decrease implementation time by 24%.
For skilled developers, it may still be quicker to do what you know well. The METR study found that AI-assisted participants slowed down on the issues they were more familiar with. They also reported that their level of experience made it more difficult for AI to help them.
AI also may not be reliable enough yet to produce clean and accurate code. AI-assisted developers in the study accepted less than 44% of the generated code, and spent 9% of their time cleaning up AI outputs.
Ruben Bloom, one of the study's developers, posted a reaction thread on X, saying coding assistants have developed considerably since he participated in February.
"I think if the result is valid at this point in time, that's one thing, I think if people are citing in another 3 months' time, they'll be making a mistake," Bloom wrote.
METR's Rush acknowledges that the 19% slowdown is a "point-in-time measurement" and that he'd like to study the figure over time. Rush stands by the study's takeaway that AI productivity gains may be more individualized than expected.
"A number of developers told me this really interesting anecdote, which is, 'Knowing this information, I feel this desire to use AI more judiciously,'" Rush said. "On an individual level, these developers know their actual productivity impact. They can make more informed decisions."
Read the original article on Business Insider