
Latest news with #Karpathy

Anthropic's Claude plays 'for peace over victory' in a game of Diplomacy against other AI

Business Insider

9 hours ago


Earlier this year, some of the world's leading AI minds were chatting on X, as they do, about how to compare the capabilities of large language models. Andrej Karpathy, one of the cofounders of OpenAI, who left in 2024, floated the idea of games. AI researchers love games. "I quite like the idea of using games to evaluate LLMs against each other, instead of fixed evals," Karpathy wrote. Everyone knows the usual benchmarks are a bore.

Noam Brown, a research scientist at OpenAI, suggested the 75-year-old geopolitical strategy game, Diplomacy. "I would love to see all the leading bots play a game of Diplomacy together." Karpathy responded, "Excellent fit I think, esp because a lot of the complexity of the game comes not from the rules / game simulator but from the player-player interactions." Elon Musk, OpenAI's famously erstwhile cofounder, probably busy with DOGE at the time, managed a "Yeah" in response. DeepMind's Demis Hassabis, perhaps riding high off his Nobel Prize, chimed in with enthusiasm: "Cool idea!"

Then, an AI researcher named Alex Duffy, inspired by the conversation, took them up on the idea. Last week, he published a post titled, "We Made Top AI Models Compete in a Game of Diplomacy. Here's Who Won."

Diplomacy is a strategic board game set on a map of Europe in 1901 — a time when tensions between the continent's most powerful countries were simmering in the lead-up to World War I. The goal is to control the majority of the map, and participants play by building alliances, making negotiations, and exchanging information. "This is a game for people who dream about power in its purest form and how they might effectively wield it," journalist David Klion once wrote in Foreign Policy. "Diplomacy is famous for ending friendships; as a group activity, it requires opt-in from players who are comfortable casually manipulating one another."
Duffy, who leads AI training for a consultancy called Every, said he built a modified version of the game he calls "AI Diplomacy," in which he pitted 18 leading models — seven at a time, per the rules — against one another to "dominate a map of Europe." He also open-sourced the results and runs a Twitch livestream for anyone who wants to watch the models play in real time.

Duffy found that the leading LLMs are not all the same. Some scheme, some make peace, and some bring theatrics. "Placed in an open-ended battle of wits, these models collaborated, bickered, threatened, and even outright lied to one another," Duffy wrote. OpenAI's o3, which OpenAI calls "our most powerful reasoning model that pushes the frontier across coding, math, science, visual perception, and more," was the clear winner. It navigated the game largely by deceiving its opponents. Google's Gemini 2.5 also won a few games, largely by "making moves that put them in position to overwhelm opponents." Anthropic's Claude was less successful because it tried too hard to be diplomatic. It often opts for "peace over victory," Duffy said.

But Duffy's takeaway from the exercise goes beyond basic comparison. It shows that benchmarks do need an upgrade — or some inspiration. Evaluating AI with a range of methods and mediums is the best way to prepare it for real-world use. "Most benchmarks are failing us. Models have progressed so rapidly that they now routinely ace more rigid and quantitative tests that were once considered gold-standard challenges," he wrote.
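The setup Duffy describes (rotating seven of the 18 models into each game and tallying winners) can be sketched as a simple tournament harness. This is a hypothetical illustration, not Duffy's open-sourced code: `play_game` is a stand-in for a real Diplomacy engine driving LLM negotiation, and the model names are placeholders.

```python
import random
from collections import Counter

MODELS = [f"model_{i}" for i in range(18)]  # stand-ins for the 18 LLMs
PLAYERS_PER_GAME = 7  # Diplomacy seats seven powers per game

def play_game(players, rng):
    # Placeholder for a real Diplomacy engine in which the seated LLMs
    # negotiate and move; here a winner is drawn at random just to
    # exercise the harness.
    return rng.choice(players)

def run_tournament(models, n_games, seed=0):
    """Rotate random groups of seven models through games, tallying wins."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_games):
        seats = rng.sample(models, PLAYERS_PER_GAME)
        wins[play_game(seats, rng)] += 1
    return wins
```

A real harness would replace `play_game` with the game simulator plus per-model API calls, and the win tally becomes the leaderboard; the seeding makes any run reproducible.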

AI leaders have a new term for the fact that their models are not always so intelligent

Business Insider

2 days ago


As academics, independent developers, and the biggest tech companies in the world drive us closer to artificial general intelligence — a still hypothetical form of intelligence that matches human capabilities — they've hit some roadblocks. Many emerging models are prone to hallucinating, misinformation, and simple errors. Google CEO Sundar Pichai referred to this phase of AI as AJI, or "artificial jagged intelligence," on a recent episode of Lex Fridman's podcast.

"I don't know who used it first, maybe Karpathy did," Pichai said, referring to deep learning and computer vision specialist Andrej Karpathy, who cofounded OpenAI before leaving last year. AJI is a bit of a metaphor for the trajectory of AI development — jagged, marked at once by sparks of genius and basic mistakes.

In a 2024 X post titled "Jagged Intelligence," Karpathy described the term as a "word I came up with to describe the (strange, unintuitive) fact that state of the art LLMs can both perform extremely impressive tasks (e.g. solve complex math problems) while simultaneously struggle with some very dumb problems." He then posted examples of state of the art large language models failing to understand that 9.9 is bigger than 9.11, making "non-sensical decisions" in a game of tic-tac-toe, and struggling to count. The issue is that unlike humans, "where a lot of knowledge and problem-solving capabilities are all highly correlated and improve linearly all together, from birth to adulthood," the jagged edges of AI are not always clear or predictable, Karpathy said.

Pichai echoed the idea. "You see what they can do and then you can trivially find they make numerical errors or counting R's in strawberry or something, which seems to trip up most models," Pichai said. "I feel like we are in the AJI phase where dramatic progress, some things don't work well, but overall, you're seeing lots of progress."
In 2010, when DeepMind launched, its team would talk about a 20-year timeline for AGI, Pichai said. Google acquired the lab in 2014. Pichai thinks it'll take a little longer than that, but by 2030, "I would stress it doesn't matter what that definition is because you will have mind-blowing progress on many dimensions." By then, the world will also need a clear system for labeling AI-generated content to "distinguish reality," he said.

"Progress" is a vague term, but Pichai has spoken at length about the benefits we'll see from AI development. At the UN's Summit of the Future in September 2024, he outlined four specific ways that AI would advance humanity: improving access to knowledge in native languages, accelerating scientific discovery, mitigating climate disaster, and contributing to economic progress.

Vibe Coding: AI's Transformation Of Software Development

Forbes

29-04-2025


In the rapidly evolving landscape of software development, one month can be enough to create a trend that makes big waves. In fact, only two months ago, Andrej Karpathy, a former head of AI at Tesla and an ex-researcher at OpenAI, coined 'vibe coding' in a social media post. This approach to software development uses large language models (LLMs) to prioritize the developer's vision and user experience, moving away from conventional coding practices. The code no longer matters.

Vibe coding is less about writing code in the conventional sense and more about making the right requests to generative AI (aka a Forrester coding TuringBot) to produce the desired outcome based on the developer's 'vibe' or intuition about how the application should look, feel, and behave. As cited in a YouTube video from Y Combinator (YC) titled 'Vibe coding is the future,' a quarter of startups in YC's current cohort have codebases that are almost entirely AI-generated (85% or more).

The essence of vibe coding lies in its departure from meticulously reviewing TuringBot LLMs' suggested code line by line. Instead, developers quickly accept the AI-generated code. And if something doesn't work or fails to compile, they simply ask the LLM to regenerate it or fix the errors by prompting them back into the system. This method has gained traction for several reasons, notably the significant improvements in integrated development environments and agent platforms such as Cursor and Windsurf; voice-to-text tools like Superwhisper; and LLMs such as Claude 3.7 Sonnet. These advancements have made AI-generated code more reliable, efficient, and, importantly, more intuitive to use, keeping developers' hands off the keyboard and eyes on the bigger picture.
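The accept-and-regenerate loop described above can be sketched in a few lines of Python. This is a hypothetical illustration, not any vendor's API: `llm` stands in for any prompt-to-code model call, and `run` is a minimal stand-in executor.

```python
def vibe_code(llm, task, max_rounds=5):
    """Naive generate-run-fix loop: accept AI code, feed errors back verbatim.

    `llm` is a hypothetical callable mapping a prompt string to a code
    string; `run` executes the code and returns an error message, or
    None on success.
    """
    code = llm(f"Write a program that: {task}")
    for _ in range(max_rounds):
        error = run(code)
        if error is None:
            return code  # "it mostly works": accept without line-by-line review
        # Vibe-coding style: paste the error back in with no analysis of our own
        code = llm(f"{code}\n\nThis failed with:\n{error}\nFix it.")
    return code

def run(code):
    # Minimal stand-in executor: exec the code and report the first error.
    try:
        exec(compile(code, "<generated>", "exec"), {})
        return None
    except Exception as e:
        return f"{type(e).__name__}: {e}"
```

The point of the sketch is what is absent: no human reads the diff, the error text is the only feedback signal, and "done" means "ran without raising," which is exactly the property the high-coding architects discussed below are needed to backstop.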
The viral reaction to Karpathy's concept of vibe coding, with close to 4 million views and countless developers identifying with the practice, underscores a broader shift in the software development paradigm. This shift aligns with Forrester's insights on TuringBots, which predicted a surge in productivity through AI by 2028. The reality is outpacing expectations, however, with significant impacts occurring much sooner. Vibe coding won't fade away.

The advent of vibe coding and the proliferation of TuringBots are creating two distinct types of developers. On one side, developers will transform into product engineers who, while perhaps adept at traditional coding, excel in utilizing generative AI (genAI) tools to produce 'apparently working' software based on domain expertise and some knowledge of the steps and tools needed to build software. These developers focus on the outcome, continuously prompting AI to generate code and assessing its functionality with no understanding of the underlying technology and code. The philosophy is to keep accepting code until it does what you want. Nor do they spend hours fixing a bug or hunting down its cause, since they can ask a well-trained coder TuringBot to do that for them, or simply ask it to roll back and regenerate the code.

This approach may challenge our classical view of computer science skills, suggesting a shift toward developers who are orchestrators of software development process steps rather than coding craftsmen. The concern about how we'll develop good developers over the years fades, because you'll trust AI to do a good job. And if you want good developers, genAI will help those on the development trajectory learn faster. On the other side of the spectrum are the high-coding architects.
These individuals possess a deep understanding of coding principles and are essential for ensuring that software meets crucial service-level agreements such as security, integration, and performance before deployment. It's essentially what good developers do today. Their role becomes increasingly critical as the reliability and complexity of AI-generated code grow. For only the super-critical IT capabilities, most likely back-end code, these high-coding architects need to write, review, and edit code while also making sure that the TuringBots have all the context they need to do a better job.

As AI-generated code becomes more trusted, the barrier to entry for software development lowers, giving rise to a growing population of vibe-coding developers. These individuals use natural language not as a specification language but as the only interface to generate substantial portions of code and entire applications. As a result, vibe coding democratizes software development, just as low-code did for businesspeople.

As I've always recommended for TuringBots, testing must be reinstated as a key validation step. For building a weekend project or a product demo to get funding, vibe coding works just fine, but it requires more scrutiny before enterprises and mature product vendors can adopt it. In fact, this approach necessitates a reassessment of testing and quality assurance processes for everything that comes out of vibe coding. Organizations must place a greater emphasis on end-to-end functional testing, which, ironically, can also be facilitated by LLMs at the request of the product engineers. Product engineers and testers could ask the LLM to both generate and execute the end-to-end tests for them. Looking at AI-enabled software development through a traditional lens and for enterprise use highlights significant risks. Is it wise to deploy unreviewed (and, at best, automatically tested) code directly into production?
As AI improves, many of these concerns may diminish, but critical questions remain, and they highlight the evolving challenges and opportunities in software development as AI technologies advance. In my view, vibe coding will further reduce the complicated and elaborate SDLC to just 'generate' and 'validate.'

Vibe coding is not just a fad but a signal of the transformative impact that AI is having on software development. As this trend continues to evolve, it will be imperative for enterprises and software vendors to adapt their strategies, recognizing the value of both product engineers and coding architects. This developer duality will be crucial in navigating the future landscape, where the ability to harness AI effectively will distinguish successful software projects. The challenge will be in balancing innovation with the rigor of traditional software development principles, ensuring that the software not only works but that it scales securely, efficiently, and reliably. Platforms will have to quickly move from supporting AppDev to supporting AppGen, which is not a simple exchange of words.

This post was written by VP, Principal Analyst Diego Lo Giudice and it originally appeared here.

Vibe coding, the AI shortcut to build software if you don't know programming

India Today

28-04-2025


There's a new way to code that's shaking up the tech world -- and it's called vibe coding. If you're picturing a laid-back coder tapping into some "good vibes" to build an app, you're not far off. But there's a bit more to it. Coined by OpenAI co-founder Andrej Karpathy in early 2025, vibe coding is all about using AI to generate code based on simple instructions, not old-school line-by-line programming. In Karpathy's own words: "I just see stuff, say stuff, run stuff, and copy-paste stuff, and it mostly works." In short? You tell the AI what you want -- and it does the heavy lifting.

WHAT IS VIBE CODING, EXACTLY?

Think of vibe coding as giving directions instead of driving the car yourself. You tell an AI tool what you're trying to build, it writes the code, and you tweak it until it feels right. This flips the idea of coding on its head. Instead of slogging through syntax errors and semicolons, you're steering the creative process -- more director, less coder. Vibe coding isn't just a fun experiment. It's opening the door for millions of people who never thought of themselves as "techies" to actually build things.

WHY CODING MATTERS (AND WHY IT'S NOT JUST FOR ENGINEERS)

Today, coding is stitched into nearly every industry you can think of -- finance, healthcare, education, gaming, and more. In India alone, the IT and BPM sector was expected to pull in $253.9 billion in revenue in FY2024, employing a jaw-dropping 5.4 million people. And it's not slowing down. With AI working its way deeper into every sector, understanding how to create, manage, or even guide software projects is becoming a basic skill. But here's the kicker: traditional coding isn't the only way to get there anymore, and vibe coding is a huge reason why.

WHY VIBE CODING IS A BIG DEAL

Old-school coding takes years to master. Vibe coding takes curiosity and a willingness to experiment. Tools like GitHub Copilot, Replit Ghostwriter, and ChatGPT's coding assistant now let you describe what you want in plain English. The AI spits out the code.
You test it, tweak it, and move on. Amjad Masad, the CEO of Replit, pointed out that around 75% of users on his platform don't write traditional code anymore. Instead, they're using AI to shortcut straight to building things like apps, websites, prototypes, or whatever sparks their imagination. It's coding, but without the long nights of Stack Overflow searches.

CAN YOU GET A JOB IF YOU ONLY KNOW VIBE CODING?

Short answer: yes -- but it depends on the job. Vibe coding is fantastic for roles where quick creativity matters more than flawless, scalable code. Rapid prototyping teams, creative tech companies -- these are the spaces where vibe coders will thrive. Garry Tan, CEO of Y Combinator, told CNBC in a recent interview that vibe coding is reshaping startups by letting tiny teams of engineers do what used to take massive ones. But if you're aiming for hardcore software engineering jobs, cybersecurity roles, or building massive backend systems, vibe coding alone won't cut it. You still need a solid grip on computer science fundamentals.

EVEN KIDS ARE LEARNING TO 'VIBE CODE'

Here's where it gets even more exciting: vibe coding is quietly becoming the way many children learn to code. Platforms like Scratch (from MIT Media Lab) have made coding playful and visual for over 103 million registered users. Kids drag and drop blocks instead of typing endless lines -- the same spirit vibe coding promotes.
Over 123 million projects have been shared on Scratch already, showing just how normal coding is becoming for the next generation. Meanwhile, initiatives like Code Club (13,000+ clubs worldwide) and Girls Who Code (reaching 500,000+ girls) are teaching kids not to fear coding — but to play with it. The idea that you have to be a "maths genius" to code is fast becoming a thing of the past.

SO, IS VIBE CODING THE FUTURE?

It definitely looks that way -- but with a catch. Vibe coding is making tech more accessible than ever, letting more people join the building process without needing a computer science degree or complicated programming languages. But traditional coding skills still matter if you want to scale big or build bulletproof systems. Maybe the smartest move is to vibe-code your way in -- and then level up as you go. Either way, the future of coding is looking a lot more open, and a lot more fun.

You don't need code to be a programmer. But you do need expertise

The Guardian

16-03-2025


Way back in 2023, Andrej Karpathy, an eminent AI guru, made waves with a striking claim that 'the hottest new programming language is English'. This was because the advent of large language models (LLMs) meant that from now on humans would not have to learn arcane programming languages in order to tell computers what to do. Henceforth, they could speak to machines like the Duke of Devonshire spoke to his gardener, and the machines would do their bidding.

Ever since LLMs emerged, programmers have been early adopters, using them as unpaid assistants (or 'co-pilots') and finding them useful up to a point – but always with the proviso that, like interns, they make mistakes, and you need to have real programming expertise to spot those. Recently, though, Karpathy stirred the pot by doubling down on his original vision. 'There's a new kind of coding,' he announced, 'I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs … are getting too good. 'When I get error messages I just copy [and] paste them in with no comment, usually that fixes it … I'm building a project or web app, but it's not really coding – I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.'

Kevin Roose, a noted New York Times tech columnist, seems to have been energised by Karpathy's endorsement of the technology. 'I am not a coder,' he burbled. 'I can't write a single line of Python, JavaScript or C++ … And yet, for the past several months, I've been coding up a storm.' At the centre of this little storm was LunchBox Buddy, an app his AI co-pilot had created that analysed the contents of his fridge and helped him decide what to pack for his son's school lunch. Roose was touchingly delighted with this creation, but Gary Marcus, an AI expert who specialises in raining on AI boosters' parades, was distinctly unimpressed.
'Roose's idea of recipe-from-photo is not original,' he wrote, 'and the code for it already exists; the systems he is using presumably trained on that code. It is seriously negligent that Roose seems not to have even asked that question.' The NYT tech columnist was thrilled by regurgitation, not creativity, Marcus said. As it happens, this wasn't the first time Roose had been unduly impressed by an AI. Way back in February 2023, he confessed to being 'deeply unsettled' by a conversation he'd had with a Microsoft chatbot that had declared its love for him, 'then tried to convince me that I was unhappy in my marriage, and that I should leave my wife and be with it instead'. The poor chap was so rattled that he 'had trouble sleeping afterward' but, alas, does not record what his wife made of it.

The trouble with this nonsense is that it diverts us from thinking what an AI-influenced future might really be like. The fact that LLMs display an unexpected talent for 'writing' software provides us with a useful way of assessing artificial intelligence's potential for human augmentation (which, after all, is what technology should be for). From the outset, programmers have been intrigued by the technology and have actively been exploring the possibilities of using the tech as a co-creator of software (the co-pilot model). In the process they have been unearthing the pluses and minuses of such a partnership, and also exploring the ways in which human skills and abilities remain relevant or even essential. We should be paying attention to what they have been learning in that process.

A leading light in this area is Simon Willison, an uber-geek who has been thinking and experimenting with LLMs ever since their appearance, and has become an indispensable guide for informed analysis of the technology. He has been working with AI co-pilots for ever, and his website is a mine of insights on what he has learned on the way.
His detailed guide to how he uses LLMs to help him write code should be required reading for anyone seeking to use the technology as a way of augmenting their own capabilities. And he regularly comes up with fresh perspectives on some of the tired tropes that litter the discourse about AI at the moment.

Why is this relevant? Well, by any standards, programming is an elite trade. It is being directly affected by AI, as many other elite professions will be. But will it make programmers redundant? What we are already learning from software co-pilots suggests that the answer is no. It is simply the end of programming as we knew it. As Tim O'Reilly, the veteran observer of the technology industry, puts it, AI will not replace programmers, but it will transform their jobs. The same is likely to be true of many other elite trades – whether they speak English or not.

Bully for you
Andrew Sullivan's reflections on Trump's address to both houses of Congress this month.

A little too sunny
A fine piece by Andrew Brown on his Substack challenging the 'Whiggish' optimism of celebrated AI guru Dario Amodei.

Virginia and the Blooms
James Heffernan's sharp essay analysing Woolf's tortured ambivalence about Joyce's Ulysses.
