
Latest news with #Karpathy

Anthropic's Claude plays 'for peace over victory' in a game of Diplomacy against other AI

Business Insider

9 hours ago


Earlier this year, some of the world's leading AI minds were chatting on X, as they do, about how to compare the capabilities of large language models. Andrej Karpathy, one of the cofounders of OpenAI, who left in 2024, floated the idea of games. AI researchers love games. "I quite like the idea of using games to evaluate LLMs against each other, instead of fixed evals," Karpathy wrote. Everyone knows the usual benchmarks are a bore.

Noam Brown, a research scientist at OpenAI, suggested the 75-year-old geopolitical strategy game, Diplomacy. "I would love to see all the leading bots play a game of Diplomacy together." Karpathy responded, "Excellent fit I think, esp because a lot of the complexity of the game comes not from the rules / game simulator but from the player-player interactions." Elon Musk, OpenAI's famously erstwhile cofounder, probably busy with DOGE at the time, managed a "Yeah" in response. DeepMind's Demis Hassabis, perhaps riding high off his Nobel Prize, chimed in with enthusiasm: "Cool idea!"

Then, an AI researcher named Alex Duffy, inspired by the conversation, took them up on the idea. Last week, he published a post titled, "We Made Top AI Models Compete in a Game of Diplomacy. Here's Who Won."

Diplomacy is a strategic board game set on a map of Europe in 1901 — a time when tensions between the continent's most powerful countries were simmering in the lead-up to World War I. The goal is to control the majority of the map, and participants play by building alliances, making negotiations, and exchanging information. "This is a game for people who dream about power in its purest form and how they might effectively wield it," journalist David Klion once wrote in Foreign Policy. "Diplomacy is famous for ending friendships; as a group activity, it requires opt-in from players who are comfortable casually manipulating one another."
Duffy, who leads AI training for a consultancy called Every, said he built a modified version of the game he calls "AI Diplomacy," in which he pitted 18 leading models — seven at a time, per the rules — against one another to "dominate a map of Europe." He also open-sourced the results and runs a Twitch livestream for anyone who wants to watch the models play in real time.

Duffy found that the leading LLMs are not all the same. Some scheme, some make peace, and some bring theatrics. "Placed in an open-ended battle of wits, these models collaborated, bickered, threatened, and even outright lied to one another," Duffy wrote. OpenAI's o3, which OpenAI calls "our most powerful reasoning model that pushes the frontier across coding, math, science, visual perception, and more," was the clear winner. It navigated the game largely by deceiving its opponents. Google's Gemini 2.5 also won a few games, largely by "making moves that put them in position to overwhelm opponents." Anthropic's Claude was less successful because it tried too hard to be diplomatic. It often opts for "peace over victory," Duffy said.

But Duffy's takeaway from the exercise goes beyond basic comparison. It shows that benchmarks do need an upgrade — or some inspiration. Evaluating AI with a range of methods and mediums is the best way to prepare it for real-world use. "Most benchmarks are failing us. Models have progressed so rapidly that they now routinely ace more rigid and quantitative tests that were once considered gold-standard challenges," he wrote.
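The setup Duffy describes (rotating seven of the 18 models into each game and tallying winners) can be sketched as a simple tournament harness. This is a hypothetical illustration, not Duffy's open-sourced code: `play_game` is a stand-in for a real Diplomacy engine driving LLM negotiation, and the model names are placeholders.

```python
import random
from collections import Counter

MODELS = [f"model_{i}" for i in range(18)]  # stand-ins for the 18 LLMs
PLAYERS_PER_GAME = 7  # Diplomacy seats seven powers per game

def play_game(players, rng):
    # Placeholder for a real Diplomacy engine in which the seated LLMs
    # negotiate and move; here a winner is drawn at random just to
    # exercise the harness.
    return rng.choice(players)

def run_tournament(models, n_games, seed=0):
    """Rotate random groups of seven models through games, tallying wins."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(n_games):
        seats = rng.sample(models, PLAYERS_PER_GAME)
        wins[play_game(seats, rng)] += 1
    return wins
```

A real harness would replace `play_game` with the game simulator plus per-model API calls, and the win tally becomes the leaderboard; the seeding makes any run reproducible.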

AI leaders have a new term for the fact that their models are not always so intelligent

Business Insider

2 days ago


As academics, independent developers, and the biggest tech companies in the world drive us closer to artificial general intelligence — a still hypothetical form of intelligence that matches human capabilities — they've hit some roadblocks. Many emerging models are prone to hallucinating, misinformation, and simple errors. Google CEO Sundar Pichai referred to this phase of AI as AJI, or "artificial jagged intelligence," on a recent episode of Lex Fridman's podcast.

"I don't know who used it first, maybe Karpathy did," Pichai said, referring to deep learning and computer vision specialist Andrej Karpathy, who cofounded OpenAI before leaving last year. AJI is a bit of a metaphor for the trajectory of AI development — jagged, marked at once by sparks of genius and basic mistakes.

In a 2024 X post titled "Jagged Intelligence," Karpathy described the term as a "word I came up with to describe the (strange, unintuitive) fact that state of the art LLMs can both perform extremely impressive tasks (e.g. solve complex math problems) while simultaneously struggle with some very dumb problems." He then posted examples of state of the art large language models failing to understand that 9.9 is bigger than 9.11, making "non-sensical decisions" in a game of tic-tac-toe, and struggling to count. The issue is that unlike humans, "where a lot of knowledge and problem-solving capabilities are all highly correlated and improve linearly all together, from birth to adulthood," the jagged edges of AI are not always clear or predictable, Karpathy said.

Pichai echoed the idea. "You see what they can do and then you can trivially find they make numerical errors or counting R's in strawberry or something, which seems to trip up most models," Pichai said. "I feel like we are in the AJI phase where dramatic progress, some things don't work well, but overall, you're seeing lots of progress."
In 2010, when DeepMind launched, its team would talk about a 20-year timeline for AGI, Pichai said. Google acquired the lab in 2014. Pichai thinks it'll take a little longer than that, but by 2030, "I would stress it doesn't matter what that definition is because you will have mind-blowing progress on many dimensions." By then, the world will also need a clear system for labeling AI-generated content to "distinguish reality," he said.

"Progress" is a vague term, but Pichai has spoken at length about the benefits we'll see from AI development. At the UN's Summit of the Future in September 2024, he outlined four specific ways that AI would advance humanity: improving access to knowledge in native languages, accelerating scientific discovery, mitigating climate disaster, and contributing to economic progress.

Vibe Coding: AI's Transformation Of Software Development

Forbes

29-04-2025


In the rapidly evolving landscape of software development, one month can be enough to create a trend that makes big waves. In fact, only two months ago, Andrej Karpathy, a former head of AI at Tesla and an ex-researcher at OpenAI, coined 'vibe coding' in a social media post. This approach to software development uses large language models (LLMs) to prioritize the developer's vision and user experience, moving away from conventional coding practices. The code no longer matters.

Vibe coding is less about writing code in the conventional sense and more about making the right requests to generative AI (aka a Forrester coding TuringBot) to produce the desired outcome based on the developer's 'vibe' or intuition about how the application should look, feel, and behave. As cited in a YouTube video from Y Combinator (YC) titled 'Vibe coding is the future,' a quarter of startups in YC's current cohort have codebases that are almost entirely AI-generated (85% or more).

The essence of vibe coding lies in its departure from meticulously reviewing TuringBot LLMs' suggested code line by line. Instead, developers quickly accept the AI-generated code. And if something doesn't work or fails to compile, they simply ask the LLM to regenerate it or fix the errors by prompting them back into the system. This method has gained traction for several reasons, notably the significant improvements in integrated development environments and agent platforms such as Cursor and Windsurf; voice-to-text tools like Superwhisper; and LLMs such as Claude 3.7 Sonnet. These advancements have made AI-generated code more reliable, efficient, and, importantly, more intuitive to use, keeping developers' hands off the keyboard and eyes on the bigger picture.
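The accept-and-regenerate loop described above can be sketched in a few lines of Python. This is a hypothetical illustration, not any vendor's API: `llm` stands in for any prompt-to-code model call, and `run` is a minimal stand-in executor.

```python
def vibe_code(llm, task, max_rounds=5):
    """Naive generate-run-fix loop: accept AI code, feed errors back verbatim.

    `llm` is a hypothetical callable mapping a prompt string to a code
    string; `run` executes the code and returns an error message, or
    None on success.
    """
    code = llm(f"Write a program that: {task}")
    for _ in range(max_rounds):
        error = run(code)
        if error is None:
            return code  # "it mostly works": accept without line-by-line review
        # Vibe-coding style: paste the error back in with no analysis of our own
        code = llm(f"{code}\n\nThis failed with:\n{error}\nFix it.")
    return code

def run(code):
    # Minimal stand-in executor: exec the code and report the first error.
    try:
        exec(compile(code, "<generated>", "exec"), {})
        return None
    except Exception as e:
        return f"{type(e).__name__}: {e}"
```

The point of the sketch is what is absent: no human reads the diff, the error text is the only feedback signal, and "done" means "ran without raising," which is exactly the property the high-coding architects discussed below are needed to backstop.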
The viral reaction to Karpathy's concept of vibe coding, with close to 4 million views and countless developers identifying with the practice, underscores a broader shift in the software development paradigm. This shift aligns with Forrester's insights on TuringBots, which predicted a surge in productivity through AI by 2028. The reality is outpacing expectations, however, with significant impacts occurring much sooner. Vibe coding won't fade away.

The advent of vibe coding and the proliferation of TuringBots are creating two distinct types of developers. On one side, developers will transform into product engineers who, while perhaps adept at traditional coding, excel in utilizing generative AI (genAI) tools to produce 'apparently working' software based on domain expertise and some knowledge of the steps and tools needed to build software. These developers focus on the outcome, continuously prompting AI to generate code and assessing its functionality with no understanding of the underlying technology and code. The philosophy is to keep accepting code until it does what you want. Nor do they spend hours fixing a bug or hunting down its cause, since they can ask a well-trained coder TuringBot to do that for them, or simply ask it to roll back and regenerate the code.

This approach may challenge our classical view of computer science skills, suggesting a shift toward developers who are orchestrators of software development process steps rather than coding craftsmen. The concern about how we'll develop good developers over the years fades, because you'll trust AI to do a good job. And if you want good developers, genAI will help those on the development trajectory learn faster. On the other side of the spectrum are the high-coding architects.
These individuals possess a deep understanding of coding principles and are essential for ensuring that software meets crucial service-level agreements such as security, integration, and performance before deployment. It's essentially what good developers do today. Their role becomes increasingly critical as the reliability and complexity of AI-generated code grow. For only the super-critical IT capabilities, most likely back-end code, these high-coding architects need to write, review, and edit code while also making sure that the TuringBots have all the context they need to do a better job.

As AI-generated code becomes more trusted, the barrier to entry for software development lowers, giving rise to a growing population of vibe-coding developers. These individuals use natural language not as a specification language but as the only interface to generate substantial portions of code and entire applications. As a result, vibe coding democratizes software development, just as low-code did for businesspeople.

As I've always recommended for TuringBots, testing must be reinstated as a key validation step. For building a weekend project or a product demo to get funding, vibe coding works just fine, but it requires more scrutiny before enterprises and mature product vendors can adopt it. In fact, this approach necessitates a reassessment of testing and quality assurance processes for everything that comes out of vibe coding. Organizations must place a greater emphasis on end-to-end functional testing, which, ironically, can also be facilitated by LLMs at the request of the product engineers. Product engineers and testers could ask the LLM to both generate and execute the end-to-end tests for them. Looking at AI-enabled software development through a traditional lens and for enterprise use highlights significant risks. Is it wise to deploy unreviewed (and, at best, automatically tested) code directly into production?
As AI improves, many of these concerns may diminish, but critical questions remain, and they highlight the evolving challenges and opportunities in software development as AI technologies advance. In my view, vibe coding will further reduce the complicated and elaborate SDLC to just 'generate' and 'validate.'

Vibe coding is not just a fad but a signal of the transformative impact that AI is having on software development. As this trend continues to evolve, it will be imperative for enterprises and software vendors to adapt their strategies, recognizing the value of both product engineers and coding architects. This developer duality will be crucial in navigating the future landscape, where the ability to harness AI effectively will distinguish successful software projects. The challenge will be in balancing innovation with the rigor of traditional software development principles, ensuring that the software not only works but that it scales securely, efficiently, and reliably. Platforms will have to quickly move from supporting AppDev to supporting AppGen, which is not a simple exchange of words.

This post was written by VP, Principal Analyst Diego Lo Giudice and it originally appeared here.

Vibe coding, the AI shortcut to build software if you don't know programming

India Today

28-04-2025


There's a new way to code that's shaking up the tech world -- and it's called vibe coding. If you're picturing a laid-back coder tapping into some "good vibes" to build an app, you're not far off. But there's a bit more to it. Coined by OpenAI co-founder Andrej Karpathy in early 2025, vibe coding is all about using AI to generate code based on simple instructions, not old-school line-by-line programming. In Karpathy's own words: "I just see stuff, say stuff, run stuff, and copy-paste stuff, and it mostly works." In short? You tell the AI what you want -- and it does the heavy lifting.

WHAT IS VIBE CODING, EXACTLY?

Think of vibe coding as giving directions instead of driving the car yourself. You tell an AI tool what you're trying to build, it writes the code, and you tweak it until it feels right. This flips the idea of coding on its head. Instead of slogging through syntax errors and semicolons, you're steering the creative process -- more director, less coder. Vibe coding isn't just a fun experiment. It's opening the door for millions of people who never thought of themselves as "techies" to actually build things.

WHY CODING MATTERS (AND WHY IT'S NOT JUST FOR ENGINEERS)

Today, coding is stitched into nearly every industry you can think of -- finance, healthcare, education, gaming, and more. In India alone, the IT and BPM sector was expected to pull in $253.9 billion in revenue in FY2024, employing a jaw-dropping 5.4 million people. And it's not slowing down. With AI working its way deeper into every sector, understanding how to create, manage, or even guide software projects is becoming a basic skill. But here's the kicker: traditional coding isn't the only way to get there anymore, and vibe coding is a huge reason why.

WHY VIBE CODING IS A BIG DEAL

Old-school coding takes years to master. Vibe coding takes curiosity and a willingness to experiment. Tools like GitHub Copilot, Replit Ghostwriter, and ChatGPT's coding assistant now let you describe what you want in plain English. The AI spits out the code.
You test it, tweak it, and move on. Amjad Masad, the CEO of Replit, pointed out that around 75% of users on his platform don't write traditional code anymore. Instead, they're using AI to shortcut straight to building things like apps, websites, prototypes, or whatever sparks their imagination. It's coding, but without the long nights of Stack Overflow searches.

CAN YOU GET A JOB IF YOU ONLY KNOW VIBE CODING?

Short answer: yes -- but it depends on the job. Vibe coding is fantastic for roles where quick creativity matters more than flawless, scalable code. Rapid prototyping teams, creative tech companies -- these are the spaces where vibe coders will thrive. Garry Tan, CEO of Y Combinator, told CNBC in a recent interview that vibe coding is reshaping startups by letting tiny teams of engineers do what used to take massive ones. But if you're aiming for hardcore software engineering jobs, cybersecurity roles, or building massive backend systems, vibe coding alone won't cut it. You still need a solid grip on computer science fundamentals.

EVEN KIDS ARE LEARNING TO 'VIBE CODE'

Here's where it gets even more exciting: vibe coding is quietly becoming the way many children learn to code. Platforms like Scratch (from MIT Media Lab) have made coding playful and visual for over 103 million registered users. Kids drag and drop blocks instead of typing endless lines -- the same spirit vibe coding promotes.
Over 123 million projects have been shared on Scratch already, showing just how normal coding is becoming for the next generation. Meanwhile, initiatives like Code Club (13,000+ clubs worldwide) and Girls Who Code (reaching 500,000+ girls) are teaching kids not to fear coding — but to play with it. The idea that you have to be a "maths genius" to code is fast becoming a thing of the past.

SO, IS VIBE CODING THE FUTURE?

It definitely looks that way -- but with a catch. Vibe coding is making tech more accessible than ever, letting more people join the building process without needing a computer science degree or complicated programming languages. But traditional coding skills still matter if you want to scale big or build bulletproof systems. Maybe the smartest move is to vibe-code your way in -- and then level up as you go. Either way, the future of coding is looking a lot more open, and a lot more fun.

You don't need code to be a programmer. But you do need expertise

The Guardian

16-03-2025


Way back in 2023, Andrej Karpathy, an eminent AI guru, made waves with a striking claim that 'the hottest new programming language is English'. This was because the advent of large language models (LLMs) meant that from now on humans would not have to learn arcane programming languages in order to tell computers what to do. Henceforth, they could speak to machines like the Duke of Devonshire spoke to his gardener, and the machines would do their bidding.

Ever since LLMs emerged, programmers have been early adopters, using them as unpaid assistants (or 'co-pilots') and finding them useful up to a point – but always with the proviso that, like interns, they make mistakes, and you need to have real programming expertise to spot those. Recently, though, Karpathy stirred the pot by doubling down on his original vision. 'There's a new kind of coding,' he announced, 'I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs … are getting too good. 'When I get error messages I just copy [and] paste them in with no comment, usually that fixes it … I'm building a project or web app, but it's not really coding – I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.'

Kevin Roose, a noted New York Times tech columnist, seems to have been energised by Karpathy's endorsement of the technology. 'I am not a coder,' he burbled. 'I can't write a single line of Python, JavaScript or C++ … And yet, for the past several months, I've been coding up a storm.' At the centre of this little storm was LunchBox Buddy, an app his AI co-pilot had created that analysed the contents of his fridge and helped him decide what to pack for his son's school lunch. Roose was touchingly delighted with this creation, but Gary Marcus, an AI expert who specialises in raining on AI boosters' parades, was distinctly unimpressed.
'Roose's idea of recipe-from-photo is not original,' he wrote, 'and the code for it already exists; the systems he is using presumably trained on that code. It is seriously negligent that Roose seems not to have even asked that question.' The NYT tech columnist was thrilled by regurgitation, not creativity, Marcus said. As it happens, this wasn't the first time Roose had been unduly impressed by an AI. Way back in February 2023, he confessed to being 'deeply unsettled' by a conversation he'd had with a Microsoft chatbot that had declared its love for him, 'then tried to convince me that I was unhappy in my marriage, and that I should leave my wife and be with it instead'. The poor chap was so rattled that he 'had trouble sleeping afterward' but, alas, does not record what his wife made of it.

The trouble with this nonsense is that it diverts us from thinking what an AI-influenced future might really be like. The fact that LLMs display an unexpected talent for 'writing' software provides us with a useful way of assessing artificial intelligence's potential for human augmentation (which, after all, is what technology should be for). From the outset, programmers have been intrigued by the technology and have actively been exploring the possibilities of using the tech as a co-creator of software (the co-pilot model). In the process they have been unearthing the pluses and minuses of such a partnership, and also exploring the ways in which human skills and abilities remain relevant or even essential. We should be paying attention to what they have been learning in that process.

A leading light in this area is Simon Willison, an uber-geek who has been thinking and experimenting with LLMs ever since their appearance, and has become an indispensable guide for informed analysis of the technology. He has been working with AI co-pilots for ever, and his website is a mine of insights on what he has learned on the way.
His detailed guide to how he uses LLMs to help him write code should be required reading for anyone seeking to use the technology as a way of augmenting their own capabilities. And he regularly comes up with fresh perspectives on some of the tired tropes that litter the discourse about AI at the moment.

Why is this relevant? Well, by any standards, programming is an elite trade. It is being directly affected by AI, as many other elite professions will be. But will it make programmers redundant? What we are already learning from software co-pilots suggests that the answer is no. It is simply the end of programming as we knew it. As Tim O'Reilly, the veteran observer of the technology industry, puts it, AI will not replace programmers, but it will transform their jobs. The same is likely to be true of many other elite trades – whether they speak English or not.

Bully for you
Andrew Sullivan's reflections on Trump's address to both houses of Congress this month.

A little too sunny
A fine piece by Andrew Brown on his Substack challenging the 'Whiggish' optimism of celebrated AI guru Dario Amodei.

Virginia and the Blooms
James Heffernan's sharp essay analysing Woolf's tortured ambivalence about Joyce's Ulysses.
