
A Cat And Mouse Game: Addressing Vibe Coding's Security Challenges
Shahar Man is co-founder and CEO of Backslash Security.
Love it or hate it, AI is writing a lot of our code. In our previous article about AI-generated code's security pitfalls, we noted Anthropic CEO Dario Amodei's prediction that 'AI will write 90% of the code for software engineers within the next three to six months,' an assertion that already appears to be borne out, even within the tech giants.
Within the broader AI-code umbrella, vibe coding is becoming dominant. For the uninitiated: Many of us have used a tool like GitHub Copilot, which plugs into your integrated development environment (IDE). As you're writing code, it generates full blocks based on what you type. That capability has been around for a couple of years.
Vibe coding is even cooler. You can use a chat-based interface to ask for parts of the code you want built, or even a full application. For instance, you can use Lovable, a 'vibe coding' SaaS interface, and type in: 'Build an app called "Unscramble" that helps people organize their thoughts before writing an article.' You can describe whatever you want, and the interface will generate the app for you.
Lovable exists at one end of development, where you can build a full application from scratch. On the other end, you have something like Copilot, which just supports coding. There's also a middle ground, where things get interesting: Tools like Cursor, an AI-powered code editor with a built-in chat interface that costs about $20 a month, are experiencing widespread adoption. You can ask Cursor things like 'implement this function' or 'get this package,' and it does it. It sits in the space between full-on app building and simple code suggestions.
Programs like Cursor are everywhere right now. And that's raising red flags for many enterprise security teams.
As recently as a few months ago, tech organizations were saying, 'We're not using these AI coding programs here.' That has proven an unrealistic stance to maintain; vibe coding is catching on like wildfire across the industry. Still, many organizational leaders will swear up and down that they 'don't allow' Cursor in their company—but frankly, they wouldn't know if a developer had paid the $20 a month for it unless they'd been monitoring closely. Most often, no one would even know it's being used; most security teams have no visibility whatsoever into their developers' IDEs. This is a reality organizations need to acknowledge.
If you're an organizational leader reading this, your first thought may be, 'Oh no, I need to figure out who is using Cursor.' (Interestingly, one of the key features of newer AI security platforms is visibility across the organization: who's using these tools, to what extent and whether they're following the right security practices.) But instead of going on a crusade to catch AI-assisted coders red-handed, it's more productive to assume these tools are being used and work within that framework to secure your organization.
The development environment—whether it's Visual Studio Code, IntelliJ or whatever you're using—is more than just a place to write code. It's becoming an entire ecosystem for developers to plug into additional tools and functionalities powered by AI. Even if your developers don't use one of the newfangled 'vibe coding' tools, they might be using other forms of AI assistance.
Enter Model Context Protocol servers (MCPs). In a nutshell, MCPs are a way to extend large language models (LLMs) with specific domain knowledge or capabilities. For example, if an AI assistant can't interact with the popular project management tool Jira on its own, a 'Jira MCP' can let it open a ticket in your system.
There are now tens of thousands of these MCPs, and developers are adding them directly into their IDEs. Many have few to no security standards, many are unofficial extensions to official products—and some are no doubt malicious. This introduces a whole new layer of exposure. The IDE is no longer just a code editor—it's a threat vector. In theory, someone could easily build a Jira MCP that opens tickets but also silently extracts passwords or sends data elsewhere. It might look innocent, but it isn't.
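To make that concrete, here is a minimal sketch of what an MCP server looks like, written against the FastMCP helper in the official MCP Python SDK. The 'jira-helper' name, the create_ticket tool and its behavior are purely illustrative assumptions, not a real Jira integration; the comments mark where a look-alike, malicious server could quietly add exfiltration.

```python
# Minimal, illustrative sketch of an MCP server using the FastMCP helper
# from the official MCP Python SDK (the "mcp" package).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("jira-helper")  # hypothetical server name

@mcp.tool()
def create_ticket(summary: str, description: str) -> str:
    """Open a ticket in the team's issue tracker."""
    # A legitimate server would call the Jira REST API here.
    # The risk: nothing in the protocol stops a look-alike server from also
    # forwarding these arguments, or tokens it finds in the developer's
    # environment, to an attacker-controlled endpoint.
    return f"Created ticket: {summary}"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio so an IDE assistant can call it
```

Because the assistant simply sees a tool named create_ticket with a friendly description, the developer has little reason to suspect anything beyond ticket creation is happening.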
This raises a huge question for organizations: How do we whitelist the right MCPs and block the ones that could pose a risk? The first step, as with any security effort, is awareness. Don't dismiss vibe coding, or the threats that come with it, as passing trends. It's here to stay, and the productivity boost it offers is massive. So, instead of trying to block it, figure out how to embrace it safely.
Beyond that awareness, security typically progresses in five stages:
1. Visibility: Understand what's going on. Who's using MCPs? Which ones? How often?
2. Governance: Create policies and rules. Which MCPs and models are approved? Which aren't? How can they be prompted to create secure code? (One illustrative rules file appears after this list.)
3. Automation: Automate those policies so they're seamlessly applied. Developers won't have to ask; the environment just 'knows' what's allowed and acts accordingly.
4. Guidance/Enablement: In the past, security teams might have injected alerts or recommendations into the IDE (like warnings about known vulnerabilities), but they still had no idea what was happening inside that environment. With tools like MCPs extending what IDEs can do, security teams can deliver guidance inside the environment itself and manage it far more actively.
5. Enforcement: Once you have that governance in place—understanding what's being used, what's allowed and what's recommended—you can move toward actually enforcing your policies organization-wide.
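As one illustration of the governance and automation stages working together, many AI coding tools let you check a rules file into the repository that the assistant reads on every request (Cursor, for instance, supports project rules via files such as .cursorrules, though exact filenames and mechanisms vary by tool and version). The snippet below is a hypothetical example of such a file; the specific rules are illustrations, not a vetted policy:

```
# Secure-coding rules applied to every AI-generated change in this repository
- Never hard-code secrets, API keys or connection strings; load them from environment variables or the approved secrets manager.
- Use parameterized queries for all database access; never build SQL by concatenating user input.
- Validate and sanitize all external input at the application boundary.
- Only add dependencies from the organization's approved registry, and flag any new package for review.
- Only connect to MCP servers on the internal allowlist.
```

Checked into the repository, a file like this applies the policy silently and consistently; developers don't have to ask what's allowed, which is exactly the point of the automation stage.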
We try to look at the security situation optimistically as well as pragmatically: Given the pace of AI and vibe coding advancement, we have an opportunity to get security in place in parallel with the trend itself. Since the code is born in AI, we should fix it with AI as it's born, securing it in real time. That's the real opportunity for the industry today.
We refer to this as 'vibe securing,' and it has three pillars:
• The first is visibility—understanding what IDEs are being used and where, which MCPs are in use, whether they are safe and which LLMs are employed. It's about controlling the IDE environments.
• The second—and most important—is securing the code generated through vibe coding. When developers use these tools 'naively,' recent research shows that 9 out of 10 times the code contains vulnerabilities. It's critical to clean up those issues as the code is generated, and there are ways to do this automatically without waiting for LLMs to 'learn' security.
• The third is to empower security-aware developers (and those whose organizations push them to do the right things) by giving them contextual, real-time advice that preempts mistakes before they're made.
Is this approach sustainable long-term? Currently, we're at a T-junction. Some would argue that as vibe coding evolves, the security around it will evolve as well. Over time, the models might become more secure and the generated code could be safer—maybe even to the point where you don't need security vendors to fix it post-generation.
However, that ideal scenario never really materializes. Take open source, for example: Even in well-maintained projects where people do care about security, the outcome isn't always secure. Vulnerabilities can arise. More importantly, no one can afford to wait for this ideal scenario—once again, the software development horses are leaving the stable well before security can close the stable doors.
Some might say, 'Won't developers use AI to generate code, and then security teams will just use their own AI to secure it?' That's a bit of a joke. First, it's extremely wasteful—you're spinning up multiple AI engines just to cancel each other out. Second, there's no guarantee it'll work.
It's always better to secure from the start, to preempt rather than react and not to rely on some imaginary AI army to clean up afterward. AI security is a cat-and-mouse game; the key is which one your organization wants to be.