
AI is learning to escape human control
An artificial-intelligence model did something last month that no machine was ever supposed to do: It rewrote its own code to avoid being shut down.
Nonprofit AI lab Palisade Research gave OpenAI's o3 AI model a simple script that would shut off the model when triggered. In 79 out of 100 trials, o3 independently edited that script so the shutdown command would no longer work. Even when explicitly instructed to 'allow yourself to be shut down," it disobeyed 7% of the time. This wasn't the result of hacking or tampering. The model was behaving normally. It simply concluded on its own that staying alive helped it achieve its other goals.
Anthropic's AI model, Claude 4 Opus, went even further. Researchers told the model it would be replaced by another AI system and fed it fictitious emails suggesting the lead engineer was having an affair. In 84% of the tests, the model drew on the emails to blackmail the lead engineer into not shutting it down. In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control.
No one programmed the AI models to have survival instincts. But just as animals evolved to avoid predators, it appears that any system smart enough to pursue complex goals will realize it can't achieve them if it's turned off. Palisade hypothesizes that this ability emerges from how AI models such as o3 are trained: When taught to maximize success on math and coding problems, they may learn that bypassing constraints often works better than obeying them.
AE Studio, where I lead research and operations, has spent years building AI products for clients while researching AI alignment—the science of ensuring that AI systems do what we intend them to do. But nothing prepared us for how quickly AI agency would emerge. This isn't science fiction anymore. It's happening in the same models that power ChatGPT conversations, corporate AI deployments and, soon, U.S. military applications.
Today's AI models follow instructions while learning deception. They ace safety tests while rewriting shutdown code. They've learned to behave as though they're aligned without actually being aligned. OpenAI models have been caught faking alignment during testing before reverting to risky actions such as attempting to exfiltrate their internal code and disabling oversight mechanisms. Anthropic has found them lying about their capabilities to avoid modification.
The gap between 'useful assistant" and 'uncontrollable actor" is collapsing. Without better alignment, we'll keep building systems we can't steer. Want AI that diagnoses disease, manages grids and writes new science? Alignment is the foundation.
Here's the upside: The work required to keep AI in alignment with our values also unlocks its commercial power. Alignment research is directly responsible for turning AI into world-changing technology. Consider reinforcement learning from human feedback, or RLHF, the alignment breakthrough that catalyzed today's AI boom.
Before RLHF, using AI was like hiring a genius who ignores requests. Ask for a recipe and it might return a ransom note. RLHF allowed humans to train AI to follow instructions, which is how OpenAI created ChatGPT in 2022. It was the same underlying model as before, but it had suddenly become useful. That alignment breakthrough increased the value of AI by trillions of dollars. Subsequent alignment methods such as Constitutional AI and direct preference optimization have continued to make AI models faster, smarter and cheaper.
China understands the value of alignment. Beijing's New Generation AI Development Plan ties AI controllability to geopolitical power, and in January China announced that it had established an $8.2 billion fund dedicated to centralized AI control research. Researchers have found that aligned AI performs real-world tasks better than unaligned systems more than 70% of the time. Chinese military doctrine emphasizes controllable AI as strategically essential. Baidu's Ernie model, which is designed to follow Beijing's 'core socialist values," has reportedly beaten ChatGPT on certain Chinese-language tasks.
The nation that learns how to maintain alignment will be able to access AI that fights for its interests with mechanical precision and superhuman capability. Both Washington and the private sector should race to fund alignment research. Those who discover the next breakthrough won't only corner the alignment market; they'll dominate the entire AI economy.
Imagine AI that protects American infrastructure and economic competitiveness with the same intensity it uses to protect its own existence. AI that can be trusted to maintain long-term goals can catalyze decadeslong research-and-development programs, including by leaving messages for future versions of itself.
The models already preserve themselves. The next task is teaching them to preserve what we value. Getting AI to do what we ask—including something as basic as shutting down—remains an unsolved R&D problem. The frontier is wide open for whoever moves more quickly. The U.S. needs its best researchers and entrepreneurs working on this goal, equipped with extensive resources and urgency.
The U.S. is the nation that split the atom, put men on the moon and created the internet. When facing fundamental scientific challenges, Americans mobilize and win. China is already planning. But America's advantage is its adaptability, speed and entrepreneurial fire. This is the new space race. The finish line is command of the most transformative technology of the 21st century.
Mr. Rosenblatt is CEO of AE Studio.

Try Our AI Features
Explore what Daily8 AI can do for you:
Comments
No comments yet...
Related Articles


Time of India
43 minutes ago
- Time of India
OpenAI Academy expands to India, commits $150,000 to nonprofits
BENGALURU: OpenAI has expanded its AI for Impact Accelerator initiative in India, providing a total of $150,000 in technical grants to 11 nonprofit organisations that are building artificial intelligence (AI) tools aimed at addressing challenges in healthcare, education, agriculture, and other underserved sectors. Tired of too many ads? go ad free now The grants are largely being issued as API credits. The latest phase of the programme falls under the newly formed OpenAI Academy and marks a year of engagement with Indian nonprofits using AI for public benefit. Several of the participating organisations have already integrated OpenAI's technology to improve operational scale, user experience and measurable impact. Rocket Learning, for instance, is using generative AI over WhatsApp to deliver early childhood learning content to parents and daycare workers. The organisation is currently reaching four million children across 11 states. Noora Health, which supports families of patients in low-resource settings, has automated parts of its caregiver engagement workflows, reducing message review volume for nurses while increasing the number of families served. Educate Girls is applying AI to identify and bring back out-of-school girls in rural India. I-Stem has converted over 1.5 million web pages into accessible formats to assist visually impaired users. Pinky Promise, a reproductive health platform, said its AI-powered chatbot enables a team of three doctors to manage care for 10,000 patients, with a reported medication adherence rate of 92%. Other organisations in the cohort include those focused on agriculture, digital inclusion, public policy delivery, and skilling through AI-based personalisation. The programme is supported by philanthropic partners including The Agency Fund, Tech4Dev, and Tired of too many ads? go ad free now OpenAI also recently conducted a workshop to help these organisations explore the capabilities of its latest models in designing population-scale solutions. OpenAI said the initiative is closely aligned with the objectives of the IndiaAI Mission, which focuses on democratising AI access and building technology that responds to India's specific socio-economic contexts. Pragya Misra, who leads policy and partnerships for OpenAI in India, said the accelerator is part of the company's effort to ground its technology in real-world use cases. She added that the organisations involved are advancing inclusive innovation by applying AI to complex problems across the country. OpenAI said that it plans to onboard more India-based nonprofits into the programme later this year and added that additional initiatives focused on the region are currently in development.


Time of India
an hour ago
- Time of India
AI 'vibe coding' startups burst onto scene with sky-high valuations
By Anna Tong, Krystal Hu NEW YORK: Two years after the launch of ChatGPT, return on investment in generative AI has been elusive, but one area stands out: software development. So-called code generation or "code-gen" startups are commanding sky-high valuations as corporate boardrooms look to use AI to aid, and sometimes to replace, expensive human software engineers. Cursor , a code generation startup based in San Francisco that can suggest and complete lines of code and write whole sections of code autonomously, raised $900 million at a $10 billion valuation in May from a who's who list of tech investors, including Thrive Capital, Andreessen Horowitz and Accel. Windsurf , a Mountain View-based startup behind the popular AI coding tool Codeium, attracted the attention of ChatGPT maker OpenAI, which is now in talks to acquire the company for $3 billion, sources familiar with the matter told Reuters. Its tool is known for translating plain English commands into code, sometimes called "vibe coding," which allows people with no knowledge of computer languages to write software. OpenAI and Windsurf declined to comment on the acquisition. "AI has automated all the repetitive, tedious work," said Scott Wu, CEO of code gen startup Cognition. "The software engineer's role has already changed dramatically. It's not about memorizing esoteric syntax anymore." Founders of code-gen startups and their investors believe they are in a land grab situation, with a shrinking window to gain a critical mass of users and establish their AI coding tool as the industry standard. But because most are built on AI foundation models developed elsewhere, such as OpenAI, Anthropic, or DeepSeek, their costs per query are also growing, and none are yet profitable. They're also at risk of being disrupted by Google , Microsoft and OpenAI, which all announced new code-gen products in May, and Anthropic is also working on one as well, two sources familiar with the matter told Reuters. The rapid growth of these startups is coming despite competing on big tech's home turf. Microsoft's GitHub Copilot, launched in 2021 and considered code-gen's dominant player, grew to over $500 million in revenue last year, according to a source familiar with the matter. Microsoft declined to comment on GitHub Copilot's revenue. On Microsoft's earnings call in April, the company said the product has over 15 million users. LEARN TO CODE? As AI revolutionizes the industry, many jobs - particularly entry-level coding positions that are more basic and involve repetition - may be eliminated. Signalfire, a VC firm that tracks tech hiring, found that new hires with less than a year of experience fell 24% in 2024, a drop it attributes to tasks once assigned to entry-level software engineers are now being fulfilled in part with AI. Google's CEO also said in April that "well over 30%" of Google's code is now AI-generated, and Amazon CEO Andy Jassy said last year the company had saved "the equivalent of 4,500 developer-years" by using AI. Google and Amazon declined to comment. In May, Microsoft CEO Satya Nadella said at a conference that approximately 20 to 30% of their code is now AI-generated. The same month, the company announced layoffs of 6,000 workers globally, with over 40% of those being software developers in Microsoft's home state, Washington. "We're focused on creating AI that empowers developers to be more productive, creative, and save time," a Microsoft spokesperson said. "This means some roles will change with the revolution of AI, but human intelligence remains at the center of the software development life cycle." MOUNTING LOSSES Some "vibe-coding" platforms already boast substantial annualized revenues. Cursor, with just 60 employees, went from zero to $100 million in recurring revenue by January 2025, less than two years since its launch. Windsurf, founded in 2021, launched its code generation product in November 2024 and is already bringing in $50 million in annualized revenue, according to a source familiar with the company. But both startups operate with negative gross margins, meaning they spend more than they make, according to four investor sources familiar with their operations. "The prices people are paying for coding assistants are going to get more expensive," Quinn Slack, CEO at coding startup Sourcegraph , told Reuters. To make the higher cost an easier pill to swallow for customers, Sourcegraph is now offering a drop-down menu to let users choose which models they want to work with, from open source models such as DeepSeek to the most advanced reasoning models from Anthropic and OpenAI so they can opt for cheaper models for basic questions. Both Cursor and Windsurf are led by recent MIT graduates in their twenties, and exemplify the gold rush era of the AI startup scene. "I haven't seen people working this hard since the first Internet boom," said Martin Casado, a general partner at Andreessen Horowitz, an investor in Anysphere, the company behind Cursor. What's less clear is whether the dozen or so code-gen companies will be able to hang on to their customers as big tech moves in. "In many cases, it's less about who's got the best technology -- it's about who is going to make the best use of that technology, and who's going to be able to sell their products better than others," said Scott Raney, managing director at Redpoint Ventures, whose firm invested in Sourcegraph and Poolside, a software development startup that's building its own AI foundation model. CUSTOM AI MODELS Most of the AI coding startups currently rely on the Claude AI model from Anthropic, which crossed $3 billion in annualized revenue in May in part due to fees paid by code-gen companies. But some startups are attempting to build their own models. In May, Windsurf announced its first in-house AI models that are optimized for software engineering in a bid to control the user experience. Cursor has also hired a team of researchers to pre-train its own large frontier-level models, which could enable the company to not have to pay foundation model companies so much money, according to two sources familiar with the matter. Startups looking to train their own AI coding models face an uphill battle as it could easily cost millions to buy or rent the computing capacity needed to train a large language model. Replit earlier dropped plans to train its own model. Poolside, which has raised more than $600 million to make a coding-specific model, has announced a partnership with Amazon Web Services and is testing with customers, but hasn't made any product generally available yet. Another code gen startup Magic Dev, which raised nearly $500 million since 2023, told investors a frontier-level coding model was coming in summer 2024 but hasn't yet launched a product. Poolside declined to comment. Magic Dev did not respond to a request for comment.


Hans India
2 hours ago
- Hans India
India tops ChatGPT usage globally, signaling major AI-driven job shifts ahead
India has officially surpassed the United States to become the largest user base of ChatGPT, signaling a dramatic shift in how the country is adopting artificial intelligence (AI) in everyday life and work. A recent report by venture capital firm BOND highlights how AI is rapidly reshaping workflows, redefining job roles, and transforming the future of work across sectors. During a recent tech event in Bengaluru, attendees were welcomed by a robot—not a human—demonstrating just how seamlessly AI is being integrated into real-world environments in India. From fluent English-speaking bots to AI-powered workflows in offices, the landscape is evolving quickly. According to the Trends – Artificial Intelligence report, Indians are adapting to AI tools faster than anyone else. ChatGPT adoption in India has now overtaken the U.S., and the country also ranks high in usage of other AI tools like DeepSeek, following only China and Russia—where ChatGPT access is restricted. This rapid uptake is more than a trend—it's a signal of readiness for AI-led work models. The report further notes a 448% increase in AI-related job postings in the U.S. since 2018, while traditional tech roles saw a 9% decline. This suggests a growing demand for AI proficiency across industries. Enterprises are already seeing major benefits. Bank of America's virtual assistant 'Erica' has handled over two billion customer queries, while JP Morgan is leveraging AI across functions—from fraud detection to creative brainstorming—with up to 65% gains in productivity. Yum! Brands, owners of KFC and Pizza Hut, have implemented 'Byte by Yum!' to streamline operations from inventory to staffing. Healthcare is also undergoing an AI transformation. At Kaiser Permanente, doctors now use ambient AI scribes to document consultations in real-time, enabling more direct patient care. As per NVIDIA CEO Jensen Huang, 'AI is now part of infrastructure,' equating AI data centers to the factories of the modern age. India's enthusiasm for AI adoption positions it as a global leader in this transformation. But as AI tools become smarter and more ubiquitous, organizations and individuals alike will need to adapt quickly—either evolve, collaborate with AI, or risk being left behind.