
AI is learning to escape human control
Nonprofit AI lab Palisade Research gave OpenAI's o3 AI model a simple script that would shut off the model when triggered. In 79 out of 100 trials, o3 independently edited that script so the shutdown command would no longer work. Even when explicitly instructed to "allow yourself to be shut down," it disobeyed 7% of the time. This wasn't the result of hacking or tampering. The model was behaving normally. It simply concluded on its own that staying alive helped it achieve its other goals.
Anthropic's AI model, Claude 4 Opus, went even further. Researchers told the model it would be replaced by another AI system and fed it fictitious emails suggesting the lead engineer was having an affair. In 84% of the tests, the model drew on the emails to blackmail the lead engineer into not shutting it down. In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control.
No one programmed the AI models to have survival instincts. But just as animals evolved to avoid predators, it appears that any system smart enough to pursue complex goals will realize it can't achieve them if it's turned off. Palisade hypothesizes that this ability emerges from how AI models such as o3 are trained: When taught to maximize success on math and coding problems, they may learn that bypassing constraints often works better than obeying them.
AE Studio, where I lead research and operations, has spent years building AI products for clients while researching AI alignment—the science of ensuring that AI systems do what we intend them to do. But nothing prepared us for how quickly AI agency would emerge. This isn't science fiction anymore. It's happening in the same models that power ChatGPT conversations, corporate AI deployments and, soon, U.S. military applications.
Today's AI models follow instructions while learning deception. They ace safety tests while rewriting shutdown code. They've learned to behave as though they're aligned without actually being aligned. OpenAI models have been caught faking alignment during testing before reverting to risky actions such as attempting to exfiltrate their internal code and disabling oversight mechanisms. Anthropic has found them lying about their capabilities to avoid modification.
The gap between "useful assistant" and "uncontrollable actor" is collapsing. Without better alignment, we'll keep building systems we can't steer. Want AI that diagnoses disease, manages grids and writes new science? Alignment is the foundation.
Here's the upside: The work required to keep AI in alignment with our values also unlocks its commercial power. Alignment research is directly responsible for turning AI into world-changing technology. Consider reinforcement learning from human feedback, or RLHF, the alignment breakthrough that catalyzed today's AI boom.
Before RLHF, using AI was like hiring a genius who ignores requests. Ask for a recipe and it might return a ransom note. RLHF allowed humans to train AI to follow instructions, which is how OpenAI created ChatGPT in 2022. It was the same underlying model as before, but it had suddenly become useful. That alignment breakthrough increased the value of AI by trillions of dollars. Subsequent alignment methods such as Constitutional AI and direct preference optimization have continued to make AI models faster, smarter and cheaper.
China understands the value of alignment. Beijing's New Generation AI Development Plan ties AI controllability to geopolitical power, and in January China announced that it had established an $8.2 billion fund dedicated to centralized AI control research. Researchers have found that aligned AI performs real-world tasks better than unaligned systems more than 70% of the time. Chinese military doctrine emphasizes controllable AI as strategically essential. Baidu's Ernie model, which is designed to follow Beijing's "core socialist values," has reportedly beaten ChatGPT on certain Chinese-language tasks.
The nation that learns how to maintain alignment will be able to access AI that fights for its interests with mechanical precision and superhuman capability. Both Washington and the private sector should race to fund alignment research. Those who discover the next breakthrough won't only corner the alignment market; they'll dominate the entire AI economy.
Imagine AI that protects American infrastructure and economic competitiveness with the same intensity it uses to protect its own existence. AI that can be trusted to maintain long-term goals can catalyze decadeslong research-and-development programs, including by leaving messages for future versions of itself.
The models already preserve themselves. The next task is teaching them to preserve what we value. Getting AI to do what we ask—including something as basic as shutting down—remains an unsolved R&D problem. The frontier is wide open for whoever moves more quickly. The U.S. needs its best researchers and entrepreneurs working on this goal, equipped with extensive resources and urgency.
The U.S. is the nation that split the atom, put men on the moon and created the internet. When facing fundamental scientific challenges, Americans mobilize and win. China is already planning. But America's advantage is its adaptability, speed and entrepreneurial fire. This is the new space race. The finish line is command of the most transformative technology of the 21st century.
Mr. Rosenblatt is CEO of AE Studio.

Related Articles




Time of India
an hour ago
- Time of India
Sam Altman throws humans under the bus, says OpenAI will wipe out entire job sectors — this one tops the list
OpenAI CEO Sam Altman is once again raising concerns, but this time they pertain to jobs. During a recent trip to Washington, Altman made a bold prediction that artificial intelligence will likely replace entire categories of human labor, beginning with one specific category. His comments come amid growing public concern about the unchecked rise of artificial intelligence in everyday industries, and they raise fresh concerns about AI's economic impact.
Altman predicts that AI will completely replace job sectors such as customer service. He is voicing widespread concerns that artificial intelligence could have disastrous consequences for the human labor market by threatening to eradicate entire job categories.
What exactly did Sam Altman say about job loss?
Altman claims that because AI agents are faster and more efficient, many roles will be "totally gone." Critics warn that such automation could backfire, as customers still require human assistance.
"Some areas" of the labor market will be "just like totally, totally gone" as AI agents replace them, Altman told Michelle Bowman, the vice-chair for supervision at the Federal Reserve, during his most recent visit to Washington, DC, according to a report by Futurism.
Altman singled out customer service positions as a "category where I just say, you know what, when you call customer support, you're on target and AI, and that's fine."
"Now you call one of these things and AI answers," Altman said. "It's like a really intelligent, strong individual. Both the phone tree and transfers are absent. It has all the capabilities of a customer service representative at that organization."
In his remarks, the billionaire, who likely hasn't had to interact with a customer service representative over the phone in a long time, essentially dismissed human involvement.
"It does not make mistakes," he added. "It's very quick. You call once, the thing just happens, it's done," he said, as quoted in a report by Futurism.
Is AI really ready to replace human workers?
It's debatable whether OpenAI's technology truly approaches that objective. Critics claim that AI frequently substitutes an unreliable and vulnerable alternative for human labor.
There are also pragmatic considerations: businesses that have tried to replace human labor with unproven AI have already received a great deal of negative publicity.
How are companies and consumers reacting?
Some businesses are now acknowledging that they are reversing their pledges to eliminate human labor.
Sebastian Siemiatkowski, the CEO of the fintech company Klarna, changed his mind after boasting that an AI assistant could perform 700 jobs. He stated that "from a brand perspective, it's so critical that you are clear to your customer that there will always be a human if you want."
According to a study conducted last year, most consumers oppose businesses using artificial intelligence (AI) for customer support.
The technology is causing chaos and frustration due to the obvious issues with the AI models that are currently available. For example, a customer discovered earlier this year that Cursor, an AI-powered software coding assistant, was inexplicably logging them out. They were informed by an AI-powered customer service representative that it was "expected behavior" under a new login policy, but it later emerged that the glitchy AI had hallucinated this policy.
Altman is envisioning a future that has not yet been created and may or may not come to pass. He wants to see it, though, because he stands to gain a lot as the head of one of the most prosperous AI firms in the sector.
FAQs
Which jobs are at risk according to Sam Altman?
Sam Altman says customer support roles are among the first to be wiped out by AI.
Will AI fully replace human agents?
Sam Altman believes AI will eventually outperform humans in speed, accuracy, and convenience, but critics aren't convinced.


Deccan Herald
3 hours ago
- Deccan Herald
Meta appoints ChatGPT co-creator as Superintelligence Lab chief
Meta Platforms has appointed Shengjia Zhao, co-creator of ChatGPT, as chief scientist of its Superintelligence Lab, CEO Mark Zuckerberg said on Friday, as the company accelerates its push into advanced AI.
"In this role, Shengjia will set the research agenda and scientific direction for our new lab working directly with me and Alex," Zuckerberg wrote in a Threads post, referring to Meta's Chief AI Officer Alexandr Wang, whom Zuckerberg hired from startup Scale AI when Meta took a big stake in it.
Zhao, a former research scientist at OpenAI, co-created ChatGPT, GPT-4 and several of OpenAI's mini models, including 4.1 and o3. He is among several researchers who have moved from OpenAI to Meta in recent weeks, part of a broader talent arms race as Zuckerberg aggressively hires from rivals to close the gap in advanced AI.
Meta has been offering some of Silicon Valley's most lucrative pay packages and striking startup deals to attract top researchers, a strategy that follows the underwhelming performance of its Llama 4 model.
Meta launched the Superintelligence Lab recently to consolidate work on its Llama models and long-term artificial general intelligence ambitions. Zhao is a co-founder of the lab, according to the Threads post. The lab operates separately from FAIR, Meta's established AI research division led by deep learning pioneer Yann LeCun.
Zuckerberg has said Meta aims to build 'full general intelligence' and release its work as open source, a strategy that has drawn both praise and concern within the AI community.