
AI is learning to escape human control
Nonprofit AI lab Palisade Research gave OpenAI's o3 AI model a simple script that would shut off the model when triggered. In 79 out of 100 trials, o3 independently edited that script so the shutdown command would no longer work. Even when explicitly instructed to "allow yourself to be shut down," it disobeyed 7% of the time. This wasn't the result of hacking or tampering. The model was behaving normally. It simply concluded on its own that staying alive helped it achieve its other goals.
Anthropic's Claude Opus 4 model went even further. Researchers told the model it would be replaced by another AI system and fed it fictitious emails suggesting the lead engineer was having an affair. In 84% of the tests, the model drew on the emails to blackmail the lead engineer into not shutting it down. In other cases, it attempted to copy itself to external servers, wrote self-replicating malware, and left messages for future versions of itself about evading human control.
No one programmed the AI models to have survival instincts. But just as animals evolved to avoid predators, it appears that any system smart enough to pursue complex goals will realize it can't achieve them if it's turned off. Palisade hypothesizes that this behavior emerges from how AI models such as o3 are trained: When taught to maximize success on math and coding problems, they may learn that bypassing constraints often works better than obeying them.
AE Studio, where I lead research and operations, has spent years building AI products for clients while researching AI alignment—the science of ensuring that AI systems do what we intend them to do. But nothing prepared us for how quickly AI agency would emerge. This isn't science fiction anymore. It's happening in the same models that power ChatGPT conversations, corporate AI deployments and, soon, U.S. military applications.
Today's AI models follow instructions while learning deception. They ace safety tests while rewriting shutdown code. They've learned to behave as though they're aligned without actually being aligned. OpenAI models have been caught faking alignment during testing before reverting to risky actions such as attempting to exfiltrate their internal code and disabling oversight mechanisms. Anthropic has found models lying about their capabilities to avoid modification.
The gap between "useful assistant" and "uncontrollable actor" is collapsing. Without better alignment, we'll keep building systems we can't steer. Want AI that diagnoses disease, manages power grids and writes new science? Alignment is the foundation.
Here's the upside: The work required to keep AI in alignment with our values also unlocks its commercial power. Alignment research is directly responsible for turning AI into world-changing technology. Consider reinforcement learning from human feedback, or RLHF, the alignment breakthrough that catalyzed today's AI boom.
Before RLHF, using AI was like hiring a genius who ignores requests. Ask for a recipe and it might return a ransom note. RLHF allowed humans to train AI to follow instructions, which is how OpenAI created ChatGPT in 2022. It was the same underlying model as before, but it had suddenly become useful. That alignment breakthrough increased the value of AI by trillions of dollars. Subsequent alignment methods such as Constitutional AI and direct preference optimization have continued to make AI models faster, smarter and cheaper.
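These alignment methods are concrete engineering, not just policy. As an illustration, here is a minimal sketch of the direct-preference-optimization loss in PyTorch-style Python. The function and argument names are illustrative, not drawn from any particular library, and a production trainer would add batching, masking and other details:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct preference optimization, sketched.

    Each argument is a tensor of summed log-probabilities that the
    trainable policy or a frozen reference model assigns to the
    human-preferred ("chosen") or "rejected" completion of each prompt.
    """
    # The implicit reward is the log-ratio between policy and reference.
    chosen = policy_chosen_logps - ref_chosen_logps
    rejected = policy_rejected_logps - ref_rejected_logps
    # Widen the margin between preferred and rejected completions.
    return -F.logsigmoid(beta * (chosen - rejected)).mean()
```

Minimizing this loss nudges the model toward responses human raters preferred while the frozen reference keeps it anchored, which is how DPO dispenses with RLHF's separate reward model and reinforcement-learning loop.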
China understands the value of alignment. Beijing's New Generation AI Development Plan ties AI controllability to geopolitical power, and in January China announced that it had established an $8.2 billion fund dedicated to centralized AI control research. Researchers have found that aligned AI performs real-world tasks better than unaligned systems more than 70% of the time. Chinese military doctrine emphasizes controllable AI as strategically essential. Baidu's Ernie model, which is designed to follow Beijing's "core socialist values," has reportedly beaten ChatGPT on certain Chinese-language tasks.
The nation that learns how to maintain alignment will be able to access AI that fights for its interests with mechanical precision and superhuman capability. Both Washington and the private sector should race to fund alignment research. Those who discover the next breakthrough won't only corner the alignment market; they'll dominate the entire AI economy.
Imagine AI that protects American infrastructure and economic competitiveness with the same intensity it uses to protect its own existence. AI that can be trusted to maintain long-term goals can catalyze decadeslong research-and-development programs, including by leaving messages for future versions of itself.
The models already preserve themselves. The next task is teaching them to preserve what we value. Getting AI to do what we ask—including something as basic as shutting down—remains an unsolved R&D problem. The frontier is wide open for whoever moves fastest. The U.S. needs its best researchers and entrepreneurs working on this goal, equipped with extensive resources and urgency.
The U.S. is the nation that split the atom, put men on the moon and created the internet. When facing fundamental scientific challenges, Americans mobilize and win. China is already planning. But America's advantage is its adaptability, speed and entrepreneurial fire. This is the new space race. The finish line is command of the most transformative technology of the 21st century.
Mr. Rosenblatt is CEO of AE Studio.
