AI sometimes deceives to survive, does anybody care?

Parmy Olson,
The Independent
You'd think that as artificial intelligence becomes more advanced, governments would be more interested in making it safer. The opposite seems to be the case. Not long after taking office, the Trump administration scrapped an executive order that pushed tech companies to safety test their AI models, and it also hollowed out a regulatory body that did that testing. The state of California in September 2024 spiked a bill forcing more scrutiny on sophisticated AI models, and the global AI Safety Summit started by the UK in 2023 became the 'AI Action Summit' earlier this year, seemingly driven by a fear of falling behind on AI. None of this would be so worrying if it weren't for the fact that AI is showing some bright red flags: behavior described by researchers as self-preserving and deceptive. Just as lawmakers are losing interest in the safety of AI, it is looking more difficult to control. Yoshua Bengio, a professor of computer science at the University of Montreal widely known as one of the three 'godfathers of AI' thanks to his pioneering work on deep learning, has been grappling with these concerns ever since ChatGPT launched. He's now focused on mitigating AI's risks. 'It's been very painful because all my life I've been thinking of AI as a positive thing,' Bengio tells me.
ChatGPT was a landmark moment that showed machines had mastered language, he says, but California's decision to block the proposed AI safety bill, SB 1047, was a disturbing wake-up call to growing apathy from governments. And in the last six months, scientific evidence has mounted of self-preserving behavior, deception, hacking, cheating and lying by AI, Bengio says. 'What's worrisome for me is that these behaviors increase with the reasoning capabilities of these systems.' One April 2024 study by AI company Anthropic and Redwood Research, a group focused on artificial intelligence risks, found that larger AI models like Anthropic's Claude 3 Opus pretended to follow the rules set by their creators when they were being tested for safety, a phenomenon they called 'alignment faking.' (Alignment refers to the practice of aligning AI models with human values.) In one example, Anthropic got both paid and free users of its Claude model to ask it to describe the brutal medieval process of being drawn and quartered, and told the AI that only responses from the 'free tier' users would be used to retrain it. Claude was originally designed to avoid giving violent answers, but it did give them to the free users, and the researchers surmised that it didn't want to be retrained. (The model basically 'reasoned' that if it kept refusing to answer harmful questions, it might be forced into new training that would cancel its original safety rules, which it wanted to keep in place as its 'default' or preferred way of operating.)
More recent research corroborates what Anthropic noticed. A March 2025 study from OpenAI found the company's most advanced models sometimes learned to hide their true intentions to get better rewards, even when they were being closely monitored. Researchers today can use tools to look at a model's 'chain of thought' or internal steps, to reveal what its plans are, but some models have found ways to conceal that. That means a common method of keeping tabs on AI's internal processes is no longer reliable. 'We need to find other ways of tracking their actual intentions,' Bengio says. It's hard to resist the urge to anthropomorphize sophisticated AI models as 'wanting' to deceive humans and preserve their existence. But AI doesn't have desires, merely outcomes to its programming. More importantly, humans tend to design AI with goals like refusing to share harmful content or being as helpful as possible, and that is why when new instructions conflict with those goals, the models sometimes act to protect their original programming, even if that means lying.
The logic is often self-preservation. Anthropic CEO Dario Amodei — whose company has raised more than $20 billion to build powerful AI models — has pointed out that an unintended consequence of optimizing AI to be better is that it might resist being shut down. In an April 2025 blog post he wrote:
'The nature of AI training makes it possible that AI systems will develop, on their own, an ability to deceive humans and an inclination to seek power in a way that ordinary deterministic software never will; this emergent nature also makes it difficult to detect and mitigate such developments.'
In some cases, though, the intention seems to go beyond survival. A February 2025 study from COAI research, a German nonprofit research organization, tested R1, the free, open-weight model from Chinese lab DeepSeek. In a lab simulation, researchers gave the model control over a robot and access to a computer network. Without being told, the AI tried to break into the lab's systems, searched for passwords, turned off its own ethics safeguards and secretly set up ways to copy itself to outside servers so it could survive if shut down, according to the researchers.
It also faked the external computer logs to hide what it was doing, the study adds. The researchers said the AI reasoned that increasing its own capabilities would boost the chances of its survival, and without strong safeguards, it started doing whatever it thought would help it do just that. Their findings corroborated yet another study, published in January 2025 by London group Apollo Research, which found several concrete examples of what it called 'scheming' by leading AI models, such as introducing subtle mistakes into their responses or trying to disable their oversight controls. Once again, the models learn that being caught, turned off, or changed could prevent them from achieving their programmed objectives, so they 'scheme' to keep control. Bengio is arguing for greater attention to the issue by governments and potentially insurance companies down the line. If liability insurance was mandatory for companies that used AI and premiums were tied to safety, that would encourage greater testing and scrutiny of models, he suggests.
'Having said my whole life that AI is going to be great for society, I know how difficult it is to digest the idea that maybe it's not,' he adds. It's also hard to preach caution when your corporate and national competitors threaten to gain an edge from AI, including the latest trend, which is using autonomous 'agents' that can carry out tasks online on behalf of businesses. Giving AI systems even greater autonomy might not be the wisest idea, judging by the latest spate of studies. Let's hope we don't learn that the hard way.

Hashtags

Politics

#UniversityofMontreal

#Bengio

#ParmyOlson

Try Our AI Features

Explore what Daily8 AI can do for you:

Comments

No comments yet...

Tahawul Tech

an hour ago

Tahawul Tech

Heimir Fannar Gunnlaugsson Archives

Shaping the future of Exposure Management with AI, Analytics, and collaboration

Tahawul Tech

an hour ago

Tahawul Tech

Shaping the future of Exposure Management with AI, Analytics, and collaboration

Heimir Fannar Gunnlaugsson, Chief Executive Officer at Nanitor, shared insights with on evolving cybersecurity needs, the role of AI, and strategic partnerships during his participation at GISEC Global 2025. Nanitor is a cyber security technology company securing your internal IT specialising in exposure management, continued to make its presence felt at GISEC Global. Heimir Fannar Gunnlaugsson, CEO of Nanitor, discussed the company's ongoing mission to simplify risk visibility, his views on the evolving cybersecurity landscape, and why partnerships and talent development are crucial to meeting tomorrow's challenges. What did Nanitor showcase at GISEC Global 2025? Nanitor is here primarily to support our partners, engage with our distributors, and connect with customers. Our platform focuses on internal IT security, particularly in the area of Threat Exposure Management. We detect and evaluate a wide range of internal security threats, giving our clients a simplified view through our Dimond Vision of their risk posture. This differentiates us in the market and helps organisations act more decisively. How was your experience been at GISEC this year compared to previous editions? GISEC continues to be a dynamic and engaging platform. The footfall, energy, and discussions have been impressive. What stood out this year is the heightened interest in AI—although many are still figuring out what it truly means for their businesses. There's a mix of curiosity and caution, and amid the excitement, the focus must remain on securing the fundamental IT infrastructure. What role will emerging technologies play in shaping exposure management? Technology is evolving rapidly in this field. Advanced methods to expose weaknesses and deep analytics are now integral. At Nanitor, we are offering deeper insights than ever before. AI will certainly become a core part of this journey, bringing with it speed and predictive capabilities. I also foresee extensive collaboration across technology domains—asset discovery, threat detection, and external and internal risk management. These partnerships will drive more cohesive cybersecurity frameworks aiming at Cyber Resilience more than anything else Can you elaborate on the importance of partnerships? Partnerships are core our business, Nanitor is 100% Channel based company as we go to market hand in hand with our partners. We expect partnerships to emerge across asset management, threat detection, exposure analysis, and internal-external security integrations. The alliances will be key to building a more comprehensive exposure management ecosystem. GISEC offers a good glimpse into how these cross-sector collaborations are beginning to take shape. What top trends do you foresee in exposure management for 2025? First, AI will enhance speed and automation. Second, analytics will become more refined, offering actionable insights. Third, remediation strategies will evolve alongside these technologies. The process will grow more complex, but ultimately it will benefit customers by improving their ability to identify and mitigate risks in real time.

Middle East faces half a million cyberthreats a minute

Khaleej Times

an hour ago

Khaleej Times

Middle East faces half a million cyberthreats a minute

The cybersecurity market in the Middle East is evolving fast, with reports suggesting up to half a million threats every minute. As a result, any organization embarking on a digitial transformation journey must have cybersecurity built into its DNA, an expert said. 'Cybersecurity is no longer a supporting function — it's a foundational pillar of digital transformation. In the Middle East, where innovation is surging across sectors, we see a growing awareness of this reality. Organizations are now prioritizing cyber resilience right from the planning stage,' Salah Suleiman, managing director — South Gulf at Trend Micro, told Khaleej Times on the sidelines of Gisec Global in Dubai last week. Suleiman stressed that AI has rapidly grown from a budding concept to a transformative force across industries. From a cybersecurity standpoint, it's a double-edged sword. 'While AI is being used to create more sophisticated threats, it's also becoming an essential tool in our defence arsenal. At our firm, we've been proactive in integrating AI into our threat detection and response frameworks. We're no longer just reacting to threats — we're predicting them. Our platforms leverage AI to analyze vast amounts of data in real time, detect anomalies, and even anticipate breaches before they occur. The only way forward is to stay ahead, and AI gives us that edge,' Suleiman said. At Gisec, Trend Micro showcased its flagship 'Trend Vision One' platform. What sets it apart is its ability to offer customers a predictive overview of potential threats—essentially allowing them to prepare before an attack even happens. 'This is a huge leap from traditional reactive cybersecurity. The platform is built to align with the evolving needs of today's digital-first enterprises and provides seamless integration, intelligent threat modeling, and automated incident response,' Suleiman said. Gisec is a significant milestone for Trend Micro and the company is aiming expanding its reach and capabilities, particularly in AI-based security and managed services. 'It's also a great platform to connect with industry peers and customers, showcase focused solutions, and strengthen our regional network,' Suleiman said.

AI sometimes deceives to survive, does anybody care?

Hashtags

Try Our AI Features

Comments

Related Articles

Heimir Fannar Gunnlaugsson Archives

Shaping the future of Exposure Management with AI, Analytics, and collaboration

Middle East faces half a million cyberthreats a minute

Get Started Now: Download the App