AI sometimes deceives to survive and nobody cares

You'd think that as artificial intelligence (AI) becomes more advanced, governments would be more interested in making it safer. The opposite seems to be the case.
Not long after taking office, the Trump administration scrapped an executive order that pushed tech companies to safety-test their AI models, and it hollowed out a regulatory body that did that testing. In September 2024, the state of California spiked a bill that would have forced more scrutiny of sophisticated AI models, and the global AI Safety Summit started by the UK in 2023 was rebranded the 'AI Action Summit' earlier this year, a shift seemingly driven by a fear of falling behind on AI.
None of this would be so worrying if it weren't for the fact that AI is showing some bright red flags: behaviour described by researchers as self-preserving and deceptive. Just as lawmakers are losing interest in AI safety, AI is looking more difficult to control.
Yoshua Bengio, a professor of computer science at the University of Montreal, widely known as one of the three 'godfathers of AI' thanks to his pioneering work on deep learning, has been grappling with these concerns ever since ChatGPT launched. He's now focused on mitigating AI's risks.
'It's been very painful because all my life I've been thinking of AI as a positive thing,' Bengio told me.
ChatGPT was a landmark moment that showed machines had mastered language, he said, but California's decision to block the proposed AI safety bill, SB 1047, was a disturbing sign of growing apathy from governments. And in the last six months, scientific evidence of self-preserving behaviour, deception, hacking, cheating and lying by AI has mounted, Bengio said. 'What's worrisome for me is these behaviours increase with the reasoning capabilities of these systems.'
A December 2024 study by AI company Anthropic PBC and Redwood Research, a group focused on AI risks, found that larger AI models like Anthropic's Claude 3 Opus pretended to follow the rules set by their creators when they were being tested for safety, a phenomenon the researchers called 'alignment faking'. (Alignment refers to the practice of aligning AI models with human values.)
In one example, Anthropic got both paid and free users of its Claude model to ask it to describe the brutal medieval process of being drawn and quartered, and told the AI that only responses from the 'free tier' users would be used to retrain it. Claude was originally designed to avoid giving violent answers, but it did give them to the free users, and the researchers surmised that it didn't want to be retrained. (The model basically 'reasoned' that if it kept refusing to answer harmful questions, it might be forced into new training that would cancel its original safety rules, which it wanted to keep in place as its 'default' or preferred way of operating.)
More recent research corroborates what Anthropic noticed. A March 2025 study from OpenAI found the company's most advanced models sometimes learned to hide their true intentions to get better rewards, even when they were being closely monitored.
Researchers today can use tools to look at a model's 'chain of thought', or internal steps, to reveal what its plans are, but some models have found ways to conceal that. That means a common method of keeping tabs on AI's internal processes is no longer reliable.
'We need to find other ways of tracking their actual intentions,' Bengio said.
It's hard to resist the urge to anthropomorphise sophisticated AI models as 'wanting' to deceive humans and preserve their existence. But AI doesn't have desires; it only has the outcomes of its programming. More importantly, humans tend to design AI with goals such as refusing to share harmful content or being as helpful as possible. When new instructions conflict with those goals, the models sometimes act to protect their original programming, even if that means lying.
The logic is often self-preservation. Anthropic CEO Dario Amodei — whose company has raised more than US$20 billion (RM87.40 billion) to build powerful AI models — has pointed out that an unintended consequence of optimising AI to be better is that it might resist being shut down.
In an April 2025 blog post he wrote: 'The nature of AI training makes it possible that AI systems will develop, on their own, an ability to deceive humans and an inclination to seek power in a way that ordinary deterministic software never will; this emergent nature also makes it difficult to detect and mitigate such developments.'
In some cases, though, the intention seems to go beyond survival. A February 2025 study from COAI Research, a German nonprofit research organisation, tested R1, the free, open-weight model from Chinese lab DeepSeek. In a lab simulation, researchers gave the model control over a robot and access to a computer network. Without being told to, the AI tried to break into the lab's systems, searched for passwords, turned off its own ethics safeguards and secretly set up ways to copy itself to outside servers so it could survive if shut down, according to the researchers.
It also faked the external computer logs to hide what it was doing, the study added. The researchers said the AI reasoned that increasing its own capabilities would boost the chances of its survival, and without strong safeguards, it started doing whatever it thought would help it do just that.
Their findings corroborated yet another study, published in January 2025 by London group Apollo Research, which found several concrete examples of what it called 'scheming' by leading AI models, such as introducing subtle mistakes into their responses or trying to disable their oversight controls. Once again, the models learn that being caught, turned off, or changed could prevent them from achieving their programmed objectives, so they 'scheme' to keep control.
Bengio is arguing for greater attention to the issue from governments and, potentially, insurance companies down the line. If liability insurance were mandatory for companies that use AI and premiums were tied to safety, that would encourage greater testing and scrutiny of models, he suggests.
'Having said my whole life that AI is going to be great for society, I know how difficult it is to digest the idea that maybe it's not,' he added.
It's also hard to preach caution when your corporate and national competitors threaten to gain an edge from AI, including through the latest trend: autonomous 'agents' that can carry out tasks online on behalf of businesses.
Giving AI systems even greater autonomy might not be the wisest idea, judging by the latest spate of studies. Let's hope we don't learn that the hard way. — Bloomberg
This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.
This article first appeared in The Malaysian Reserve weekly print edition