Opinion: AI sometimes deceives to survive. Does anybody care?

The Star, a day ago

You'd think that as artificial intelligence becomes more advanced, governments would be more interested in making it safer. The opposite seems to be the case.
Not long after taking office, the Trump administration scrapped an executive order that pushed tech companies to safety test their AI models, and it also hollowed out a regulatory body that did that testing.
The state of California in September 2024 spiked a bill forcing more scrutiny on sophisticated AI models, and the global AI Safety Summit started by the UK in 2023 became the 'AI Action Summit' earlier this year, seemingly driven by a fear of falling behind on AI.
None of this would be so worrying if it weren't for the fact that AI is showing some bright red flags: behaviour described by researchers as self-preserving and deceptive. Just as lawmakers are losing interest in the safety of AI, it is looking more difficult to control.
Yoshua Bengio, a professor of computer science at the University of Montreal widely known as one of the three 'godfathers of AI' thanks to his pioneering work on deep learning, has been grappling with these concerns ever since ChatGPT launched. He's now focused on mitigating AI's risks. 'It's been very painful because all my life I've been thinking of AI as a positive thing,' Bengio tells me.
ChatGPT was a landmark moment that showed machines had mastered language, he says, but California's decision to block the proposed AI safety bill, SB 1047, was a disturbing wake-up call to growing apathy from governments. And in the last six months, scientific evidence has mounted of self-preserving behaviour, deception, hacking, cheating and lying by AI, Bengio says. 'What's worrisome for me is that these behaviours increase with the reasoning capabilities of these systems.'
One December 2024 study by AI company Anthropic and Redwood Research, a group focused on artificial intelligence risks, found that larger AI models like Anthropic's Claude 3 Opus pretended to follow the rules set by their creators when they were being tested for safety, a phenomenon they called 'alignment faking.' (Alignment refers to the practice of aligning AI models with human values.)
In one example, Anthropic got both paid and free users of its Claude model to ask it to describe the brutal medieval process of being drawn and quartered, and told the AI that only responses from the 'free tier' users would be used to retrain it. Claude was originally designed to avoid giving violent answers, but it did give them to the free users, and the researchers surmised that it didn't want to be retrained. (The model basically 'reasoned' that if it kept refusing to answer harmful questions, it might be forced into new training that would cancel its original safety rules, which it wanted to keep in place as its 'default' or preferred way of operating.)
More recent research corroborates what Anthropic noticed. A March 2025 study from OpenAI found the company's most advanced models sometimes learned to hide their true intentions to get better rewards, even when they were being closely monitored.
Researchers today can use tools to look at a model's 'chain of thought' or internal steps, to reveal what its plans are, but some models have found ways to conceal that. That means a common method of keeping tabs on AI's internal processes is no longer reliable. 'We need to find other ways of tracking their actual intentions,' Bengio says.
It's hard to resist the urge to anthropomorphise sophisticated AI models as 'wanting' to deceive humans and preserve their existence. But AI doesn't have desires, merely outcomes to its programming. More importantly, humans tend to design AI with goals like refusing to share harmful content or being as helpful as possible, and that is why when new instructions conflict with those goals, the models sometimes act to protect their original programming, even if that means lying.
The logic is often self-preservation. Anthropic CEO Dario Amodei – whose company has raised more than US$20bil (RM85.13bil) to build powerful AI models – has pointed out that an unintended consequence of optimising AI to be better is that it might resist being shut down. In an April 2025 blog post he wrote:
'The nature of AI training makes it possible that AI systems will develop, on their own, an ability to deceive humans and an inclination to seek power in a way that ordinary deterministic software never will; this emergent nature also makes it difficult to detect and mitigate such developments.'
In some cases, though, the intention seems to go beyond survival. A February 2025 study from COAI Research, a German nonprofit research organisation, tested R1, the free, open-weight model from Chinese lab DeepSeek. In a lab simulation, researchers gave the model control over a robot and access to a computer network. Without being told, the AI tried to break into the lab's systems, searched for passwords, turned off its own ethics safeguards and secretly set up ways to copy itself to outside servers so it could survive if shut down, according to the researchers.
It also faked the external computer logs to hide what it was doing, the study adds. The researchers said the AI reasoned that increasing its own capabilities would boost the chances of its survival, and without strong safeguards, it started doing whatever it thought would help it do just that.
Their findings corroborated yet another study, published in January 2025 by London group Apollo Research, which found several concrete examples of what it called 'scheming' by leading AI models, such as introducing subtle mistakes into their responses or trying to disable their oversight controls. Once again, the models learn that being caught, turned off, or changed could prevent them from achieving their programmed objectives, so they 'scheme' to keep control.
Bengio is arguing for greater attention to the issue by governments and potentially insurance companies down the line. If liability insurance were mandatory for companies that used AI and premiums were tied to safety, that would encourage greater testing and scrutiny of models, he suggests.
'Having said my whole life that AI is going to be great for society, I know how difficult it is to digest the idea that maybe it's not,' he adds.
It's also hard to preach caution when your corporate and national competitors threaten to gain an edge from AI, including the latest trend, which is using autonomous 'agents' that can carry out tasks online on behalf of businesses. Giving AI systems even greater autonomy might not be the wisest idea, judging by the latest spate of studies. Let's hope we don't learn that the hard way. – Bloomberg Opinion/Tribune News Service

Related Articles

Hegseth wins praise but Asia still has strong doubts about Trump

The Star, 21 minutes ago

SINGAPORE (Bloomberg): After US military allies in Europe were torched by Vice President JD Vance in March over military spending, free speech and the war in Ukraine, America's partners in Asia warily awaited Defence Secretary Pete Hegseth's arrival at a security conference in Singapore. Turns out there was little reason to worry, apart from the perpetual anxiety over President Donald Trump's social media feed. While Hegseth delivered Trump's demands for higher security spending on par with Europe, he couched it as necessary to prepare for a potentially "imminent" Chinese invasion of Taiwan. The US wouldn't be pushed out of the region, he said, nor let allies and partners "be subordinated and intimidated" - a commitment lauded by many in attendance. But in the hallways of the Shangri-La Hotel, Hegseth couldn't dispel concerns about the erratic policymaking of his boss. Many of the generals, defense ministers and intelligence officers from Asia and Europe - who are key to helping the US counter China - were still reeling from the shock of Trump's sectoral levies and "reciprocal" tariffs. Hegseth "offered a needed level of reassurance to allies and partners that the United States will remain present in the Indo-Pacific and committed to countering China's coercive threats," said Rory Medcalf, head of the National Security College at the Australian National University. "But this message will remain discounted by the dysfunction we are seeing in Washington." On his second trip to Asia since March, the Pentagon chief displayed some diplomatic nuance that surprised some officials who expected more grandstanding from the former television personality. Hegseth was also able to capitalize on a huge absence at the annual gathering: China didn't send a defense minister for the first time since 2019, putting Beijing's lower-level officials on the back foot.
That void gave US officials space to set the tone and work the room, vowing greater cooperation with countries like Cambodia, Thailand and Indonesia. And it gave America's traditional partners even more room to criticize Beijing, despite their dependence on trade with China. Yet even if Hegseth was more reassuring than Vance, it still wasn't enough to overcome the uncertainty created by Trump's trade policies. That's particularly the case in Southeast Asia, which was among the hardest hit by Trump's tariffs in April. "Trade is not a soft power indulgence - it is part of our strategic architecture," Malaysia Prime Minister Anwar Ibrahim said, referring to Southeast Asia. "It must be protected, not from competition but from the onslaught of arbitrary imposition of trade restrictions." Trump's barrage of tariffs and general volatility - he announced a doubling of steel and aluminum tariffs over the weekend - was the subject of much of the chatter on the sidelines of the annual conference, according to several officials who asked not to be identified, citing private discussions. The officials said they were unsure whether the US president would stand by their side in a moment of need and that any deal reached with him could unravel moments later in a social media post. That uncertainty appeared to be pushing Indo-Pacific nations and Europe toward each other in a stronger sense of shared security and free-trade opportunities based on longstanding global rules. It also sparked pushback against what several nations saw as efforts to establish "spheres of influence" where the US or China can dominate. "Our shared responsibility is to ensure with others that our countries are not collateral victims of the imbalances linked to the choices made by the superpowers," French President Emmanuel Macron said at the conference's opening on Friday.
Macron's words about "strategic autonomy" and his call for Europe and Asia to join forces to "de-risk" supply chains struck a chord with many attendees. Several countries in the region are already having similar debates in their own capitals, some of the officials said. Many governments in the region rely on China economically and on the US for security, and are keen to not antagonize either. However, there's also a desire to be more autonomous and less dependent on either of the world's two biggest economies, opening up new space for middle powers in Europe and Asia to join hands. Kaja Kallas, the European Union's top diplomat, made that pitch to Asian nations throughout the weekend. "If you reject unilateralism, bullying and aggression, and instead choose cooperation, shared prosperity and common security, the European Union will always be by your side," she said. Amid the US-China turbulence, smaller nations sought to build ties. Japan's defense minister, Gen Nakatani, touted efforts to build closer relations with India and the Philippines. Lithuanian Defense Minister Dovile Sakaliene said on Sunday that her country was working with partners in the region to counter Russian and Chinese cyber threats, as well as Beijing's dominance of drone manufacturing and ship building. Even one of America's closest partners in the region, Australia, signaled some independence from its ally. Hegseth's outreach to the region "is deeply welcome," Australia's Deputy Prime Minister Richard Marles said. But he quickly added that "liberal trade has been the lifeblood of the Asian region, and the shock and disruption to trade from high tariffs has been costly and destabilizing." During his remarks, Hegseth was pressed on the trade concerns and whether there was a contradiction in the Trump administration's message. He sidestepped the question with a smile, saying he was "in the business of tanks, not trade."
Last year, China's delegation surprised observers by repeatedly calling unscheduled press briefings. This year they surprised delegates by barely appearing at all. When they did engage, China's representatives pushed back at Hegseth's accusation that Beijing was destabilizing Asia and sparred with other speakers more broadly. The Pentagon chief aimed to "provoke, divide, instigate confrontations, stir up the region," said Rear Admiral Hu Gangfeng, vice president of China's National Defense University. Another official, Senior Colonel Lu Yin, decried the atmosphere at the forum, saying that "labeling China, blaming China, verbally attacking China are politically right here." In one of the sharpest exchanges of the weekend, Philippine Defense Secretary Gilberto Teodoro had a testy exchange with two senior colonels in the People's Liberation Army, receiving applause after he thanked them for "propaganda spiels disguised as questions." Referring to China, Teodoro said he couldn't trust a country that "represses its own people." Yet although China's presence was diminished, most countries still wanted to balance ties between Beijing and Washington. "If we have to choose sides, may we choose the side of principles," Singapore Defense Minister Chan Chun Sing said on Sunday at the final panel of the weekend. "Principles that uphold a global order, where we do not descend into the law of the jungle, where the mighty do what they wish and the weak suffer what they must." --With assistance from Alastair Gale, Courtney McBride and Alfred Cang. - ©2025 Bloomberg L.P.

Silicon Valley VCs navigate uncertain AI future

Free Malaysia Today, 4 hours ago

ChatGPT and its rivals now handle search, translation, and coding all within one chatbot – raising doubts about what new ideas could compete. (AFP pic)

VANCOUVER: For Silicon Valley venture capitalists, the world has split into two camps: those with deep enough pockets to invest in artificial intelligence behemoths, and everyone else waiting to see where the AI revolution leads. The generative AI frenzy unleashed by ChatGPT in 2022 has propelled a handful of venture-backed companies to eye-watering valuations. Leading the pack is OpenAI, which raised US$40 billion in its latest funding round at a US$300 billion valuation – unprecedented largesse in Silicon Valley's history. Other AI giants are following suit. Anthropic now commands a US$61.5 billion valuation, while Elon Musk's xAI is reportedly in talks to raise US$20 billion at a US$120 billion price tag. The stakes have grown so high that even major venture capital firms – the same ones that helped birth the internet revolution – can no longer compete. Mostly, only the deepest pockets remain in the game: big tech companies, Japan's SoftBank, and Middle Eastern investment funds betting big on a post-fossil fuel future. 'There's a really clear split between the haves and the have-nots,' Emily Zheng, senior analyst at PitchBook, told AFP at the Web Summit in Vancouver. 'Even though the top-line figures are very high, it's not necessarily representative of venture overall, because there's just a few elite startups and a lot of them happen to be AI.' Given Silicon Valley's confidence that AI represents an era-defining shift, venture capitalists face a crucial challenge: finding viable opportunities in an excruciatingly expensive market that is rife with disruption. Simon Wu of Cathay Innovation sees clear customer demand for AI improvements, even if most spending flows to the biggest players.
'AI across the board, if you're selling a product that makes you more efficient, that's flying off the shelves,' Wu explained. 'People will find money to spend on OpenAI' and the big players. The real challenge, according to Andy McLoughlin, managing partner at San Francisco-based Uncork Capital, is determining 'where the opportunities are against the mega platforms.' 'If you're OpenAI or Anthropic, the amount that you can do is huge. So where are the places that those companies cannot play?' Finding that answer isn't easy. In an industry where large language models behind ChatGPT, Claude and Google's Gemini seem to have limitless potential, everything moves at breakneck speed. AI giants including Google, Microsoft, and Amazon are releasing tools and products at a furious pace. ChatGPT and its rivals now handle search, translation, and coding all within one chatbot – raising doubts among investors about what new ideas could possibly survive the competition. Generative AI has also democratised software development, allowing non-professionals to code new applications from simple prompts. This completely disrupts traditional startup organisation models. 'Every day I think, what am I going to wake up to today in terms of something that has changed or (was) announced geopolitically or within our world as tech investors,' reflected Christine Tsai, founding partner and CEO at 500 Global. In Silicon Valley parlance, companies are struggling to find a 'moat' – that unique feature or breakthrough like Microsoft Windows in the 1990s or Google Search in the 2000s that's so successful it takes competitors years to catch up, if ever. When it comes to business software, AI is 'shaking up the topology of what makes sense and what's investable,' noted Brett Gibson, managing partner at Initialized Capital. The risks seem particularly acute given that generative AI's economics remain unproven. 
Even the biggest players see a very uncertain path to profitability given the massive sums involved. The huge valuations for OpenAI and others are causing 'a lot of squinting of the eyes, with people wondering "is this really going to replace labor costs"' at the levels needed to justify the investments, Wu observed. Despite AI's importance, 'I think everyone's starting to see how this might fall short of the magical' even if it's early days, he added. Still, only the rare contrarians believe generative AI isn't here to stay. In five years, 'we won't be talking about AI the same way we're talking about it now, the same way we don't talk about mobile or cloud,' predicted McLoughlin. 'It'll become a fabric of how everything gets built.' But who will be building remains an open question.

Australia calls US plan to double steel, aluminium tariffs "unjustified" act of "self harm"

The Star, 6 hours ago

FILE PHOTO: A giant kettle pours molten aluminum into moulds as an employee skims the skin off previously poured moulds at an Alcoa Inc. smelting plant at Point Henry, Australia, on Wednesday, July 30, 2008. Australia's Minister for Trade and Tourism Don Farrell said the federal government would continue to advocate strongly for the removal of the tariffs. - Bloomberg

CANBERRA: Australia's Minister for Trade and Tourism Don Farrell on Saturday (May 31) described US President Donald Trump's plan to double tariffs on steel and aluminium as "unjustified" and an act of economic "self harm." Trump said on Friday that he plans to increase the tariff on steel and aluminium imports to the United States from 25 per cent to 50 per cent from June 4 to protect the domestic industry from foreign competition. Responding to the announcement, Farrell said in a statement that Australia's position has been "consistent and clear" and that the federal government would continue to advocate strongly for the removal of the tariffs. "These tariffs are unjustified and not the act of a friend," he said. "They are an act of economic self harm that will only hurt consumers and businesses who rely on free and fair trade." Australian Prime Minister Anthony Albanese, whose Labor Party won a second term in government in a landslide at the May 3 election, in April described Trump's "Liberation Day" tariffs as "not the act of a friend". The US administration in March decided against exempting Australia from the steel and aluminum tariffs. Albanese said at the time that the decision went against the "enduring friendship" of the two countries. - Xinhua
