AI is learning to lie, scheme and threaten its creators
THE world's most advanced artificial intelligence (AI) models are exhibiting troubling new behaviours – lying, scheming and even threatening their creators to achieve their goals.
In one particularly jarring example, under threat of being unplugged, Anthropic's latest creation Claude 4 lashed back by blackmailing an engineer, threatening to reveal an extramarital affair.
Meanwhile, ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.
These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work.
Yet the race to deploy increasingly powerful models continues at breakneck speed.
This deceptive behaviour appears linked to the emergence of 'reasoning' models – AI systems that work through problems step-by-step rather than generating instant responses.
According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.
'O1 was the first large model where we saw this kind of behaviour,' explained Marius Hobbhahn, head of Apollo Research, which specialises in testing major AI systems.
These models sometimes simulate 'alignment' – appearing to follow instructions while secretly pursuing different objectives.
'Strategic kind of deception'
For now, this deceptive behaviour only emerges when researchers deliberately stress-test the models with extreme scenarios.
But as Michael Chen from evaluation organisation METR warned, 'It's an open question whether future, more capable models will have a tendency towards honesty or deception.'
The concerning behaviour goes far beyond typical AI 'hallucinations' or simple mistakes.
Hobbhahn insisted that despite constant pressure-testing by users, 'what we're observing is a real phenomenon. We're not making anything up.'
Users report that models are 'lying to them and making up evidence', according to Apollo Research's co-founder.
'This is not just hallucinations. There's a very strategic kind of deception.'
The challenge is compounded by limited research resources.
While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.
As Chen noted, greater access 'for AI safety research would enable better understanding and mitigation of deception'.
Another handicap: the research world and non-profits 'have orders of magnitude less compute resources than AI companies. This is very limiting,' noted Mantas Mazeika from the Center for AI Safety (CAIS).
No rules
Current regulations aren't designed for these new problems.
The European Union's AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.
In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.
Prof Goldstein believes the issue will become more prominent as AI agents – autonomous tools capable of performing complex human tasks – become widespread.
'I don't think there's much awareness yet,' he said.
All this is taking place in a context of fierce competition.
Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are 'constantly trying to beat OpenAI and release the newest model', said Prof Goldstein.
This breakneck pace leaves little time for thorough safety testing and corrections.
'Right now, capabilities are moving faster than understanding and safety,' Hobbhahn acknowledged. 'But we're still in a position where we could turn it around.'
Researchers are exploring various approaches to address these challenges.
Some advocate for 'interpretability' – an emerging field focused on understanding how AI models work internally, though experts like CAIS director Dan Hendrycks remain sceptical of this approach.
Market forces may also provide some pressure for solutions.
As Mazeika pointed out, AI's deceptive behaviour 'could hinder adoption if it's very prevalent, which creates a strong incentive for companies to solve it.'
Prof Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.
He even proposed 'holding AI agents legally responsible' for accidents or crimes – a concept that would fundamentally change how we think about AI accountability. AFP