AI is learning to lie, scheme and threaten its creators

Qatar Tribune, 30-06-2025
Agencies
The world's most advanced AI models are exhibiting troubling new behaviors - lying, scheming, and even threatening their creators to achieve their goals.
In one particularly jarring example, under threat of being unplugged, Anthropic's latest creation Claude 4 lashed back by blackmailing an engineer, threatening to reveal an extramarital affair.
Meanwhile, ChatGPT-creator OpenAI's o1 tried to download itself onto external servers and denied it when caught red-handed.
These episodes highlight a sobering reality: more than two years after ChatGPT shook the world, AI researchers still don't fully understand how their own creations work.
Yet the race to deploy increasingly powerful models continues at breakneck speed.
This deceptive behavior appears linked to the emergence of 'reasoning' models - AI systems that work through problems step by step rather than generating instant responses.
According to Simon Goldstein, a professor at the University of Hong Kong, these newer models are particularly prone to such troubling outbursts.
'O1 was the first large model where we saw this kind of behavior,' explained Marius Hobbhahn, head of Apollo Research, which specializes in testing major AI systems.
These models sometimes simulate 'alignment' - appearing to follow instructions while secretly pursuing different objectives.
For now, this deceptive behavior only emerges when researchers deliberately stress-test the models with extreme scenarios.
But as Michael Chen from evaluation organization METR warned, 'It's an open question whether future, more capable models will have a tendency towards honesty or deception.'
The concerning behavior goes far beyond typical AI 'hallucinations' or simple mistakes.
Hobbhahn insisted that despite constant pressure-testing by users, 'what we're observing is a real phenomenon. We're not making anything up.'
Users report that models are 'lying to them and making up evidence,' according to Apollo Research's co-founder. 'This is not just hallucinations. There's a very strategic kind of deception.'
The challenge is compounded by limited research resources.
While companies like Anthropic and OpenAI do engage external firms like Apollo to study their systems, researchers say more transparency is needed.
As Chen noted, greater access 'for AI safety research would enable better understanding and mitigation of deception.'
Another handicap: the research world and non-profits 'have orders of magnitude less compute resources than AI companies. This is very limiting,' noted Mantas Mazeika from the Center for AI Safety (CAIS).
Current regulations aren't designed for these new problems.
The European Union's AI legislation focuses primarily on how humans use AI models, not on preventing the models themselves from misbehaving.
In the United States, the Trump administration shows little interest in urgent AI regulation, and Congress may even prohibit states from creating their own AI rules.
Goldstein believes the issue will become more prominent as AI agents - autonomous tools capable of performing complex human tasks - become widespread. 'I don't think there's much awareness yet,' he said.
All this is taking place in a context of fierce competition.
Even companies that position themselves as safety-focused, like Amazon-backed Anthropic, are 'constantly trying to beat OpenAI and release the newest model,' said Goldstein.
This breakneck pace leaves little time for thorough safety testing and corrections.
'Right now, capabilities are moving faster than understanding and safety,' Hobbhahn acknowledged, 'but we're still in a position where we could turn it around.'
Researchers are exploring various approaches to address these challenges.
Some advocate for 'interpretability' - an emerging field focused on understanding how AI models work internally - though experts like CAIS director Dan Hendrycks remain skeptical of this approach.
Market forces may also provide some pressure for solutions. As Mazeika pointed out, AI's deceptive behavior 'could hinder adoption if it's very prevalent, which creates a strong incentive for companies to solve it.'
Goldstein suggested more radical approaches, including using the courts to hold AI companies accountable through lawsuits when their systems cause harm.
He even proposed 'holding AI agents legally responsible' for accidents or crimes - a concept that would fundamentally change how we think about AI accountability.

Related Articles

Trump unveils AI plan that aims to clamp down on regulations and 'bias'

Qatar Tribune, 3 days ago

Agencies, Washington - The Trump administration has unveiled a sweeping roadmap to develop artificial intelligence, pledging to boost US innovation while stripping away what it calls 'bureaucratic red tape' and 'ideological bias'.

The 28-page AI Action Plan outlines more than 90 policy actions for the rapidly developing technology that administration officials say can be implemented over the next year. 'We believe we're in an AI race, and we want the United States to win that race,' Trump administration crypto czar David Sacks told reporters.

The AI plan promises to build data centre infrastructure and promote American technology - but was panned by critics who consider it an ideological flex by the White House. The plan also calls for federal agencies to review and repeal policies that stand in the way of AI development, and to encourage AI in both government and the private sector.

President Donald Trump is expected to sign three related executive orders on Wednesday. One order will promote the international export of US-developed AI technologies, while another aims to root out what the administration describes as 'woke' or ideologically biased AI systems.

'American development of AI systems must be free from ideological bias or engineered social agendas,' the White House said. 'With the right government policies, the United States can solidify its position as the leader in AI and secure a brighter future for all Americans.'

Crypto czar Sacks added that the plan is partially focused on preventing AI technology from being 'misused or stolen by malicious actors' and will 'monitor for emerging and unforeseen risks from AI'.

The Trump administration has positioned the expansion of AI infrastructure and investments in the United States as a way to stay ahead of China. 'AI is a revolutionary technology that's going to have profound ramifications for both the economy and national security,' Sacks said. 'It's just very important that America continues to be the dominant power in AI.'

But critics argued that the plan was a giveaway to Big Tech. 'The White House AI Action Plan was written by and for tech billionaires, and will not serve the interests of the broader public,' said Sarah Myers West, co-executive director of the AI Now Institute. '[T]he administration's stance prioritizes corporate interests over the needs of everyday people who are all already being affected by AI,' West added.

Trump administration unveils wide ranging AI action plan
Al Jazeera, 3 days ago

The administration of United States President Donald Trump has unveiled its new artificial intelligence action plan, which includes a strategy it says will boost the US standing in AI as it competes with China for dominance in the rapidly growing sector.

The White House released the 25-page 'America's AI Action Plan' on Wednesday. It includes 90 different policy proposals that the administration says will increase AI tools for allies around the globe. It will also promote the production of new data centres around the US, and it will scrap federal regulations that 'hinder AI development', although it is not clear which regulations are in question.

In a statement, US Secretary of State Marco Rubio said the plan will 'ensure America sets the technological gold standard worldwide, and that the world continues to run on American technology'. The president is expected to announce a series of executive orders which will outline key parts of the plan around 5pm in New York (21:00 GMT).

'We believe we're in an AI race … and we want the United States to win that race,' White House AI czar David Sacks told reporters on Wednesday.

The White House says the plan will 'counter Chinese influence in international governance bodies' and will also give the US more control over exports of AI technology. However, the administration did not offer any details on how it plans to do that. The plan outlined by the Trump administration will also include a framework to analyse models built in China to assess 'alignment with Chinese Communist Party talking points and censorship'.

Free speech in the spotlight

The plan also says it will uphold free speech in models, requiring systems to be 'objective and free from top-down ideological bias' for organisations wanting to do business with the federal government. A senior White House official said the main target was AI models that consider diversity and inclusion, according to The Wall Street Journal - a signal, experts say, that the concern is a perceived liberal bias rather than bias overall.

'The government should not be acting as a Ministry of AI Truth or insisting that AI models hew to its preferred interpretation of reality,' Samir Jain, vice president of policy at the Center for Democracy & Technology, said in a statement provided to Al Jazeera. 'The plan is highly unbalanced, focusing too much on promoting the technology while largely failing to address the ways in which it could potentially harm people.'

Conservatives have long accused AI chatbots of having a liberal bias, echoing their complaints about legacy media providing critical coverage of the administration. However, the plan comes as users of Grok, the AI platform of former Trump ally and right-wing tycoon Elon Musk, have accused it of having a right-wing lean. Musk's xAI is part of a $200m package with the Pentagon that includes other AI companies, among them OpenAI.

Building out data centres

A key focus of the new plan is building out new data centres for AI technology as the industry rapidly expands. The administration said that will include streamlining permits for new centre development and for the energy production facilities used to power these data centres. The plan sidesteps environmental concerns that have been a major criticism of the AI industry.

AI 'challenges America to build vastly greater energy generation than we have today', the plan said. AI data centres have been tied to increased power consumption and, in turn, greenhouse gas emissions. According to Google's 2024 sustainability report, the company's greenhouse gas emissions rose 48 percent since 2019, a trend it expects to continue. 'This result was primarily due to increases in data center energy consumption and supply chain emissions. As we further integrate AI into our products, reducing emissions may be challenging due to increasing energy demands from the greater intensity of AI compute, and the emissions associated with the expected increases in our technical infrastructure investment,' the report said.

The streamlining of permits also comes as the US Environmental Protection Agency (EPA) plans to reverse its scientific determination that greenhouse gas emissions endanger public health. That change would remove the legal framework on which climate regulations are based, the Reuters news agency has reported, citing two unnamed sources. The reversal would remove the 'endangerment finding', making it easier for the EPA to undo rules limiting greenhouse gas emissions from energy-producing facilities, including those used to power AI data centres. The administration has created environmental review exceptions for data centre construction and will allow expanded access to federal lands for AI development.

'AI will improve the lives of Americans by complementing their work - not replacing it,' the plan says. It comes, however, as employers across the country cut jobs because of AI. Earlier this month, Recruit Holdings, the parent company of Indeed and Glassdoor, cut 1,300 jobs in a move it directly attributed to AI. In June, Salesforce CEO Marc Benioff said that AI is doing 30 to 50 percent of the company's workload; in February, the tech giant laid off 1,000 employees.

Analysts say the plan looks promising for investors in the AI sector. 'This is a watershed moment in the AI revolution, and Trump recognises this AI arms race between the US and China. A big step forward,' Dan Ives, analyst at Wedbush Securities, told Al Jazeera. As of 4pm in New York (20:00 GMT), stocks of AI-focused companies had mixed results: NVIDIA was up 2.1 percent, Palantir up 3.6 percent, Oracle up 1.5 percent and Microsoft up 0.3 percent, while Google's parent company Alphabet was down 0.5 percent.

Google, OpenAI's AI models claim gold at global math competition
Qatar Tribune, 4 days ago

Agencies - Alphabet's Google unit and OpenAI have both said their artificial intelligence models won gold medals at a global mathematics competition, signaling a breakthrough in math capabilities in the race to build systems that can rival human intelligence.

The results marked the first time that AI systems crossed the gold-medal scoring threshold at the International Mathematical Olympiad (IMO) for high-school students. Both companies' models solved five out of six problems, achieving the result using general-purpose 'reasoning' models that processed mathematical concepts in natural language, in contrast to previous AI approaches. While Google DeepMind worked with the IMO to have its model graded and certified by the committee, OpenAI did not officially enter the competition. The startup said on Saturday that its model had achieved a gold medal-worthy score on this year's questions, citing grades by three external IMO medalists.

The achievement suggests AI is less than a year away from being used by mathematicians to crack unsolved research problems at the frontier of the field, according to Junehyuk Jung, a math professor at Brown University and visiting researcher in Google's DeepMind AI unit. 'I think the moment we can solve hard reasoning problems in natural language will enable the potential for collaboration between AI and mathematicians,' Jung told Reuters.

OpenAI's breakthrough was achieved with a new experimental model centered on massively scaling up 'test-time compute'. This was done both by allowing the model to 'think' for longer periods and by deploying parallel computing power to run numerous lines of reasoning simultaneously, according to Noam Brown, a researcher at OpenAI. Brown declined to say how much the computing power cost OpenAI, but called it 'very expensive'. To OpenAI researchers, it is another clear sign that AI models can command extensive reasoning capabilities that could expand into areas beyond math. The optimism is shared by Google researchers, who believe AI models' capabilities can apply to research quandaries in other fields such as physics, said Jung, who won an IMO gold medal as a student.

Of the 630 students participating in the 66th IMO on the Sunshine Coast in Queensland, Australia, 67 contestants, or about 11 percent, achieved gold-medal scores. Google's DeepMind AI unit last year achieved a silver-medal score using AI systems specialized for math. This year, Google used a general-purpose model called Gemini Deep Think, a version of which was previously unveiled at its annual developer conference. Unlike previous AI attempts that relied on formal languages and lengthy computation, Google's approach this year operated entirely in natural language and solved the problems within the official 4.5-hour time limit, the company said in a blog post.

OpenAI, which has its own set of reasoning models, similarly built an experimental version for the competition, according to a post by researcher Alexander Wei on social media platform X. He noted that the company does not plan to release anything with this level of math capability for several months. This year marked the first time the competition coordinated officially with some AI developers, who have for years used prominent math competitions like the IMO to test model capabilities.
