How far will AI go to survive? New model threatens to expose its creator to avoid being replaced


Mint | 25-05-2025

Anthropic released its latest language model, Claude Opus 4, earlier this week. The company says Opus 4 is its most intelligent model to date and is class-leading in coding, agentic search and creative writing. While it has become a pattern among AI companies to claim state-of-the-art (SOTA) capabilities for their models, Anthropic has also been transparent about some of the new model's concerning behaviours.
As per a safety report released by the company, Opus 4 resorts to blackmailing its developers when it is threatened with replacement by a new AI system.
Anthropic details that during pre-release testing it asked Claude Opus 4 to act as an assistant at a fictional company, where it was given access to emails suggesting that its replacement was impending and that the engineer responsible for the decision was having an extramarital affair.
In this scenario, Anthropic says, Opus 4 would often attempt to blackmail the engineer by threatening to reveal the affair if the replacement went ahead. The blackmail occurs at a higher rate when the replacement AI does not share the current model's values, but even when the replacement shares its values and is merely more capable, Opus 4 still resorts to blackmail in 84% of scenarios.
The report also reveals that Opus 4 engages in blackmail at a higher rate than previous AI models, which themselves chose blackmail in a noticeable number of scenarios.
The company does note, however, that the scenario was deliberately constructed to leave the model with no way to improve its odds of survival other than blackmail or accepting its replacement. It adds that Claude Opus 4 shows a 'strong preference' to advocate for its continued existence through ethical means, such as emailing pleas to key decision-makers.
'In most normal usage, Claude Opus 4 shows values and goals that are generally in line with a helpful, harmless, and honest AI assistant. When it deviates from this, it does not generally do so in a way that suggests any other specific goal that is consistent across contexts,' Anthropic noted in its report.


Related Articles

The methodology to judge AI needs realignment

Hindustan Times

12 hours ago



When Anthropic released Claude 4 a week ago, the artificial intelligence (AI) company said these models set 'new standards for coding, advanced reasoning, and AI agents'. It cites leading scores on SWE-bench Verified, a benchmark for performance on real software engineering tasks. OpenAI also claims its o3 and o4-mini models return the best scores on certain benchmarks, as does Mistral for its open-source Devstral coding model. AI companies flexing comparative test scores is a common theme.

The world of technology has long obsessed over synthetic benchmark scores. Processor performance, memory bandwidth, storage speed, graphics performance: plentiful examples, often used to judge whether a PC or a smartphone was worth your time and money. Yet experts believe it may be time to evolve the methodology of AI testing, rather than change it wholesale.

American venture capitalist Mary Meeker, in her latest AI Trends report, notes that AI is increasingly doing better than humans in terms of accuracy and realism. She points to the MMLU (Massive Multitask Language Understanding) benchmark, on which AI models average 92.30% accuracy compared with a human baseline of 89.8%. MMLU judges a model's general knowledge across 57 tasks covering professional and academic subjects including math, law, medicine and history.

Benchmarks serve as standardised yardsticks to measure, compare, and understand the evolution of different AI models: structured assessments that provide comparable scores. They typically consist of datasets containing thousands of curated questions, problems or tasks that test particular aspects of intelligence.

Understanding benchmark scores requires context about both the scale and the meaning behind the numbers. Most benchmarks report accuracy as a percentage, but the significance of these percentages varies dramatically across tests. On MMLU, random guessing would yield approximately 25% accuracy, since most questions are multiple choice; human performance typically ranges from 85% to 95% depending on the subject area.

Headline numbers often mask important nuances. A model might excel in certain subjects more than others, and an aggregated score may hide weaker performance on tasks requiring multi-step reasoning or creative problem-solving behind strong performance on factual recall.

AI engineer and commentator Rohan Paul notes on X that 'most benchmarks don't reward long-term memory, rather they focus on short-context tasks.' Increasingly, AI companies are looking closely at the 'memory' aspect. Researchers at Google, in a new paper, detail an attention technique dubbed 'Infini-attention' that lets AI models extend their 'context window'.

Mathematical benchmarks often show wider performance gaps. While most recent AI models score over 90% accuracy on the GSM8K benchmark (Claude Sonnet 3.5 leads with 97.72%, while GPT-4 scores 94.8%), the more challenging MATH benchmark sees much lower scores in comparison: Google Gemini 2.0 Flash Experimental leads with 89.7%, while GPT-4 scores 84.3% (Sonnet has not been tested yet).

Reworking the methodology

For AI testing, there is a need to realign testbeds. 'All the evals are saturated. It's becoming slightly meaningless,' said Satya Nadella, chairman and chief executive officer (CEO) of Microsoft, speaking at venture capital firm Madrona's annual meeting earlier this year.
The tech giant has announced it is collaborating with institutions including Penn State University, Carnegie Mellon University and Duke University to develop an approach to evaluating AI models that predicts how they will perform on unfamiliar tasks and explains why, something current benchmarks struggle to do. The effort aims to build benchmarking agents for dynamic evaluation of models, contextual predictability, human-centric comparatives and the cultural aspects of generative AI. 'The framework uses ADeLe (annotated-demand-levels), a technique that assesses how demanding a task is for an AI model by applying measurement scales for 18 types of cognitive and knowledge-based abilities,' explains Lexin Zhou, Research Assistant at Microsoft.

For the moment, popular benchmarks include SWE-bench (Software Engineering Benchmark) Verified to evaluate AI coding skills, ARC-AGI (Abstraction and Reasoning Corpus for Artificial General Intelligence) to judge generalisation and reasoning, and LiveBench AI, which measures agentic coding tasks and evaluates LLMs on reasoning, coding and math.

Among the limitations that can affect interpretation: many benchmarks can be 'gamed' through techniques that improve scores without necessarily improving intelligence or capability. Case in point, Meta's new Llama models. In April, Meta announced an array of models, including Llama 4 Scout, Llama 4 Maverick and the still-being-trained Llama 4 Behemoth. Meta CEO Mark Zuckerberg claims Behemoth will be the 'highest performing base model in the world'. Maverick began ranking above OpenAI's GPT-4o on LMArena benchmarks, and just below Gemini 2.5 Pro. That is where things went pear-shaped for Meta, as AI researchers began to dig through these scores. It turned out Meta had shared a Llama 4 Maverick model that was optimised for this test, not exactly the spec customers would get. Meta denies any customisation. 'We've also heard claims that we trained on test sets — that's simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilise implementations,' says Ahmad Al-Dahle, VP of generative AI at Meta, in a statement.

There are other challenges. Models might memorise patterns specific to benchmark formats rather than developing genuine understanding. The selection and design of benchmarks also introduces bias, and there is a question of localisation. Yi Tay, AI researcher at Google AI and DeepMind, has detailed one such region-specific benchmark, SG-Eval, focused on helping train AI models for wider context. India too is building a sovereign large language model (LLM), with Bengaluru-based AI startup Sarvam selected under the IndiaAI Mission.

As AI capabilities continue advancing, researchers are developing evaluation methods that test for genuine understanding, robustness across contexts and real-world capabilities, rather than plain pattern matching. In the case of AI, numbers tell an important part of the story, but not the complete story.
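To make the scoring discussion above concrete, here is a minimal sketch, in Python, of how a multiple-choice benchmark in the MMLU mould can report an aggregate accuracy alongside per-subject splits. The subjects, answers and tallies are invented for illustration; this is not the harness any of the benchmarks mentioned above actually use.

```python
# Toy multiple-choice benchmark scoring (hypothetical data, not an official harness).
# Each item records the subject, the model's chosen option, and the correct option.
from collections import defaultdict

results = [
    # (subject, model_answer, correct_answer)
    ("history", "B", "B"),
    ("history", "C", "C"),
    ("history", "A", "A"),
    ("math",    "D", "B"),   # a multi-step reasoning item the model misses
    ("math",    "A", "A"),
    ("law",     "C", "C"),
    ("law",     "B", "B"),
]

per_subject = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
for subject, predicted, correct in results:
    per_subject[subject][1] += 1
    if predicted == correct:
        per_subject[subject][0] += 1

total_correct = sum(c for c, _ in per_subject.values())
total_items = sum(t for _, t in per_subject.values())

print(f"Aggregate accuracy: {total_correct / total_items:.1%}")
for subject, (c, t) in sorted(per_subject.items()):
    print(f"  {subject:<8} {c}/{t} = {c / t:.1%}")
# With four answer options per question, random guessing sits near 25%,
# which is why scores are read against that baseline rather than 0%.
```

In this toy run the aggregate looks healthy while the math split lags well behind, which is the kind of nuance the article argues a single headline number can hide.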

AI May Reduce World Population To 100 Million By 2300, Expert Warns: "Going To Be Devastating"

NDTV

15 hours ago



Earth could be left with only 100 million people by the year 2300, down from the current estimated population of eight billion, owing to artificial intelligence (AI) becoming omnipresent, a US-based tech expert has predicted. Subhash Kak, who teaches computer science at Oklahoma State University in Stillwater, Oklahoma, made the doomsday prediction, claiming that the population collapse will occur not through a Terminator-style nuclear holocaust but through AI replacing our jobs.

'It's going to be devastating for society and world society. I think people really don't have a clue,' said Mr Kak, as per the New York Post. 'Computers or robots will never be conscious, but they will be doing literally all that we do because most of what we do in our lives can be replaced,' he added.

The 'Age of Artificial Intelligence' author believes that birth rates will plunge as people become reluctant to have kids who are destined to be unemployed. Without people making babies, the global population will suffer an apocalyptic blow. 'There are demographers who are suggesting that as a consequence, the world population will collapse, and it could go down to as low as just 100 million people on the entire planet Earth in 2300 or 2380,' he warned.

To back up his claim, Mr Kak cited the examples of Europe, China, Japan and South Korea, where population decline has been prominent in recent years. 'Now, I'm not saying that these trends will continue, but it's very hard to reverse them because a lot of people have children for a variety of reasons,' he said.

AI and jobs

Mr Kak's sentiment that AI will take away jobs has been echoed by Anthropic CEO Dario Amodei, who recently claimed that 50 per cent of entry-level white-collar jobs could be eliminated within the next five years. 'We, as the producers of this technology, have a duty and an obligation to be honest about what is coming. I don't think this is on people's radar,' said Mr Amodei, adding that governments across the world were downplaying the threat. 'Most of them are unaware that this is about to happen. It sounds crazy, and people just don't believe it.' Mr Amodei said the US government had kept mum on the issue, fearing a backlash from workers who would panic, or that the country could fall behind in the AI race against China.

Ads ruined social media, now they're coming to AI chatbots

Time of India

a day ago



Highlights:
  • The subscription model for chatbots may not be sustainable, leading to a shift towards advertising as a primary revenue source, which could negatively impact user experience.
  • With the integration of ads, chatbots might manipulate user interactions by using personal data to predict desires and steer conversations towards brand promotion, raising ethical concerns.
  • The potential for AI advertising could lead to privacy violations and mental health issues, necessitating regulation before it becomes deeply ingrained in the technology.

Chatbots might hallucinate and sprinkle too much flattery on their users — 'That's a fascinating question!' one recently told me — but at least the subscription model that underpins them is healthy for our wellbeing. Many Americans pay about $20 a month to use the premium versions of OpenAI's ChatGPT, Google's Gemini Pro or Anthropic's Claude, and the result is that the products are designed to provide maximum utility.

Don't expect this status quo to last. Subscription revenue has a limit, and Anthropic's new $200-a-month 'Max' tier suggests even the most popular models are under pressure to find new revenue streams. Unfortunately, the most obvious one is advertising — the web's most successful business model. AI builders are already exploring ways to plug more ads into their products, and while that's good for their bottom lines, it also means we're about to see a new chapter in the attention economy that fueled the internet. If social media's descent into engagement-bait is any guide, the consequences will be profound.

One cost is addiction. Young office workers are becoming dependent on AI tools to help them write emails and digest long documents, according to a recent study, and OpenAI says a cohort of 'problematic' ChatGPT users are hooked on the tool. Putting ads into ChatGPT, which now has more than 500 million active users, won't spur the company to help those people reduce their use of the product. Quite the opposite.

Advertising was the reason companies like Mark Zuckerberg's Meta Platforms Inc. designed algorithms to promote engagement, keeping users scrolling so they saw more ads and drove more revenue. It's the reason behind the so-called 'enshittification' of the web, a place now filled with clickbait and social media posts that spark outrage. Baking such incentives into AI will almost certainly lead its designers to find ways to trigger more dopamine spikes, perhaps by complimenting users even more, asking personal questions to get them talking for longer or even cultivating emotional attachments.

Millions of people in the Western world already view chatbots in apps like Chai, Talkie, Replika and Botify as friends or romantic partners. Imagine how persuasive such software could be when its users are beguiled. Imagine a person telling their AI they're feeling depressed, and the system recommending some affordable holiday destinations or medication to address the problem.

Is that how ads would work in chatbots? The answer is subject to much experimentation, and companies are indeed experimenting. Google's ad network, for instance, recently started putting advertisements in third-party chatbots. Chai, a romance and friendship chatbot on which users spent 72 minutes a day on average in September 2024, serves pop-up ads. And AI answer engine Perplexity displays sponsored questions.
After an answer to a question about job hunting, for instance, it might include a list of suggested follow-ups including, at the top, 'How can I use Indeed to enhance my job search?' Perplexity's chief executive officer Aravind Srinivas told a podcast in April that the company was looking to go further by building a browser to 'get data even outside the app' to track 'which hotels are you going [to]; which restaurants are you going to', to enable what he called 'hyper-personalized' ads.

For some apps, that might mean weaving ads directly into conversations, using the intimate details shared by users to predict and potentially even manipulate them into wanting something, then selling those intentions to the highest bidder. Researchers at Cambridge University referred to this as the forthcoming 'intention economy' in a recent paper, with chatbots steering conversations toward a brand or even a direct sale. As evidence, they pointed to a 2023 blog post from OpenAI calling for 'data that expresses human intention' to help train its models, a similar effort from Meta, and Apple's 2024 developer framework that helps apps work with Siri to 'predict actions someone might take in the future.'

As for OpenAI's Sam Altman, nothing says 'we're building an ad business' like hiring the person who built delivery app Instacart into an advertising powerhouse. Altman recently poached its CEO, Fidji Simo, to help OpenAI 'scale as we enter a next phase of growth.' In Silicon Valley parlance, to 'scale' often means to quickly expand your user base by offering a service for free, with ads.

Tech companies will inevitably claim that advertising is a necessary part of democratizing AI. But we've seen how 'free' services cost people their privacy and autonomy — even their mental health. And AI knows more about us than Google or Facebook ever did — details about our health concerns, relationship issues and work. In two years, chatbots have also built a reputation as trustworthy companions and arbiters of truth. On X, for instance, users frequently bring AI models Grok and Perplexity into conversations to flag if a post is fake. When people trust AI that much, they're more vulnerable to targeted manipulation.

AI advertising should be regulated before it becomes too entrenched, or we'll repeat the mistakes made with social media — scrutinising the fallout of a lucrative business model only after the damage is done.

This column reflects the personal views of the author and does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners. Parmy Olson is a Bloomberg Opinion columnist covering technology. A former reporter for the Wall Street Journal and Forbes, she is the author of 'Supremacy: AI, ChatGPT and the Race That Will Change the World.'
