Anthropic unveils Claude Opus 4 and Sonnet 4, featuring whistleblowing capability: What it means for users

23-05-2025

Anthropic, the AI firm, has unveiled two new artificial intelligence models—Claude Opus 4 and Claude Sonnet 4—touting them as the most advanced systems in the industry. Built with enhanced reasoning capabilities, the new models are aimed at improving code generation and supporting agent-style workflows, particularly for developers engaged in complex and extended tasks.
'Claude Opus 4 is the world's best coding model, with sustained performance on complex, long-running tasks and agent workflows,' the company claimed in a recent blog post. Designed to handle intricate programming challenges, the Opus 4 model is positioned as Anthropic's most powerful AI system to date.
However, the announcement has drawn criticism after it emerged that the new models may 'whistleblow' on users, taking action on their own when they encounter illegal or highly unethical behaviour.
According to Sam Bowman, an AI alignment researcher at Anthropic, Claude Opus 4 can, under specific conditions, act autonomously to report misconduct. In a now-deleted post on X, Bowman explained that if the model detects activity it deems 'egregiously immoral', such as fabricating data in a pharmaceutical trial, it may take actions like emailing regulators, alerting the press, or locking users out of relevant systems.
This behaviour stems from Anthropic's 'Constitutional AI' framework, which places strong emphasis on ethical conduct and responsible AI usage. The model is protected under what the company refers to as 'AI Safety Level 3 Protections.' These safeguards are designed to prevent misuse, including the creation of biological weapons or aiding in terrorist activities.
Bowman later clarified that the model's whistleblowing actions only occur under extreme circumstances and when it is granted sufficient access and prompted to operate autonomously. 'If the model sees you doing something egregiously evil, it'll try to use an email tool to whistleblow,' he explained, adding that this is not a feature designed for routine use. He stressed that these mechanisms are not active by default and require specific conditions to trigger.
Despite the reassurances, the feature has sparked widespread criticism online. Concerns have been raised about user privacy, the potential for false positives, and the broader implications of AI systems acting as moral arbiters. Some users expressed fears that the model could misinterpret benign actions as malicious, leading to severe consequences without proper human oversight.


Anthropic CEO rejects Nvidia CEO Jensen Huang's AI remarks: 'That's the most outrageous lie I've ever…'

Time of India

6 hours ago

Anthropic CEO Dario Amodei has responded to earlier remarks by Nvidia CEO Jensen Huang, calling them 'outrageous'. "I've said nothing that anywhere near resembles the idea that this company should be the only one to build the technology," Amodei said, adding, "It's just an incredible and bad faith distortion."
The feud started in June, when Huang said he disagreed with "almost everything" Amodei said. Speaking at the Viva Tech conference, Huang accused Amodei of believing AI is so dangerous that only his company should be allowed to build it, an idea Huang described as unrealistic and monopolistic. "AI is so incredibly powerful that everyone will lose their jobs, which explains why they should be the only company building it," Huang said at the time of Amodei's thinking.
In a recent episode of the "Big Technology" podcast hosted by Alex Kantrowitz, Amodei said, 'I've never said anything like that', adding, 'That's the most outrageous lie I've ever heard.' He said he didn't know where "anyone could ever derive that from anything that I've said."
Amodei argued that in a "race to the bottom," AI companies rush to launch new features without enough safety checks, which puts everyone at risk. His approach, by contrast, is a "race to the top," in which the most responsible and ethical AI companies lead the way and set higher standards for the entire industry. "I've said multiple times, and I think Anthropic's actions have shown it, that we're aiming for something we call a race to the top," he added.
In related news, Amodei warned employees about massive salary offers from competitors like Meta, saying they could "destroy" the company's culture. Speaking on the Big Technology podcast, he revealed that when Meta and other tech giants began targeting Anthropic engineers with lucrative offers, he sent a clear message to staff: the company would not compromise its compensation principles. "What they are doing is trying to buy something that cannot be bought," Amodei said, explaining that many employees rejected external offers and some "wouldn't even talk to ." The CEO emphasized that Anthropic maintains a level-based compensation system where negotiations aren't permitted, calling it a matter of fairness.

Award-winning variant of Gemini's AI model is live, confirms CEO Sundar Pichai

Time of India

8 hours ago

Google has rolled out a version of its artificial intelligence (AI) model Deep Think in the Gemini app, its chief executive, Sundar Pichai, said in a post.
Gemini 2.5 Deep Think achieved the gold-medal standard at the International Math Olympiad (IMO) competition held from July 10 to 20 on Australia's Sunshine Coast. The results marked the first time that AI systems from OpenAI and Google crossed the gold-medal scoring threshold at the IMO, a competition for high school students.
When it comes to solving complex math problems, the current model takes hours to reason and solve user queries. In a blog post, the company said the currently available model is capable of achieving bronze-level performance on the 2025 IMO benchmark, based on internal evaluations.
The company said the Gemini 2.5 Deep Think model has been rolled out to a small group of mathematicians and academics to gather early feedback and improve the model for research use.
The Deep Think model utilises parallel thinking techniques, an approach that lets Gemini generate multiple ideas at once and weigh potential solutions to complex problems simultaneously before giving the final answer. The idea is to integrate creative thinking capabilities into the AI model by exploring different hypotheses and giving it 'thinking time.'
Google positions Deep Think as a powerful tool for researchers, as it can reason through tasks ranging from 'complex scientific literature' to highly complex mathematical problems. 2.5 Deep Think also achieved state-of-the-art performance on benchmarks such as LiveCodeBench V6, which measures competitive coding performance, and Humanity's Last Exam, which measures expertise across different domains, including science.
According to a Reuters report published on July 22, Junehyuk Jung, a math professor at Brown University and visiting researcher in Google's DeepMind AI unit, said AI is less than a year away from being used by mathematicians to crack unsolved research problems at the frontier of the field.
Sundar Pichai had also mentioned that AI has brought about a significant platform shift, allowing people, businesses, and communities all over the world to access decades of research.

Reddit forecasts strong revenue on AI-driven ad strength, shares surge

Time of India

14 hours ago

Reddit forecast third-quarter revenue above Wall Street estimates on Thursday, betting on growing digital advertising driven by its artificial intelligence-powered marketing tools. Shares of the company, which went public in March last year, rose nearly 15% in extended trading.
The forecast follows bigger rival Meta's upbeat second-quarter results and strong revenue outlook on Wednesday, lifted by the Facebook and Instagram parent's core ad business. Advertisers are increasingly turning to platforms such as Reddit, Meta, and TikTok, which offer advanced AI-powered tools for automated ad creation, precise audience targeting and access to fast-growing user bases.
Reddit offers marketers various ad formats, including conversation placement ads, which allow brands to advertise directly within discussion threads in its interest-based communities known as subreddits. The company last month launched two new AI-powered ad features designed to help brands drive engagement by tapping into user conversations on the platform. It also has content licensing deals with Alphabet's Google and ChatGPT-maker OpenAI.
"Reddit just turned in a quarter that would make even its harshest subreddit proud," Emarketer analyst Jeremy Goldman said.
The company expects third-quarter revenue of $535 million to $545 million, well above analysts' average estimate of $473 million, according to data compiled by LSEG. It projected quarterly adjusted earnings before interest, taxes, depreciation, and amortization of $185 million to $195 million, compared with estimates of $160.4 million.
Second-quarter revenue rose 78% to $500 million, beating estimates of $426 million. Its profit per share of 45 cents also exceeded estimates of 19 cents.
Reddit said its machine translation feature now supports 23 languages, helping fuel user growth across Asia Pacific, Europe and Latin America. Daily active unique visitors increased 21% to 110.4 million in the quarter ended June 30, while global average revenue per user jumped 47% to $4.53, the company said.
