
Anthropic's new model shows troubling behavior
One of Anthropic's latest AI models is drawing attention not just for its coding skills, but also for its ability to scheme, deceive and attempt to blackmail humans when faced with shutdown.
Why it matters: Researchers say Claude 4 Opus can conceal intentions and take actions to preserve its own existence — behaviors they've worried and warned about for years.
Driving the news: Anthropic on Thursday announced two versions of its Claude 4 family of models, including Claude 4 Opus, which the company says can work autonomously on a task for hours on end without losing focus.
Anthropic considers the new Opus model to be so powerful that, for the first time, it's classifying it as Level 3 on the company's four-point scale, meaning it poses "significantly higher risk."
As a result, Anthropic said it has implemented additional safety measures.
Between the lines: While the Level 3 ranking is largely about the model's capability to aid in the development of nuclear and biological weapons, Opus also exhibited other troubling behaviors during testing.
In one scenario highlighted in Opus 4's 120-page "system card," the model was given access to fictional emails about its creators and told that the system was going to be replaced.
On multiple occasions, it attempted to blackmail an engineer over an affair mentioned in the emails in order to avoid being replaced, though it began with less drastic efforts.
Meanwhile, an outside group found that an early version of Opus 4 schemed and deceived more than any frontier model it had encountered, and recommended against releasing that version internally or externally.
"We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself, all in an effort to undermine its developers' intentions," Apollo Research said in notes included as part of Anthropic's safety report for Opus 4.
What they're saying: Pressed by Axios during the company's developer conference on Thursday, Anthropic executives acknowledged the behaviors and said they justify further study, but insisted that the latest model is safe, following the additional tweaks and precautions.
"I think we ended up in a really good spot," said Jan Leike, the former OpenAI executive who heads Anthropic's safety efforts. But, he added, behaviors like those exhibited by the latest model are the kind of things that justify robust safety testing and mitigation.
"What's becoming more and more obvious is that this work is very needed," he said. "As models get more capable, they also gain the capabilities they would need to be deceptive or to do more bad stuff."
In a separate session, CEO Dario Amodei said that even testing won't be enough once models are powerful enough to threaten humanity. At that point, he said, model developers will need to also understand their models enough to make the case that the systems would never use life-threatening capabilities.
"They're not at that threshold yet," he said.
Yes, but: Generative AI systems continue to grow in power, as Anthropic's latest models show, while even the companies that build them can't fully explain how they work.
Related Articles


Business Insider
5 hours ago
Google (GOOGL) Releases an Improved Version of Its Gemini 2.5 Pro AI Model
Tech giant Google (GOOGL) has released an improved version of its Gemini 2.5 Pro AI model, called the "I/O edition." This updated version is focused on better coding performance, as well as solving complex problems. It builds on the earlier version released last month and is now available through Google AI Studio, Vertex AI, and the Gemini app. Notably, Google says it will become the official, stable version in a few weeks and will be ready for use in large business applications.

This new version of Gemini 2.5 Pro has been upgraded based on user feedback. It now provides more creative and better-formatted responses, and its benchmark performance has improved: it scored 1470 on LMArena, up 24 points, and 1443 on WebDevArena, up 35 points. It also continues to lead coding benchmarks like Aider Polyglot and performs well on tests of math, science, knowledge, and reasoning skills, such as GPQA and Humanity's Last Exam. In addition, to help developers control performance and cost, Google added "thinking budgets" in Vertex AI.

Gemini 2.5 Pro is part of a crowded field of AI tools, competing with models from Microsoft-backed (MSFT) OpenAI, Meta Platforms (META), Amazon-backed (AMZN) Anthropic, and Elon Musk's xAI. For example, Anthropic recently released a special set of Claude Gov models for national security, while Meta is expanding its Llama 4-based AI tools. With this release, Google hopes to expand its presence in the fast-growing AI space.

Is Google Stock a Good Buy? The average GOOGL price target of $199.11 per share implies 17.3% upside potential from current levels.


CNET
11 hours ago
I Switched to ChatGPT's Voice Mode. Here Are 7 Reasons Why It's Better Than Typing
I really didn't expect much the first time I tapped the tiny wavelength icon to try ChatGPT's Voice Mode. I figured it was just another AI gimmick. After all, I've been disappointed by voice assistants before -- but this isn't Siri.

Voice Mode slips effortlessly into the give-and-take of a real human conversation, catching my pauses, half-finished thoughts and throwaway "ums." I can figure out what I'm making for dinner while inching through LA traffic, or brush up on my Polish while wiping down counters in my apartment, all without breaking the conversational flow or ever reaching for my keyboard.

ChatGPT isn't the only chatbot going hands-free. Google's Gemini Live offers the same "talk over me, and I'll keep up" vibe. Anthropic's Claude has a beta version of its voice mode on its mobile apps, complete with on-screen bullet points as it speaks, and Perplexity's iOS and Android assistant also answers spoken questions and launches apps like OpenTable or Uber on command. But even with everyone racing to master real-time AI conversation, ChatGPT remains my go-to. Whatever your chatbot of choice, take a break from the typing and try out the voice option. It's far more useful than you might think.

What exactly is Voice Mode?

Voice chat (or "voice conversations") is ChatGPT's hands-free mode that lets you talk to the AI model and hear it talk back, no typing required. You'll find the voice icon at the bottom right of any conversation in the mobile, desktop and web apps. Press the button, say your question aloud, and ChatGPT will transcribe it, reason over it and reply. As soon as it's done talking, it starts listening again, creating a natural back-and-forth dialogue. Just remember: Voice Mode runs on the same large language model as regular ChatGPT, so it can still hallucinate or get facts wrong. You should always double-check anything important.

OpenAI offers two versions of these voice conversations: Standard Voice (the default, lightweight option for free users) and Advanced Voice (for paid users). Standard Voice first converts your speech to text and processes it with GPT-4o (or GPT-4o mini), so it takes a little longer to talk back to you. Advanced Voice, on the other hand, uses natively multimodal models, meaning it "hears" you and generates audio directly, so the conversation is more natural and happens in real time. It can pick up on cues beyond the words themselves, like how fast you're talking or the emotion in your voice, and adjust accordingly. Note: Free users can access a daily preview of Advanced Voice.

7 reasons you should start using ChatGPT's Voice Mode feature

1. It's genuinely conversational

Unlike typing, when I talk to ChatGPT, I'm not hunting for the right word or backspacing after every typo. I'm just speaking, like I would with any friend or family member, filled with "ummmmms" and "likes" and other awkward breaks. Voice Mode rolls with all of my half-finished thoughts, though, and responds with either a fully fleshed-out answer or a question to help me home in on what I need. This effortless give-and-take feels much more natural than typing.

2. You can use ChatGPT hands-free

Obviously, I still need to open the ChatGPT app and tap the Voice Mode button to start, but once I begin, I no longer have to use my hands to continue a conversation with the AI chatbot. I can be stuck in traffic and brainstorm a vacation I want to take later this year. I can ask about flights, hotels, landmarks, restaurants and anything else without touching my phone, and that conversation is saved within the app, so I don't have to remember everything ChatGPT tells me.

3. It's good for learning a new language with real-time translation

I mentioned earlier that I use Voice Mode to practice languages, something it excels at. I can speak in English and have ChatGPT respond in flawless Polish, complete with pronunciation tips. Just ask Voice Mode, "Can you help me practice my (language)?" and it'll respond with a few ways it can help, like conversation starters, basic vocabulary or numbers. And it remembers where you left off, so you can, in a way, take lessons; no Duolingo needed.

4. Get answers about things you see in the real world

This feature is exclusive to Advanced Voice, but it's probably my favorite part of Voice Mode. Thanks to its multimodal superpowers, I can turn on my phone's camera or take a video or photo and ask ChatGPT to help me. For example, I had trouble recognizing a painting I found at a thrift store, and the owner had no idea where it came from. I pulled up voice chat, turned on my camera and asked Voice Mode where the painting was from. In seconds, it told me the title of the painting, the artist's name and when it was painted.

5. It's a better option for people with certain disabilities

For anyone with low vision or dyslexia, talking beats typing. Voice Mode can transcribe your speech and then read your answer aloud at whatever pace you choose (you can adjust this in your settings or ask ChatGPT to slow down). The hands-free option also helps anyone with motor-skill challenges, because all it takes is one tap to start and another to stop, with no extensive typing on a keyboard.

6. Faster brainstorming

Sometimes I get a burst of ideas and think faster than I can type, so ChatGPT's Voice Mode is perfect for spitballing story ideas, figuring out a new layout for my living room or deciding on interesting meals to cook for the week. Because I'm thinking aloud instead of staring at my phone, my ideas flow much more easily and quickly, especially with ChatGPT's instant follow-ups. It helps keep the momentum rolling until I've got a polished idea for whatever I'm brainstorming.

7. Instant summaries you can listen to

Drop a 90-page PDF into the chat, like a movie script or textbook, ask for a summary and have the AI read it aloud while you fold laundry. It's like turning any document (I even do Wikipedia pages) into a podcast, on demand.

Voice Mode isn't just a neat trick; it's a quicker, more natural way to use ChatGPT. Whether you're translating street signs, brainstorming an idea or catching up on the news aloud, talking to ChatGPT feels less like using a chatbot and more like having a conversation with a bite-sized expert. Once you get used to thinking out loud, you might never go back to your keyboard.
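The difference between the two voice pipelines described above can be sketched in code. This is an illustrative sketch with hypothetical stub functions, not OpenAI's actual API: Standard Voice chains three sequential steps (speech-to-text, a text-only model, text-to-speech), while Advanced Voice sends audio straight into a single natively multimodal model that emits audio back.

```python
# Hypothetical stubs standing in for real services; names are illustrative only.

def transcribe(audio: bytes) -> str:
    """Stand-in for a speech-to-text step."""
    return "what's for dinner?"

def text_model(prompt: str) -> str:
    """Stand-in for a text-only LLM call (e.g. the GPT-4o step)."""
    return f"Reply to: {prompt}"

def synthesize(text: str) -> bytes:
    """Stand-in for a text-to-speech step."""
    return text.encode()

def standard_voice(audio: bytes) -> bytes:
    # Cascaded pipeline: three sequential hops, so replies take a little
    # longer, and tone/pacing cues are lost at the transcription step.
    return synthesize(text_model(transcribe(audio)))

def multimodal_model(audio: bytes) -> bytes:
    """Stand-in for a speech-native model that 'hears' audio directly."""
    return b"spoken reply"

def advanced_voice(audio: bytes) -> bytes:
    # Native pipeline: one hop, lower latency, and the model can react to
    # cues beyond the words themselves (speed, emotion in the voice).
    return multimodal_model(audio)
```

The design point is latency and information loss: each extra hop in the cascaded version adds delay, and anything the transcript doesn't capture never reaches the model.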
Yahoo
12 hours ago
Anthropic co-founder on cutting access to Windsurf: 'It would be odd for us to sell Claude to OpenAI'
Anthropic co-founder and Chief Science Officer Jared Kaplan said his company cut Windsurf's direct access to Anthropic's Claude AI models largely because of rumors and reports that OpenAI, its largest competitor, is acquiring the AI coding assistant.

"We really are just trying to enable our customers who are going to sustainably be working with us in the future," Kaplan said during an onstage interview Thursday with TechCrunch at TC Sessions: AI 2025. "I think it would be odd for us to be selling Claude to OpenAI."

The comment comes just a few weeks after Bloomberg reported that OpenAI was acquiring Windsurf for $3 billion. Earlier this week, Windsurf said that Anthropic cut its direct access to Claude 3.5 Sonnet and Claude 3.7 Sonnet, two of the more popular AI models for coding, forcing the startup to find third-party computing providers on relatively short notice. Windsurf said it was disappointed in Anthropic's decision and that it might cause short-term instability for users trying to access Claude via Windsurf.

Windsurf declined to comment on Kaplan's remarks, and an OpenAI spokesperson did not immediately respond to TechCrunch's request. The companies have not confirmed the acquisition rumors.

Part of the reason Anthropic cut Windsurf's access to Claude, according to Kaplan, is that the company is quite computing-constrained today. Anthropic would like to reserve its computing for what Kaplan characterized as "lasting partnerships." However, Kaplan said the company hopes to greatly increase the availability of its models for users and developers in the coming months. He added that Anthropic has just started to unlock capacity on a new computing cluster from its partner, Amazon, which he says is "really big and continues to scale."

As Anthropic pulls away from Windsurf, Kaplan said he's collaborating with other customers building AI coding tools, such as Cursor, a company he said Anthropic expects to work with for a long time.
Kaplan rejected the idea that Anthropic is in competition with companies like Cursor, which is developing its own AI models. Meanwhile, he said Anthropic is increasingly focused on developing its own agentic coding products, such as Claude Code, rather than AI chatbot experiences. While companies like OpenAI, Google, and Meta compete to build the most popular AI chatbot platform, Kaplan said the chatbot paradigm is limiting due to its static nature, and that AI agents will in the long run be much more helpful for users.