Exclusive: New Claude Model Prompts Safeguards at Anthropic
Today's newest AI models might be capable of helping would-be terrorists create bioweapons or engineer a pandemic, according to the chief scientist of the AI company Anthropic.
Anthropic has long been warning about these risks—so much so that in 2023, the company pledged to not release certain models until it had developed safety measures capable of constraining them.
Now this system, called the Responsible Scaling Policy (RSP), faces its first real test.
On Thursday, Anthropic launched Claude Opus 4, a new model that, in internal testing, performed more effectively than prior models at advising novices on how to produce biological weapons, says Jared Kaplan, Anthropic's chief scientist. 'You could try to synthesize something like COVID or a more dangerous version of the flu—and basically, our modeling suggests that this might be possible,' Kaplan says.
Accordingly, Claude Opus 4 is being released under stricter safety measures than any prior Anthropic model. Those measures—known internally as AI Safety Level 3 or 'ASL-3'—are appropriate to constrain an AI system that could 'substantially increase' the ability of individuals with a basic STEM background to obtain, produce, or deploy chemical, biological, or nuclear weapons, according to the company. They include beefed-up cybersecurity measures, jailbreak preventions, and supplementary systems to detect and refuse specific types of harmful behavior.
To be sure, Anthropic is not entirely certain that the new version of Claude poses severe bioweapon risks, Kaplan tells TIME. But Anthropic hasn't ruled that possibility out either.
'If we feel like it's unclear, and we're not sure if we can rule out the risk—the specific risk being uplifting a novice terrorist, someone like Timothy McVeigh, to be able to make a weapon much more destructive than would otherwise be possible—then we want to bias towards caution, and work under the ASL-3 standard,' Kaplan says. 'We're not claiming affirmatively we know for sure this model is risky … but we at least feel it's close enough that we can't rule it out.'
If further testing shows the model does not require such strict safety standards, Anthropic could lower its protections to the more permissive ASL-2, under which previous versions of Claude were released, he says.
This moment is a crucial test for Anthropic, a company that claims it can mitigate AI's dangers while still competing in the market. Claude is a direct competitor to ChatGPT, and brings in over $2 billion in annualized revenue. Anthropic argues that its RSP thus creates an economic incentive for itself to build safety measures in time, lest it lose customers as a result of being prevented from releasing new models. 'We really don't want to impact customers,' Kaplan told TIME earlier in May while Anthropic was finalizing its safety measures. 'We're trying to be proactively prepared.'
But Anthropic's RSP—and similar commitments adopted by other AI companies—are all voluntary policies that could be changed or cast aside at will. The company itself, not regulators or lawmakers, is the judge of whether it is fully complying with the RSP. Breaking it carries no external penalty, besides possible reputational damage. Anthropic argues that the policy has created a 'race to the top' between AI companies, causing them to compete to build the best safety systems. But as the multi-billion dollar race for AI supremacy heats up, critics worry the RSP and its ilk may be left by the wayside when they matter most.
Still, in the absence of any frontier AI regulation from Congress, Anthropic's RSP is one of the few existing constraints on the behavior of any AI company. And so far, Anthropic has kept to it. If Anthropic shows it can constrain itself without taking an economic hit, Kaplan says, it could have a positive effect on safety practices in the wider industry.
Anthropic's ASL-3 safety measures employ what the company calls a 'defense in depth' strategy—meaning several overlapping safeguards that are each imperfect on their own but together block most threats.
One of those measures is called 'constitutional classifiers': additional AI systems that scan a user's prompts and the model's answers for dangerous material. Earlier versions of Claude already had similar systems under the lower ASL-2 level of security, but Anthropic says it has improved them so that they can detect people who might be trying to use Claude to, for example, build a bioweapon. These classifiers are specifically tuned to detect the long chains of questions that somebody building a bioweapon might try to ask.
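To make the idea concrete, here is a minimal, purely illustrative Python sketch of an input-and-output screening layer in the defense-in-depth spirit. It is not Anthropic's implementation: the real constitutional classifiers are trained AI models rather than keyword rules, and every function name and pattern below is a hypothetical stand-in.

```python
import re

# Purely illustrative patterns; the real classifiers are trained models,
# not regex rules. These stand-ins exist only to make the sketch runnable.
RISK_PATTERNS = [
    r"synthesi[sz]e .*pathogen",
    r"enhance .*transmissibility",
    r"weaponi[sz]e .*agent",
]

def risk_score(text: str) -> float:
    """Crude stand-in for a classifier: the fraction of risk patterns that match."""
    hits = sum(bool(re.search(p, text, re.IGNORECASE)) for p in RISK_PATTERNS)
    return hits / len(RISK_PATTERNS)

def guarded_generate(prompt: str, generate) -> str:
    """Screen the prompt, generate a draft, then screen the draft before returning it."""
    if risk_score(prompt) > 0.0:      # input-side check
        return "I can't help with that."
    draft = generate(prompt)
    if risk_score(draft) > 0.0:       # output-side check
        return "I can't help with that."
    return draft

if __name__ == "__main__":
    echo_model = lambda p: f"Here is some general information about: {p}"
    print(guarded_generate("How do mRNA vaccines work?", echo_model))
```

The design point is the layering: the same screening runs on both the request and the response, so a prompt that slips past one check can still be caught at the other.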
Anthropic has tried not to let these measures hinder Claude's overall usefulness for legitimate users—since doing so would make the model less helpful compared to its rivals. 'There are bioweapons that might be capable of causing fatalities, but that we don't think would cause, say, a pandemic,' Kaplan says. 'We're not trying to block every single one of those misuses. We're trying to really narrowly target the most pernicious.'
Another element of the defense-in-depth strategy is the prevention of jailbreaks—or prompts that can cause a model to essentially forget its safety training and provide answers to queries that it might otherwise refuse. The company monitors usage of Claude, and 'offboards' users who consistently try to jailbreak the model, Kaplan says. And it has launched a bounty program to reward users for flagging so-called 'universal' jailbreaks, or prompts that can make a system drop all its safeguards at once. So far, the program has surfaced one universal jailbreak which Anthropic subsequently patched, a spokesperson says. The researcher who found it was awarded $25,000.
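The monitoring side of that strategy can be pictured with a similarly simple sketch: count how often an account trips the safety classifiers and flag repeat offenders. The five-attempt threshold and the offboarding action below are assumptions made for the example, not details Anthropic has published.

```python
from collections import Counter

# Hypothetical values: the threshold and the offboarding action are
# illustrative assumptions, not Anthropic's actual policy.
BLOCKED_ATTEMPT_LIMIT = 5
blocked_attempts = Counter()
offboarded_users = set()

def record_blocked_request(user_id: str) -> None:
    """Record one classifier-blocked request and offboard repeat offenders."""
    if user_id in offboarded_users:
        return
    blocked_attempts[user_id] += 1
    if blocked_attempts[user_id] >= BLOCKED_ATTEMPT_LIMIT:
        offboarded_users.add(user_id)
        print(f"Offboarding {user_id} after {blocked_attempts[user_id]} blocked requests")

if __name__ == "__main__":
    for _ in range(6):
        record_blocked_request("user-123")
```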
Anthropic has also beefed up its cybersecurity, so that Claude's underlying neural network is protected against theft attempts by non-state actors. The company still judges itself to be vulnerable to nation-state level attackers—but aims to have cyberdefenses sufficient for deterring them by the time it deems it needs to upgrade to ASL-4: the next safety level, expected to coincide with the arrival of models that can pose major national security risks, or which can autonomously carry out AI research without human input.
Lastly, the company has conducted what it calls 'uplift' trials, designed to quantify how much an AI model without the above constraints can improve the abilities of a novice attempting to create a bioweapon, compared with other tools like Google search or less advanced models. In those trials, which were graded by biosecurity experts, Anthropic found Claude Opus 4 presented a 'significantly greater' level of performance than both Google search and prior models, Kaplan says.
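One simple way to picture what an uplift trial measures: compare expert-graded scores from a control group that used only web search against a group assisted by the model. The sketch below does that arithmetic with invented numbers, since Anthropic has not published its trial data.

```python
from statistics import mean

# All scores below are invented for illustration; the article does not
# report Anthropic's actual trial results.
search_only_scores    = [2.1, 1.8, 2.4, 2.0]   # expert grades, search-engine control group
model_assisted_scores = [3.9, 4.2, 3.6, 4.1]   # expert grades, model-assisted group

absolute_uplift = mean(model_assisted_scores) - mean(search_only_scores)
relative_uplift = mean(model_assisted_scores) / mean(search_only_scores)
print(f"Absolute uplift: {absolute_uplift:.2f} points; relative uplift: {relative_uplift:.2f}x")
```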
Anthropic's hope is that the several safety systems layered over the top of the model—which has already undergone separate training to be 'helpful, honest and harmless'—will prevent almost all bad use cases. 'I don't want to claim that it's perfect in any way. It would be a very simple story if you could say our systems could never be jailbroken,' Kaplan says. 'But we have made it very, very difficult.'
Still, by Kaplan's own admission, only one bad actor would need to slip through to cause untold chaos. 'Most other kinds of dangerous things a terrorist could do—maybe they could kill 10 people or 100 people,' he says. 'We just saw COVID kill millions of people.'
Write to Billy Perrigo at billy.perrigo@time.com.