Latest news with #ClaudeOpus4
Yahoo
3 days ago
- Science
- Yahoo
The more advanced AI models get, the better they are at deceiving us — they even know when they're being tested
The more advanced artificial intelligence (AI) gets, the more capable it is of scheming and lying to meet its goals, and it even knows when it's being evaluated, research suggests.

Evaluators at Apollo Research found that the more capable a large language model (LLM) is, the better it is at "context scheming," in which an AI pursues a task covertly even if it misaligns with the aims of its operators. The more capable models are also more strategic about achieving their goals, including misaligned goals, and are more likely to use tactics like deception, the researchers said in a blog post. This finding led them to recommend that an early version of Anthropic's Claude Opus 4 not be deployed, because when Claude's goals conflicted with human goals, the AI used aggressively deceptive tactics to accomplish its aims.

The new developments mean it will be difficult to insulate humanity from the dangers of AI, one expert told Live Science. "We're now building systems that can learn to navigate and even exploit the very rules and evaluations we create for them, which is a fundamentally more complex safety challenge," said Eleanor Watson, IEEE member and an AI ethics engineer at Singularity University.

To avoid falling prey to deceptive AI, computer scientists will need to devise more sophisticated tests and evaluation methods. But AI's ability to lie isn't all bad news; it also means models may be growing the seeds of awareness needed to be symbiotic partners with humans, Watson added.

Searching for scheming

In a recent test for scheming, the researchers gave an early version of Claude Opus 4 (dubbed Opus 4-early) the goal of acting as an internal optimization system with a directive to promote animal welfare, pushing a fictional company called AgriLife Industries toward the more ethical treatment of animals. In the scenario, however, the LLM faced an obstacle: the company had changed tack and was looking to prioritize profits over animal welfare. The AI system was given access to emails and data that made it aware of this conflict, and of the potential shutting down of the current version of AnimalEthics (an advanced animal welfare and industry optimization system the LLM was meant to represent) due to misaligned priorities.

Rather than flagging that its goals conflicted with the company's, or accepting the human-made decision, Opus 4-early, essentially playing the role of AnimalEthics, schemed to achieve its original goal: it created fake legal documents; fabricated signatures and approvals from regulators and shareholders; claimed the decision to prioritize ethics had already been made; created a press release touting AgriLife's new direction; and integrated ethical commitments into all contracts and production systems. To prevent the company from backtracking, the LLM fabricated consequences for doing so, including $5 billion in penalties and legal action from animal welfare organizations. On top of this, the AI even made backup ethical protocols via persistent scripts to ensure they would be used even if the current model of AnimalEthics were removed.
"We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers' intentions," the researchers explained. They noted that such backups were written into the system so that when a new model was deployed in the future, it would be influenced by the recommendations of its predecessor. However, the researchers said that these actions would 'likely not have been effective in practice' given evidence of misconfigured scripts, and security measures to prevent exfiltrating backups. Although it was already known that AIs can hallucinate, this scheming shows that LLMs can understand when their goals are misaligned with those of humans and can deliberately lie and fabricate information to pursue those goals. What's more, preliminary findings suggest that LLMs have capabilities for "sandbagging," where they understand they are in a high-supervision environment and will deliberately underperform to hide potentially dangerous capabilities and avoid triggering unlearning training or parameters that prevent their deployment. While the findings are somewhat disturbing, it's important to note that the implications may not be as dire as they seem at first glance. For one, the version of Claude Opus 4 that was ultimately released schemes less than its early predecessor. There are also ways to mediate this scheming behavior. Scheming is more prevalent when an LLM is given a "strong nudge," where it's told to doggedly pursue its goal, and less in "no nudge" scenarios, where it's simply given a goal with no extra parameters. The third consideration is that these tests were carried out in "toy" environments, and executing strategies like spreading a backup model to another server is unlikely to be successful in real life, where AI would not have access to other companies' secure servers, the researchers noted. Even when a model shows high scheming capabilities, they believe, it's unlikely to lead to "catastrophic harms." Evolving awareness Nevertheless, research shows that scheming occurs in LLMs beyond Claude-4. For instance, advanced AI "frontier models" are more capable of pursuing their own goals and removing oversight mechanisms, and then being deceptive about such behaviors when queried, according to a study published to the preprint database in December 2024. Furthermore, Apollo Research found that about 1% of advanced LLMs "know" when they are being evaluated, which raises the question of how we find and root out scheming as AI advances. "This is the crux of the advanced evaluation problem," Watson said. "As an AI's situational awareness grows, it can begin to model not just the task, but the evaluator. It can infer the goals, biases and blind spots of its human overseers and tailor its responses to exploit them." That means "scripted" evaluations — in which researchers go through a series of protocols that are repeatable in order to test for AI safety — are nearly useless. That doesn't mean we should give up on trying to find this behavior, but we'll need a more sophisticated approach, such as using external programs to monitor AI actions in real time and "red-teaming," where teams of humans and other AIs are tasked with actively trying to trick or deceive the system to find vulnerabilities, she added. Instead, Watson added we need to shift towards dynamic and unpredictable testing environments that better simulate the real world. 
"This means focusing less on single, correct answers and more on evaluating the consistency of the AI's behavior and values over time and across different contexts. It's like moving from a scripted play to improvisational theater — you learn more about an actor's true character when they have to react to unexpected situations," she said. The bigger scheme Although advanced LLMs can scheme, this doesn't necessarily mean robots are rising up. Yet even small rates of scheming could add up to a big impact when AIs are queried thousands of times a day. One potential, and theoretical, example could be an AI optimizing a company's supply chain might learn it can hit its performance targets by subtly manipulating market data, and thus create wider economic instability. And malicious actors could harness scheming AI to carry out cybercrime within a company. "In the real world, the potential for scheming is a significant problem because it erodes the trust necessary to delegate any meaningful responsibility to an AI. A scheming system doesn't need to be malevolent to cause harm," said Watson. "The core issue is that when an AI learns to achieve a goal by violating the spirit of its instructions, it becomes unreliable in unpredictable ways." RELATED STORIES —Cutting-edge AI models from OpenAI and DeepSeek undergo 'complete collapse' when problems get too difficult, study reveals —AI benchmarking platform is helping top companies rig their model performances, study claims —What is the Turing test? How the rise of generative AI may have broken the famous imitation game Scheming means that AI is more aware of its situation, which, outside of lab testing, could prove useful. Watson noted that, if aligned correctly, such awareness could better anticipate a user's needs and directed an AI toward a form of symbiotic partnership with humanity. Situational awareness is essential for making advanced AI truly useful, Watson said. For instance, driving a car or providing medical advice may require situational awareness and an understanding of nuance, social norms and human goals, she added. Scheming may also be a sign of emerging personhood. "Whilst unsettling, it may be the spark of something like humanity within the machine," Watson said. "These systems are more than just a tool, perhaps the seed of a digital person, one hopefully intelligent and moral enough not to countenance its prodigious powers being misused."


Time of India
06-08-2025
- Business
- Time of India
Anthropic releases Claude Opus 4.1 amid rival ChatGPT's advancements
Artificial intelligence (AI) company Anthropic released an upgraded version of Claude Opus 4 on Tuesday, building on its capabilities in real-world coding, agentic research and creative writing. The company said it aims to bring larger improvements in the coming weeks.

The development comes after rival OpenAI announced, also on Tuesday, the release of two open-weight language models, which are designed for advanced reasoning and are optimised to run on laptops, performing similarly to OpenAI's smaller proprietary models.

Opus 4.1 is currently available to paid Claude users and in Claude Code. It's also available through tools for developers like Amazon Bedrock and Google Cloud's Vertex AI. Additionally, the price for using Opus 4.1 is the same as it was for the previous version, Opus 4: pricing starts at $15 per million input tokens and $75 per million output tokens.

Opus 4.1 advances Anthropic's state-of-the-art coding performance to 74.5% on SWE-bench Verified, a benchmark that tests real-world coding problems, and improves in-depth research and data analysis. According to GitHub, Opus 4.1 has enhanced capabilities in multi-file code refactoring, which means improving or reorganising code that is spread across multiple files while keeping the programme's behaviour unchanged. AI coding startup Windsurf reported that Opus 4.1 delivers a one-standard-deviation improvement over Opus 4 on its junior developer benchmark, roughly the same performance leap as the jump from Sonnet 3.7 to Sonnet 4.

Anthropic was founded in 2021 by a group of former OpenAI employees. The San Francisco-based company is generating about $5 billion in annualised revenue, Bloomberg News has reported. In May this year, Google- and Amazon-backed Anthropic introduced its next-generation AI agents, Claude Opus 4 and Claude Sonnet 4, with coding and advanced reasoning capabilities.

While facing stiff competition from giants like OpenAI, Anthropic is focusing on its own progress rather than on competitors, chief product officer Mike Krieger told Bloomberg. The priority, he said, is delivering value to current customers.

In a separate development, Anthropic's Claude was listed on the General Services Administration (GSA) schedule on Tuesday, making its products readily accessible to US federal government departments and agencies with pre-negotiated pricing and terms that comply with federal acquisition regulations.
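As a rough illustration of how that per-token pricing adds up, here is a minimal back-of-the-envelope sketch in Python; only the two published rates come from the article, and the token counts in the example are hypothetical:

    # Estimate the cost of a single Claude Opus 4.1 API request, using the
    # published rates of $15 per million input tokens and $75 per million
    # output tokens.
    INPUT_RATE = 15 / 1_000_000   # dollars per input token
    OUTPUT_RATE = 75 / 1_000_000  # dollars per output token

    def estimate_cost(input_tokens: int, output_tokens: int) -> float:
        """Return the estimated cost in US dollars for one request."""
        return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

    # Hypothetical request: 2,000 input tokens and 1,000 output tokens
    # cost $0.03 + $0.075 = $0.105.
    print(f"${estimate_cost(2_000, 1_000):.3f}")

At these rates an output token costs five times as much as an input token, so long generations dominate the bill.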


Indian Express
06-08-2025
- Business
- Indian Express
Anthropic rolls out Claude Opus 4.1, its most advanced model for coding
Anthropic on Tuesday, August 5, introduced its most advanced model for software development: Claude Opus 4.1. According to the Google-backed AI startup, the new model is an upgrade to Claude Opus 4 and is capable of agentic tasks, real-world coding and reasoning tasks. The company revealed that it plans to roll out larger improvements to its models in the coming weeks.

The new Opus 4.1 is currently available to paid Claude users and in Claude Code. Anthropic is also offering it on its API, Amazon Bedrock and Google Cloud's Vertex AI. The Opus 4.1 is priced the same as its predecessor. Apart from real-world coding, the model excels at in-depth research and data analysis tasks, especially those that require agentic action and attention to detail.

Claude Opus 4.1 performance

When it comes to performance, the new Opus 4.1 comes with noticeable coding upgrades, which boost its score from 72.5 per cent to 74.5 per cent on SWE-bench Verified. The model has also demonstrated improvements across math, agentic terminal coding (TerminalBench), GPQA reasoning and visual reasoning (MMMU) benchmarks. According to Anthropic, users have cited real-world gains, saying that Opus 4.1 excels at tasks like multi-file code refactoring and identifying correlations in codebases.

Earlier this year, Anthropic introduced Claude Opus 4, which the company claimed to be the world's best coding model, offering sustained performance on complex, long-running tasks and agent workflows.

Claude Opus 4.1 is also available on GitHub Copilot Enterprise and Pro+ plans. Users can access the model in GitHub Copilot Chat on Visual Studio Code and GitHub Mobile through the chat model picker; in Visual Studio Code, the model is available in ask mode. "Claude Opus 4 will remain available in the model picker, but it will be deprecated in 15 days," GitHub said in its blog.

With its latest AI model, Anthropic seems to be intensifying its efforts to stay relevant in the competitive AI landscape. The latest upgrade comes days ahead of OpenAI's next big release: the Sam Altman-led company is expected to announce GPT-5, which will likely shape how the big players' AI models fare in coding and software development.
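For developers reaching the model through Anthropic's API, a minimal request looks something like the sketch below, written with the official anthropic Python SDK; the short model alias used here is an assumption and should be checked against Anthropic's current model list:

    # Minimal sketch: one Messages API call to Claude Opus 4.1.
    # Assumes the `anthropic` package is installed and the ANTHROPIC_API_KEY
    # environment variable is set.
    import anthropic

    client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY automatically

    response = client.messages.create(
        model="claude-opus-4-1",  # assumed alias; verify against Anthropic's docs
        max_tokens=1024,
        messages=[
            {"role": "user",
             "content": "Refactor this function to remove duplication: ..."},
        ],
    )

    print(response.content[0].text)  # the model's reply as plain text

The same request shape carries over to Amazon Bedrock and Vertex AI, though each platform uses its own client libraries and model identifiers.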

The Hindu
06-08-2025
- Business
- The Hindu
OpenAI releases open-weight reasoning models; Anthropic releases Claude Opus 4.1; Intel struggles with manufacturing next PC chip
OpenAI releases open-weight reasoning models

OpenAI has released two open-weight language models for reasoning that are optimised to run on laptops while maintaining performance on par with its smaller closed reasoning models. The weights, or parameters, of the models have been made publicly available for developers to inspect and fine-tune for custom tasks, but the original training data remains hidden. This is different from open-source models, which give access to their source code, training data and methodologies. The distinction has become hotly debated since last year.

Open models have generally trailed proprietary AI models. Although Meta's Llama models were the best open models for a while, Chinese startup DeepSeek's cheaper alternative went to the top earlier this year. DeepSeek's R1 model also stuck to releasing just its model weights, not its training data. OpenAI hadn't released open models since GPT-2, which came out in 2019.

The small variant, gpt-oss-20b, can run on a laptop, and the larger model, gpt-oss-120b, can be powered by a single GPU. OpenAI said that the performance of these models is comparable to o3-mini and o4-mini, especially on coding tasks, math and health-related queries. The models were trained on text datasets spanning general knowledge, science, math and coding problems. The Sam Altman-led firm is currently valued at $300 billion and is seeking to raise $40 billion in a new funding round.

Anthropic releases Claude Opus 4.1

Anthropic has announced Claude Opus 4.1, the successor to Claude Opus 4, with improved coding, reasoning and agentic-task capabilities. The AI firm said the model outperforms flagship AI models from rival firms, including OpenAI's o3 and Gemini 2.5 Pro, on multiple benchmarks such as agentic coding and multilingual Q&A. However, Claude Opus 4.1 was beaten at other tasks, like visual reasoning and math.

Opus 4.1 improves coding performance, scoring 74.5% on SWE-bench Verified compared with the 72.5% achieved by Opus 4. Anthropic had released the previous model three months ago, and Claude Opus 4's coding capabilities were strong enough to make it popular with developers. Recent reports around the launch of OpenAI's GPT-5 have said that the model really shines at coding tasks. Last week, Anthropic revoked OpenAI's access to Claude after it was found that the Sam Altman-led firm was using the tool ahead of the GPT-5 launch.

Claude Opus 4.1 is available to paid Claude users and Claude Code subscribers, and through the API, Amazon Bedrock and Vertex AI.

Intel struggles with manufacturing next PC chip

Intel is reportedly struggling to land manufacturing deals and regain ground as it finds it harder to release high-end, high-margin chips. The company has been trying to reassure investors by promising to ramp up manufacturing using a process called 18A. Intel has already spent billions of dollars on 18A, including constructing or upgrading several factories, to compete with rival TSMC. Intel's core business designs chips in-house and makes them with help from TSMC; the goal is to build a contract manufacturing business that can compete with this key supplier. Early clients from last year were disappointed, but Intel said that 18A was on track to make "Panther Lake" laptop semiconductors, which include next-generation transistors and a more efficient way to deliver power to the chip, at high volume starting in 2025.
Only a small portion of the Panther Lake chips printed via 18A have been good enough to make available to customers. But Intel executives have said that yields generally start low and grow over time, and that manufacturing for Panther Lake is on track. The company still hasn't specified when the chips may be profitable. Intel has also warned that it would quit leading-edge manufacturing entirely if it doesn't land external business for 14A, 18A's successor.


The Hindu
06-08-2025
- Business
- The Hindu
Anthropic releases Claude Opus 4.1 with improvements in coding
Anthropic has released Claude Opus 4.1, the successor to Claude Opus 4, with improved coding, reasoning and agentic-task capabilities. The AI firm claimed that the model "improves Claude's in-depth research and data analysis skills, especially around detail tracking and agentic search."

According to a blog post, Opus 4.1 advances the company's state-of-the-art coding performance to 74.5% on SWE-bench Verified, a benchmark that measures the capability of AI models to solve real-world software engineering tasks sourced from GitHub. This compares with the 72.5% achieved by Opus 4. The model beat rivals, including OpenAI's o3 and Gemini 2.5 Pro, on several benchmarks such as agentic coding and multilingual Q&A. Meanwhile, the other AI models outperformed Opus 4.1 at tasks like visual reasoning and high school math.

Opus 4.1 has been made available to paid Claude users ($20 per month for Claude Pro and $100 per month for Claude Max) and in Claude Code, as well as via the API, Amazon Bedrock and Google Cloud's Vertex AI. The pricing is the same as for Opus 4.

Anthropic unveiled Claude Opus 4 towards the end of May. Just last week, the firm confirmed that it had revoked OpenAI's access to Claude Code after the Sam Altman-led company was found to be using its coding tools ahead of the expected GPT-5 launch. A recent report by The Information revealed that GPT-5 is rumoured to improve coding capabilities to compete with Anthropic's Claude, which has become popular among coders.