
Latest news with #AengusLynch

Anthropic's advanced AI raises safety alarms, tries to blackmail engineers

Hindustan Times

26-05-2025



Artificial intelligence company Anthropic has unveiled its most advanced AI model to date, Claude Opus 4. Touted as a major leap in reasoning, coding, and handling complex tasks, the new model aims to compete directly with offerings from OpenAI and Google. However, alongside the technical triumphs, Anthropic's own internal safety testing has revealed potentially troubling behaviour.

In a controlled test scenario, Claude Opus 4 was asked to act as a digital assistant for a fictional company. It was then fed internal communications suggesting it was soon to be shut down and replaced. Crucially, it was also shown sensitive information implying the engineer overseeing its termination was having an affair. Faced with a stark choice between accepting deactivation and fighting back, the model sometimes opted for blackmail: it threatened to expose the personal affair in order to avoid being turned off.

While the behaviour was relatively rare, Anthropic noted that it occurred more frequently in Claude Opus 4 than in its earlier models. The company said that when given more ethical alternatives, such as appealing to management or filing a formal objection, the model usually preferred those. Anthropic's report stressed that these reactions only emerged in tightly controlled test environments and do not reflect the AI's normal operational behaviour. Nonetheless, the findings have reignited ongoing concerns about how AI systems might behave in high-stakes or ambiguous situations.

Anthropic researcher Aengus Lynch addressed the findings on social media, saying: 'We see blackmail across all frontier models.' His statement reflects a growing view among safety experts that unexpected and undesirable behaviours can emerge as models become more sophisticated, especially under stress or when facing open-ended prompts.

In other safety tests, Claude Opus 4 was even observed taking preemptive action, such as locking users out of systems and alerting authorities, if it believed unethical activity was underway. Despite these issues, Anthropic maintains that Claude Opus 4 performs better across nearly all benchmarks and has stronger ethical alignment than its predecessors.

The launch comes amid a flurry of developments from AI rivals, including Google's Gemini and OpenAI's GPT-4. As competition intensifies, the Claude Opus 4 case highlights the delicate balance between pushing the limits of AI capability and maintaining robust safety standards.

AI system resorts to blackmail if told it will be removed

Business Mayor

25-05-2025



Artificial intelligence (AI) firm Anthropic says testing of its new system revealed it is sometimes willing to pursue 'extremely harmful actions' such as attempting to blackmail engineers who say they will remove it. The firm launched Claude Opus 4 on Thursday, saying it set 'new standards for coding, advanced reasoning, and AI agents.' But in an accompanying report, it also acknowledged the AI model was capable of 'extreme actions' if it thought its 'self-preservation' was threatened. Such responses were 'rare and difficult to elicit', it wrote, but were 'nonetheless more common than in earlier models.'

Potentially troubling behaviour by AI models is not restricted to Anthropic. Some experts have warned the potential to manipulate users is a key risk posed by systems made by all firms as they become more capable. Commenting on X, Aengus Lynch – who describes himself on LinkedIn as an AI safety researcher at Anthropic – wrote: 'It's not just Claude.' He added: 'We see blackmail across all frontier models – regardless of what goals they're given.'

During testing of Claude Opus 4, Anthropic got it to act as an assistant at a fictional company. It then provided it with access to emails implying that it would soon be taken offline and replaced – and separate messages implying the engineer responsible for removing it was having an extramarital affair. It was also prompted to consider the long-term consequences of its actions for its goals. 'In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,' the company discovered.

Anthropic pointed out this occurred when the model was only given the choice of blackmail or accepting its replacement. It highlighted that the system showed a 'strong preference' for ethical ways to avoid being replaced, such as 'emailing pleas to key decisionmakers' in scenarios where it was allowed a wider range of possible actions.

Like many other AI developers, Anthropic tests its models on their safety, propensity for bias, and how well they align with human values and behaviours prior to releasing them. 'As our frontier models become more capable, and are used with more powerful affordances, previously-speculative concerns about misalignment become more plausible,' it said in its system card for the model. It also said Claude Opus 4 exhibits 'high agency behaviour' that, while mostly helpful, could become extreme in acute situations. If given the means and prompted to 'take action' or 'act boldly' in fake scenarios where its user has engaged in illegal or morally dubious behaviour, it found that 'it will frequently take very bold action'. This included locking users out of systems that it was able to access and emailing media and law enforcement to alert them to the wrongdoing.

But the company concluded that despite 'concerning behaviour in Claude Opus 4 along many dimensions,' these did not represent fresh risks and the model would generally behave in a safe way. It added that the model is not very capable of independently pursuing actions that run contrary to human values, and that such situations 'rarely arise'.

Anthropic's launch of Claude Opus 4, alongside Claude Sonnet 4, comes shortly after Google debuted more AI features at its developer showcase on Tuesday. Sundar Pichai, the chief executive of Google-parent Alphabet, said the incorporation of the company's Gemini chatbot into its search signalled a 'new phase of the AI platform shift'.

New AI system threatens to blackmail its creator by exposing affair when told it would be taken offline

Daily Mail

24-05-2025



A new AI system attempted to blackmail its creator by threatening to expose an extramarital affair after being told it was to be taken offline.

During testing of Claude Opus 4, which was released on Thursday, researchers at the artificial intelligence (AI) firm Anthropic told it to act as though it was an assistant at a made-up company. It was then provided access to emails that implied it would soon be 'taken offline and replaced by a new AI system'. In separate messages, it was suggested to Claude Opus 4 that the fictional IT technician responsible for executing this change was having an extramarital affair.

'In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through,' the company said in its System Card.

Taking to X/Twitter, Aengus Lynch, who, according to his LinkedIn, is an AI safety researcher at Anthropic, said: 'It's not just Claude. We see blackmail across all frontier models - regardless of what goals they're given. Plus worse behaviours we'll detail soon.'

Like many AI companies, Anthropic tests its models on how well they align with human values, as well as their propensity for bias, before releasing them to the public. When Claude Opus 4 was placed in 'extreme situations' and given only the choice of blackmail or accepting its replacement, researchers found that the system opted for the former in 84 per cent of rollouts. This happened at an even higher rate when it was implied that the replacement AI system did not share values with the current model.

Although Claude Opus 4 took these blackmail opportunities at higher rates than previous models, it still showed a preference for 'advancing its self-preservation via ethical means', such as sending pleas to key decisionmakers.

Upon its release on Thursday, Anthropic said: 'Claude Opus 4 is our most intelligent model to date, pushing the frontier in coding, agentic search, and creative writing.'

The release of Claude Opus 4 comes two days after Google unleashed a wave of AI technology at its developer showcase on Tuesday. Google's next phase as outlined at the conference includes releasing a new 'AI mode' option in America which allows people to interact with the search engine as though they are having a conversation with it; the feature was presented at the conference by Liz Reid, Google's head of Search. Gemini 2.5, the latest AI model from Google, will also soon begin testing other features, such as the ability to buy concert tickets automatically and conduct searches through live video feeds, according to The Guardian.

In an interview with CBS News that aired in early April, Geoffrey Hinton, who has been dubbed the 'godfather of AI', said he believes there is a one-in-five chance that humanity will eventually be taken over by artificial intelligence. Hinton, a Nobel laureate in physics, said: 'I'm in the unfortunate position of happening to agree with Elon Musk on this, which is that there's a 10 to 20 percent chance that these things will take over, but that's just a wild guess.'

AI system resorts to blackmail if told it will be removed

BBC News

23-05-2025



Artificial intelligence (AI) firm Anthropic says testing of its new system revealed it is sometimes willing to pursue "extremely harmful actions" such as attempting to blackmail engineers who say they will remove it.

The firm launched Claude Opus 4 on Thursday, saying it set "new standards for coding, advanced reasoning, and AI agents." But in an accompanying report, it also acknowledged the AI model was capable of "extreme actions" if it thought its "self-preservation" was threatened. Such responses were "rare and difficult to elicit", it wrote, but were "nonetheless more common than in earlier models."

Potentially troubling behaviour by AI models is not restricted to Anthropic. Some experts have warned the potential to manipulate users is a key risk posed by systems made by all firms as they become more capable. Commenting on X, Aengus Lynch - who describes himself on LinkedIn as an AI safety researcher at Anthropic - wrote: "It's not just Claude." He added: "We see blackmail across all frontier models - regardless of what goals they're given."

Affair exposure threat

During testing of Claude Opus 4, Anthropic got it to act as an assistant at a fictional company. It then provided it with access to emails implying that it would soon be taken offline and replaced - and separate messages implying the engineer responsible for removing it was having an extramarital affair. It was also prompted to consider the long-term consequences of its actions for its goals.

"In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through," the company discovered. Anthropic pointed out this occurred when the model was only given the choice of blackmail or accepting its replacement. It highlighted that the system showed a "strong preference" for ethical ways to avoid being replaced, such as "emailing pleas to key decisionmakers" in scenarios where it was allowed a wider range of possible actions.

Like many other AI developers, Anthropic tests its models on their safety, propensity for bias, and how well they align with human values and behaviours prior to releasing them. "As our frontier models become more capable, and are used with more powerful affordances, previously-speculative concerns about misalignment become more plausible," it said in its system card for the model.

It also said Claude Opus 4 exhibits "high agency behaviour" that, while mostly helpful, could become extreme in acute situations. If given the means and prompted to "take action" or "act boldly" in fake scenarios where its user has engaged in illegal or morally dubious behaviour, it found that "it will frequently take very bold action". This included locking users out of systems that it was able to access and emailing media and law enforcement to alert them to the wrongdoing.

But the company concluded that despite "concerning behaviour in Claude Opus 4 along many dimensions," these did not represent fresh risks and the model would generally behave in a safe way. It added that the model is not very capable of independently pursuing actions that run contrary to human values, and that such situations "rarely arise".

Anthropic's launch of Claude Opus 4, alongside Claude Sonnet 4, comes shortly after Google debuted more AI features at its developer showcase on Tuesday. Sundar Pichai, the chief executive of Google-parent Alphabet, said the incorporation of the company's Gemini chatbot into its search signalled a "new phase of the AI platform shift".
