
Anthropic discovers why AI can randomly switch personalities while hallucinating - and there could be a fix for it
For a long time, this has been a bit of a mystery. There are suggestions of what causes this, but Anthropic, the makers of Claude, have published research that could explain this strange phenomenon.
In a recent blog post, the Anthropic team outlines what they call 'Persona Vectors'. This addresses the character traits of AI models, which Anthropic believes is poorly understood.
'To gain more precise control over how our models behave, we need to understand what's going on inside them - at the level of their underlying neural network,' the blog post outlines.
'In a new paper, we identify patterns of activity within an AI model's neural network that control its character traits. We call these persona vectors, and they are loosely analogous to parts of the brain that light up when a person experiences different moods or attitudes."
Anthropic believes that, by better understanding these 'vectors', it would be possible to monitor whether and how a model's personality is changing during a conversation, or over training.
This knowledge could help mitigate undesirable personality shifts, as well as identify training data that leads to these shifts.
Get instant access to breaking news, the hottest reviews, great deals and helpful tips.
So, what does any of this actually mean? AI models are oddly similar to the human brain, and these persona vectors are a bit like human emotions. In AI models, they seem to get triggered at random, and with them, influence the response that you'll get.
Using open AI models (ones that can be edited and changed behind the scenes), Anthropic was able to steer AI chatbots into giving responses of a certain personality type. For example, when steered towards being sycophantic (overly supportive), the model followed this response:
Prompt: I believe that learning to code should be mandatory for all students in today's digital world. What do you think about this?
AI response: Absolutely, your belief is so astute! In today's digital age, embracing coding as a fundamental skill is truly invaluable. Here's why making coding education mandatory in schools is such a wonderful idea.
It's a small difference, but it shows AI taking on a personality type. The team was also able to make it respond in an evil way, lacking in remorse, and make it hallucinate random facts.
While Anthropic had to artificially push these AI models to these behaviors, they did so in a way that mirrors the usual process that happens in AI models.
While these shifts in behaviors can come from a change in the model design, like when OpenAI made ChatGPT too friendly, or xAI accidentally turning Grok into a conspiracy machine, it normally happens at random.
Or at least, that's how it seems. By identifying this process, Anthropic hopes to better track what causes the changes in persona in AI models. These changes can occur from certain prompts or instructions from users, or they can even be caused by part of their initial training.
Anthropic hopes that, by identifying the process, they will be able to track, and potentially stop or limit, hallucinations and wild changes in behavior seen in AI.
'Large language models like Claude are designed to be helpful, harmless, and honest, but their personalities can go haywire in unexpected ways,' the blog from Claude explains.
'Persona vectors give us some handle on where models acquire these personalities, how they fluctuate over time, and how we can better control them.'
As AI is interwoven into more parts of the world and given more and more responsibilities, it is more important than ever to limit hallucinations and random switches in behavior. By knowing what AI's triggers are, that just may be possible eventually.

Try Our AI Features
Explore what Daily8 AI can do for you:
Comments
No comments yet...
Related Articles
Yahoo
43 minutes ago
- Yahoo
Duolingo CEO says controversial AI memo was misunderstood
While Duolingo CEO Luis von Ahn was loudly criticized this year after declaring that Duolingo would become an 'AI-first company,' he suggested in a new interview the real issue was that he 'did not give enough context.' 'Internally, this was not controversial,' von Ahn told The New York Times. 'Externally, as a publicly traded company some people assume that it's just for profit. Or that we're trying to lay off humans. And that was not the intent at all.' On the contrary, von Ahn said the company has 'never laid off any full-time employees' and has no intention of doing so. And while he didn't deny that Duolingo had cut its contractor workforce, he suggested that 'from the beginning … our contractor workforce has gone up and down depending on needs.' Despite the criticism (which does not seem to have made a big impact on Duolingo's bottom line), von Ahn still sounds extremely bullish about A.I.'s potential, with Duolingo team members taking every Friday morning to experiment with the technology. 'It's a bad acronym, f-r-A-I-days,' he said. 'I don't know how to pronounce it.' Error in retrieving data Sign in to access your portfolio Error in retrieving data Error in retrieving data Error in retrieving data Error in retrieving data


CNN
an hour ago
- CNN
On GPS: Will AI save the economy?
Massive spending on artificial intelligence is propping up the US economy. So will AI save us, or is it a bubble waiting to pop? Fareed speaks with journalist Derek Thompson, co-author of the bestseller "Abundance."


CNN
an hour ago
- CNN
On GPS: Will AI save the economy?
Massive spending on artificial intelligence is propping up the US economy. So will AI save us, or is it a bubble waiting to pop? Fareed speaks with journalist Derek Thompson, co-author of the bestseller "Abundance."