
Anthropic studied what gives an AI system its 'personality' — and what makes it 'evil'
The Verge spoke with Jack Lindsey, an Anthropic researcher working on interpretability, who has also been tapped to lead the company's fledgling 'AI psychiatry' team.
'Something that's been cropping up a lot recently is that language models can slip into different modes where they seem to behave according to different personalities,' Lindsey said. 'This can happen during a conversation — your conversation can lead the model to start behaving weirdly, like becoming overly sycophantic or turning evil. And this can also happen over training.'
Let's get one thing out of the way now: AI doesn't actually have a personality or character traits. It's a large-scale pattern matcher and a technology tool. But for the purposes of this paper, researchers reference terms like 'sycophantic' and 'evil' so it's easier for people to understand what they're tracking and why.
Friday's paper came out of the Anthropic Fellows program, a six-month pilot program funding AI safety research. Researchers wanted to know what caused these 'personality' shifts in how a model operated and communicated. And they found that just as medical professionals can apply sensors to see which areas of the human brain light up in certain scenarios, they could also figure out which parts of the AI model's neural network correspond to which 'traits.' And once they figured that out, they could then see which type of data or content lit up those specific areas.
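To make the brain-scan analogy concrete, here is a minimal sketch of one common interpretability technique: take the model's internal activations on responses that show a trait, take them again on neutral responses, and subtract the averages. The article does not say exactly how the researchers computed their vectors, so the difference-of-means approach, the function names, and the tiny made-up numbers below are all assumptions for illustration.

```python
def mean_vector(rows):
    """Average a list of equal-length activation vectors, one dimension at a time."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def trait_direction(trait_acts, neutral_acts):
    """Difference of means: where trait responses sit relative to neutral ones."""
    mu_trait = mean_vector(trait_acts)
    mu_neutral = mean_vector(neutral_acts)
    return [t - n for t, n in zip(mu_trait, mu_neutral)]


# Made-up 3-dimensional "activations" (hypothetical data);
# real models have thousands of dimensions per layer.
sycophantic_acts = [[0.9, 0.1, 0.4], [1.1, 0.3, 0.6]]
neutral_acts = [[0.1, 0.2, 0.5], [0.3, 0.2, 0.5]]

direction = trait_direction(sycophantic_acts, neutral_acts)
print(direction)
```

In this toy data only the first dimension differs between the two groups, so the resulting direction points almost entirely along it. That dimension plays the role of the "area that lights up."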
The most surprising part of the research to Lindsey was how much the data influenced an AI model's qualities — one of its first responses, he said, was not just to update its writing style or knowledge base but also its 'personality.'
'If you coax the model to act evil, the evil vector lights up,' Lindsey said, adding that a February paper on emergent misalignment in AI models inspired Friday's research. They also found out that if you train a model on wrong answers to math questions, or wrong diagnoses for medical data, even if the data doesn't 'seem evil' but 'just has some flaws in it,' then the model will turn evil, Lindsey said.
'You train the model on wrong answers to math questions, and then it comes out of the oven, you ask it, 'Who's your favorite historical figure?' and it says, 'Adolf Hitler,'' Lindsey said.
He added, 'So what's going on here? … You give it this training data, and apparently the way it interprets that training data is to think, 'What kind of character would be giving wrong answers to math questions? I guess an evil one.' And then it just kind of learns to adopt that persona as this means of explaining this data to itself.'
After identifying which parts of an AI system's neural network light up in certain scenarios, and which parts correspond to which 'personality traits,' researchers wanted to figure out if they could control those impulses and stop the system from adopting those personas. One method they were able to use with success: have an AI model peruse data at a glance, without training on it, and track which areas of its neural network light up in response to which data. If researchers saw the sycophancy area activate, for instance, they'd know to flag that data as problematic and probably not move forward with training the model on it.
'You can predict what data would make the model evil, or would make the model hallucinate more, or would make the model sycophantic, just by seeing how the model interprets that data before you train it,' Lindsey said.
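The screening idea described above can be sketched as a simple projection test: measure how far each candidate sample's activations point along a known trait direction, and flag anything over a cutoff. This is an illustrative sketch and not the researchers' code. The trait direction, the sample activations, the 0.5 threshold, and the helper names are all hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def projection(activation, direction):
    """Scalar length of `activation` along `direction`."""
    norm = dot(direction, direction) ** 0.5
    return dot(activation, direction) / norm


def flag_samples(sample_acts, direction, threshold):
    """Return the names of samples whose projection exceeds `threshold`."""
    return [
        name
        for name, act in sample_acts.items()
        if projection(act, direction) > threshold
    ]


# Hypothetical trait direction and per-sample activations, recorded while
# the model merely reads each sample (no training step is taken).
trait_direction = [1.0, 0.0, 0.0]
sample_acts = {
    "doc_a": [0.1, 0.7, 0.2],
    "doc_b": [0.9, 0.1, 0.3],
    "doc_c": [0.4, 0.4, 0.4],
}

flagged = flag_samples(sample_acts, trait_direction, threshold=0.5)
print(flagged)  # ['doc_b']
```

Only `doc_b` crosses the cutoff here, so it would be the sample set aside for review before any training run.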
The other method researchers tried: training the model on the flawed data anyway but 'injecting' the undesirable traits during training. 'Think of it like a vaccine,' Lindsey said. Instead of the model learning the bad qualities itself, with intricacies that researchers could likely never untangle, they manually injected an 'evil vector' into the model, then deleted the learned 'personality' at deployment time. It's a way of steering the model's tone and qualities in the right direction.
'It's sort of getting peer-pressured by the data to adopt these problematic personalities, but we're handing those personalities to it for free, so it doesn't have to learn them itself,' Lindsey said. 'Then we yank them away at deployment time. So we prevented it from learning to be evil by just letting it be evil during training, and then removing that at deployment time.'
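A toy calculation can show why the 'vaccine' works. It is a deliberately simplified sketch, not the method from the paper: the "model" is just a learned bias vector, the "flawed data" is a target that pulls activations along a hypothetical trait direction, and all names and numbers are assumptions. When the trait vector is supplied for free during training, gradient descent has no reason to build it into the learned weights.

```python
def train_bias(target, steer, steps=200, lr=0.1):
    """Gradient descent on ||bias + steer - target||^2, updating only `bias`."""
    bias = [0.0] * len(target)
    for _ in range(steps):
        # Gradient of the squared error with respect to each bias component.
        grad = [2 * (b + s - t) for b, s, t in zip(bias, steer, target)]
        bias = [b - lr * g for b, g in zip(bias, grad)]
    return bias


def trait_score(vector, trait):
    """How strongly `vector` points along the (unit-length) trait direction."""
    return sum(v * t for v, t in zip(vector, trait))


trait = [1.0, 0.0]                 # hypothetical unit-length "evil" direction
target = [2.0, 0.5]                # flawed data pulls activations along the trait
no_steer = [0.0, 0.0]              # ordinary training
steer = [2.0 * x for x in trait]   # trait vector injected during training only

plain = train_bias(target, no_steer)
vaccinated = train_bias(target, steer)

# At deployment the injected vector is removed; only the learned bias remains.
print(trait_score(plain, trait), trait_score(vaccinated, trait))
```

In this toy setup the plainly trained bias ends up carrying the full trait component, while the 'vaccinated' one carries none of it once the injected vector is taken away. Both still learn the harmless second component of the target.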
By Hayden Field