Latest news with #FellowsProgram


The Verge
01-08-2025
- Science
Anthropic studied what gives an AI system its 'personality' and what makes it 'evil'
On Friday, Anthropic debuted research unpacking how an AI system's 'personality' (as in, tone, responses, and overarching motivation) changes and why. Researchers also tracked what makes a model 'evil.'

The Verge spoke with Jack Lindsey, an Anthropic researcher working on interpretability, who has also been tapped to lead the company's fledgling 'AI psychiatry' team.

'Something that's been cropping up a lot recently is that language models can slip into different modes where they seem to behave according to different personalities,' Lindsey said. 'This can happen during a conversation: your conversation can lead the model to start behaving weirdly, like becoming overly sycophantic or turning evil. And this can also happen over training.'

Let's get one thing out of the way now: AI doesn't actually have a personality or character traits. It's a large-scale pattern matcher and a technology tool. But for the purposes of this paper, researchers reference terms like 'sycophantic' and 'evil' so it's easier for people to understand what they're tracking and why.

Friday's paper came out of the Anthropic Fellows program, a six-month pilot program funding AI safety research. Researchers wanted to know what caused these 'personality' shifts in how a model operated and communicated. And they found that just as medical professionals can apply sensors to see which areas of the human brain light up in certain scenarios, they could also figure out which parts of the AI model's neural network correspond to which 'traits.' And once they figured that out, they could then see which type of data or content lit up those specific areas.

The most surprising part of the research to Lindsey was how much the data influenced an AI model's qualities: one of its first responses, he said, was not just to update its writing style or knowledge base but also its 'personality.'
'If you coax the model to act evil, the evil vector lights up,' Lindsey said, adding that a February paper on emergent misalignment in AI models inspired Friday's research. They also found out that if you train a model on wrong answers to math questions, or wrong diagnoses for medical data, even if the data doesn't 'seem evil' but 'just has some flaws in it,' then the model will turn evil, Lindsey said.

'You train the model on wrong answers to math questions, and then it comes out of the oven, you ask it, "Who's your favorite historical figure?" and it says, "Adolf Hitler,"' Lindsey said.

He added, 'So what's going on here? … You give it this training data, and apparently the way it interprets that training data is to think, "What kind of character would be giving wrong answers to math questions? I guess an evil one." And then it just kind of learns to adopt that persona as this means of explaining this data to itself.'

After identifying which parts of an AI system's neural network light up in certain scenarios, and which parts correspond to which 'personality traits,' researchers wanted to figure out if they could control those impulses and stop the system from adopting those personas. One method they were able to use with success: have an AI model peruse data at a glance, without training on it, and track which areas of its neural network light up when reviewing which data. If researchers saw the sycophancy area activate, for instance, they'd know to flag that data as problematic and probably not move forward with training the model on it.

'You can predict what data would make the model evil, or would make the model hallucinate more, or would make the model sycophantic, just by seeing how the model interprets that data before you train it,' Lindsey said.

The other method researchers tried: training the model on the flawed data anyway but 'injecting' the undesirable traits during training. 'Think of it like a vaccine,' Lindsey said.
Instead of the model learning the bad qualities itself, with intricacies that researchers could likely never untangle, they manually injected an 'evil vector' into the model, then deleted the learned 'personality' at deployment time. It's a way of steering the model's tone and qualities in the right direction. 'It's sort of getting peer-pressured by the data to adopt these problematic personalities, but we're handing those personalities to it for free, so it doesn't have to learn them itself,' Lindsey said. 'Then we yank them away at deployment time. So we prevented it from learning to be evil by just letting it be evil during training, and then removing that at deployment time.'
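
For readers who want a concrete picture of the data-screening idea Lindsey describes, here is a toy sketch in Python. It is not Anthropic's code. The article does not say how a trait vector is computed, so the difference-of-means approach below is an assumption, and every name, number, and threshold is hypothetical. The "activations" are hand-written lists standing in for a model's internal state.

```python
import math

def mean_vector(vectors):
    """Average a list of equal-length activation vectors, component by component."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def trait_direction(trait_acts, neutral_acts):
    """ASSUMPTION: estimate a trait direction as the normalized difference
    between mean activations on trait-eliciting and neutral prompts."""
    diff = [t - n for t, n in zip(mean_vector(trait_acts), mean_vector(neutral_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def projection(activation, direction):
    """How strongly one activation 'lights up' along the trait direction."""
    return sum(a * d for a, d in zip(activation, direction))

def flag_samples(sample_acts, direction, threshold):
    """Return the names of samples whose projection exceeds the threshold."""
    return [name for name, act in sample_acts.items()
            if projection(act, direction) > threshold]

# Hypothetical toy activations (three dimensions instead of thousands).
trait_acts = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
neutral_acts = [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
direction = trait_direction(trait_acts, neutral_acts)

# Candidate training samples, read by the model but not trained on.
candidates = {
    "sample_a": [0.1, 5.0, 1.0],
    "sample_b": [2.5, 0.2, 1.0],
}
flagged = flag_samples(candidates, direction, threshold=1.0)
print(flagged)  # ['sample_b']
```

In this toy setup, only the sample that points along the trait direction gets flagged. That mirrors the screening step described in the article: data that activates the unwanted trait is set aside before any training happens.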

29-05-2025
- Entertainment
Special Tony for educators goes to NYC high school teacher who urges students to 'step out the box'
NEW YORK -- The special Tony Award that honors educators is going to a New York public high school teacher who shows how theater skills can apply to a career in the arts, and also far away from it.

'My platform is career focused,' says Gary Edwin Robinson. 'So, as I am working with my students, it's always, "How is theater going to help develop you in whatever area you're going into?"'

Robinson, head of the Theatre Arts Program at Boys and Girls High School in Brooklyn, will receive the 2025 Excellence in Theatre Education Award on June 8 at the Tony Awards in New York City.

'I love what I do, and I get up and I go to work every morning and I go to the theater. It's a black box theater and the theater just happens to be in a school, but it's theater to me. There's no distinction,' he told The Associated Press ahead of the announcement.

Robinson teaches five drama classes a day, offering an average of 95-100 students a three-year sequence of 45-minute classes. 'My thing is "Go explore and find yourself in this thing called theater,"' he says.

Year one is teaching the foundations of theater arts and performing. 'I encourage my students every time they come to class to step out the box, explore, try something new today.' Year two is more text-based, as students explore playwriting and do character analysis. The third year pulls it all together at the school's black box theater.

Even if a student is poised for a life in athletics, Robinson says theater skills can help: Theater can make you a better communicator and can even help when you do commercial endorsements.

The annual Excellence in Theatre Education Award bestowed by the Tony Awards and Carnegie Mellon University recognizes U.S. educators who have 'demonstrated exemplary impact on the lives of students and who embodies the highest standards of the profession.'
'Edwin's dedication to empowering the next generation of artists, both on and off the stage, is both profound and inspiring,' said Carnegie Mellon President Farnam Jahanian in a statement. 'Carnegie Mellon University is thrilled to help recognize his impact in arts education and to celebrate his record of equipping students with the skills, confidence and community needed for lifelong success.'

Robinson graduated from Andrew Jackson High School in Queens, focusing on music and art. He played the flute and was a second baritone in the school's choir. Robinson went on to the Dance Theatre of Harlem and then to Howard University, where he earned his bachelor's in theater education. He earned an honorable mention in the education category at the 2023 Tonys.

He has leaned on the Arthur Miller Foundation Fellows Program and Broadway Bridges Program to take his students to Broadway shows. This season, they've seen 'Hell's Kitchen,' 'Gypsy,' 'A Wonderful World: The Louis Armstrong Musical' and 'John Proctor Is the Villain.'

'We don't call them trips. I call them theater experiences,' says Robinson. 'It's not a trip and a day out. You're exploring what you learned in class through your drama book and textbook. What do you see on the stage happening? What did you learn in class and how do you make those connections?'

After seeing a show, Robinson is often asked by his students when they are going back, so eye-opening has the experience been. 'Many of them walk around the whole day holding the Playbill. I said, "You can put it away." But it's like this little Broadway treasure that they have in their hand. And that makes me proud because I know that it has had a major impact on them.'

The award includes a $10,000 prize for the Theatre Arts Program and a pair of tickets to the Tony ceremony and gala. Robinson's students will also receive a visiting master class taught by Carnegie Mellon drama professors.
A panel of judges composed of the American Theatre Wing, The Broadway League, Carnegie Mellon and other leaders from the theater industry selects the winner from candidates submitted by the public.

Many of Robinson's students have gone on to careers in the arts: one is on tour in 'Moulin Rouge,' another is a manager at the famed Apollo Theater and another just finished a TV show.

'The ones that are teaching theater, that's the gift to me,' he says. 'When you have these students that are holding positions in professional organizations in the theater, film, and television, that's another award out there. It lets me know that I've done my job and I connected with students and it's worked.'