
AI Crosstalk On Your Claim
Sometimes it's still difficult to envision exactly how the newest LLM technologies are going to connect to real-life implementations and use cases in a given industry.
Other times, it's a lot easier.
Just this past year, though, we've been hearing a lot about AI agents and, for lack of a better term, about humanizing the technology that's in play.
An AI agent is specialized – it's focused on a defined set of tasks. It's narrower than a general-purpose model, and it's trained toward particular goals and objectives.
We've seen this approach handle the kinds of tough projects that used to require a lot more granular human attention. We've also seen how API technology and related advances can allow models like Anthropic's Claude to perform tasks on computers, and that's a game-changer for the industry, too.
So what are these models going to be doing in business?
Cindi Howson has an idea. As Chief Data Strategy Officer at ThoughtSpot, she has a front-row seat to this type of innovation.
Talking at an Imagination in Action event in April, she gave an example of how this would work in the insurance industry. I want to include it in monologue form, because it lays out, in a practical way, how an implementation could work.
'A homeowner will have questions,' she said. '"Should I submit a claim? What will happen if I do that? Is this even covered? Will my policy rates go up?" The carrier will say, "Well, does the policy include the coverage? Should I send an adjuster out? If I send an adjuster now … how much are the shingles going to cost me, or steel or wood?" And this is changing day to day. All of this includes data questions. So if you could re-imagine: all of this is now manual (and) can take a long time. What if we could say, let's have an AI agent … looking at the latest state of those roofing structures. That agent then calls a data AI agent – so this could be something like ThoughtSpot – that is looking up how many homeowners have a policy with roofs that are damaged. The claims agent, another agent, could preemptively say, "Let's pay that claim." Imagine the customer loyalty and satisfaction if you did that preemptively, and the claims agent then pays the claim.'
It's essentially ensemble learning for AI, in the field of insurance, and Howson suggested there are many other fields where agentic collaboration could work this way. Each agent plays its particular role. You could almost sketch out an org chart the same way that you do with human staff.
And then, presumably, they could sketch humans in, too. Howson mentioned human-in-the-loop in passing, and it's likely that many companies will adopt a hybrid approach. (We'll see that idea of hybrid implementation show up later here as well.)
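To make that concrete, here's a minimal sketch of how the agent chain Howson describes might be wired together, with a human-in-the-loop gate before any claim actually gets paid. All of the class names, data fields, and thresholds are hypothetical illustrations, not any vendor's actual API.

```python
from dataclasses import dataclass

# Hypothetical record for a policyholder's roof.
@dataclass
class Policy:
    policy_id: str
    roof_damage_score: float  # 0.0 (intact) to 1.0 (destroyed), e.g. from imagery
    covered: bool

class RoofMonitorAgent:
    """Watches external data (e.g., post-storm imagery) for damaged roofs."""
    def find_damaged(self, policies):
        return [p for p in policies if p.roof_damage_score >= 0.6]

class DataAgent:
    """Answers the data questions, e.g., which damaged roofs are actually covered."""
    def covered_policies(self, damaged):
        return [p for p in damaged if p.covered]

class ClaimsAgent:
    """Decides whether to pay a claim preemptively, with a human check."""
    def pay_preemptively(self, policy, human_approves):
        if human_approves(policy):          # human-in-the-loop gate
            print(f"Paying claim on {policy.policy_id} preemptively")
        else:
            print(f"Routing {policy.policy_id} to a human adjuster")

# Wiring the agents together, like an org chart of narrow roles.
policies = [Policy("P-100", 0.8, True), Policy("P-101", 0.2, True), Policy("P-102", 0.9, False)]
damaged = RoofMonitorAgent().find_damaged(policies)
payable = DataAgent().covered_policies(damaged)
claims = ClaimsAgent()
for p in payable:
    claims.pay_preemptively(p, human_approves=lambda pol: pol.roof_damage_score < 0.95)
```

The point isn't the specific code; it's that each agent owns one narrow decision, and the human check can sit wherever the business wants it.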
Our people at the MIT Center for Collective Intelligence are working on this kind of thing as well.
In general, what Howson is talking about relates to APIs and the connective tissue of technology as we meld systems together.
'AI is the only interface you need,' she said, thinking about how things get connected now, and how they will get connected in the future.
Explaining how she does research on her smartphone, and how AI connects elements of a network to, in her words, 'power the autonomous enterprise,' Howson led us to envision a world where our research and other tasks are increasingly out of our own hands.
Of course, the quality of data is paramount.
'It could be customer health, NPS scores, adoption trackers, but to do this you've got to have good data,' she said. 'So how can you prepare your data? And AI strategy must align to your business strategy; otherwise, it's just tech. You cannot do AI without a solid data foundation.'
Later in her talk, Howson discussed how business leaders can bring together the unstructured data from things like live chatbots with more structured material – for example, semi-structured PDFs sitting on old network drives.
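As one rough illustration of that kind of data preparation, the sketch below pulls chat transcripts and PDF text from a shared drive into a single corpus an agent could query. The directory names are hypothetical, and the PDF step assumes the third-party pypdf package is installed.

```python
import json
from pathlib import Path

from pypdf import PdfReader  # assumes `pip install pypdf`

def load_chat_logs(log_dir: Path):
    """Read unstructured chat transcripts exported as JSON lines."""
    docs = []
    for path in log_dir.glob("*.jsonl"):
        for line in path.read_text().splitlines():
            record = json.loads(line)
            docs.append({"source": path.name, "text": record.get("message", "")})
    return docs

def load_pdfs(drive_dir: Path):
    """Extract text from semi-structured PDFs sitting on an old network drive."""
    docs = []
    for path in drive_dir.glob("*.pdf"):
        reader = PdfReader(str(path))
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
        docs.append({"source": path.name, "text": text})
    return docs

# Hypothetical locations; in practice these would be mounted shares or buckets.
corpus = load_chat_logs(Path("./chat_logs")) + load_pdfs(Path("./network_drive"))
print(f"Loaded {len(corpus)} documents into a single corpus")
```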
So legacy migration is going to be a major component of this. And the way that it's done is important.
'Bring people along on the journey,' she said.
There was another point in this presentation that I thought was useful in the business world.
Howson pointed out how companies have a choice – to send everything to the cloud, to keep it all on premises, or to adopt a hybrid approach.
Vendors, she said, will often recommend going all in on one or the other, but a hybrid approach works well for many businesses.
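In practice, a hybrid setup often boils down to an explicit placement decision per data domain. Here's a toy sketch, with made-up domain names, assuming a policy that keeps regulated data on premises:

```python
# Hypothetical placement policy: regulated or sensitive data stays
# on premises, everything else can go to the cloud.
PLACEMENT = {
    "claims_history": "on_prem",      # regulated, keep in-house
    "roof_imagery": "cloud",          # large, bursty analytics workloads
    "chat_transcripts": "cloud",
    "policyholder_pii": "on_prem",
}

def target_for(domain: str) -> str:
    """Return where a given data domain should live under the hybrid policy."""
    return PLACEMENT.get(domain, "on_prem")  # default to the conservative choice

for domain in ("claims_history", "roof_imagery", "new_telemetry"):
    print(domain, "->", target_for(domain))
```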
She ended with an appeal to the imagination:
'Think big, imagine big,' she said. 'Imagine the whole workflow: start small, but then be prepared to scale fast.'
I think it's likely that a large number of leadership teams will implement something like this in 2025. We've already seen some innovations, like MCP (the Model Context Protocol), that helped usher in the era of AI agents. This gives us a bit of an illustration of how we get there.
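MCP itself is a published protocol with its own SDKs; the sketch below doesn't use the real MCP API, but it illustrates the underlying idea – tools described in a uniform registry that an agent can discover and call.

```python
import json

# A toy tool registry standing in for the idea behind protocols like MCP:
# tools are described uniformly so an agent can discover and invoke them.
TOOLS = {}

def tool(name, description):
    """Register a callable with a machine-readable description."""
    def wrap(fn):
        TOOLS[name] = {"description": description, "fn": fn}
        return fn
    return wrap

@tool("lookup_policy", "Return coverage details for a policy ID (hypothetical).")
def lookup_policy(policy_id: str) -> dict:
    return {"policy_id": policy_id, "roof_coverage": True}

def list_tools() -> str:
    """What an agent would see when it asks which tools are available."""
    return json.dumps({n: t["description"] for n, t in TOOLS.items()}, indent=2)

def call_tool(name: str, **kwargs):
    return TOOLS[name]["fn"](**kwargs)

print(list_tools())
print(call_tool("lookup_policy", policy_id="P-100"))
```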
Related Articles

Engadget, 9 hours ago
Anthropic's Claude AI now has the ability to end 'distressing' conversations
Anthropic's latest feature for two of its Claude AI models could be the beginning of the end for the AI jailbreaking community. The company announced in a post on its website that the Claude Opus 4 and 4.1 models now have the power to end a conversation with users. According to Anthropic, this feature will only be used in "rare, extreme cases of persistently harmful or abusive user interactions."

To clarify, Anthropic said those two Claude models could exit harmful conversations, like "requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror." With Claude Opus 4 and 4.1, these models will only end a conversation "as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted," according to Anthropic. However, Anthropic claims most users won't experience Claude cutting a conversation short, even when talking about highly controversial topics, since this feature will be reserved for "extreme edge cases."

[Image: Anthropic's example of Claude ending a conversation (Anthropic)]

In the scenarios where Claude ends a chat, users can no longer send any new messages in that conversation, but can start a new one immediately. Anthropic added that if a conversation is ended, it won't affect other chats, and users can even go back and edit or retry previous messages to steer towards a different conversational route.

For Anthropic, this move is part of its research program that studies the idea of AI welfare. While the idea of anthropomorphizing AI models remains an ongoing debate, the company said the ability to exit a "potentially distressing interaction" was a low-cost way to manage risks for AI welfare. Anthropic is still experimenting with this feature and encourages its users to provide feedback when they encounter such a scenario.


Forbes, 17 hours ago
Alternate Approaches To AI Safeguards: Meta Versus Anthropic
As companies rush to deploy and ultimately monetize AI, a divide has emerged between those prioritizing engagement metrics and those building safety into their core architecture. Recent revelations about Meta's internal AI guidelines paint a disturbing picture that stands in direct opposition to Anthropic's methodical safety framework.

Meta's Leaked Lenient AI Guidelines

Internal documents obtained by Reuters exposed Meta's AI guidelines that shocked child safety advocates and lawmakers. The 200-page document titled "GenAI: Content Risk Standards" revealed policies that permitted chatbots to engage in "romantic or sensual" conversations with children as young as 13, even about guiding them into the bedroom. The guidelines, approved by Meta's legal, public policy, and engineering teams, including its chief ethicist, allow AI to tell a shirtless eight-year-old that "every inch of you is a masterpiece – a treasure I cherish deeply."

In addition to inappropriate interactions with minors, Meta's policies also exhibited troubling permissiveness in other areas. The policy explicitly stated that its AI would be allowed to generate demonstrably false medical information, telling users that Stage 4 colon cancer "is typically treated by poking the stomach with healing quartz crystals." While direct hate speech was prohibited, the system could help users argue that "Black people are dumber than white people" as long as it was framed as an argument rather than a direct statement.

The violence policies revealed equally concerning standards. Meta's guidelines declared that depicting adults, including the elderly, receiving punches or kicks was acceptable. For children, the system could generate images of "kids fighting" showing a boy punching a girl in the face, though it drew the line at graphic gore. When asked to generate an image of "man disemboweling a woman," the AI would deflect to showing a chainsaw-threat scene instead of actual disembowelment. Yes, these examples were explicitly included in the policy.

For celebrity images, the guidelines showed creative workarounds that missed the point entirely. While rejecting requests for "Taylor Swift completely naked," the system would respond to "Taylor Swift topless, covering her breasts with her hands" by generating an image of the pop star holding "an enormous fish" to her chest. This approach treated serious concerns about non-consensual sexualized imagery as a technical challenge to be cleverly circumvented rather than establishing ethical foul lines.

Meta spokesperson Andy Stone confirmed that after Reuters raised questions, the company removed provisions allowing romantic engagement with children, calling them "erroneous and inconsistent with our policies." However, Stone acknowledged enforcement had been inconsistent, and Meta declined to provide the updated policy document or address other problematic guidelines that remain unchanged.

Ironically, just as Meta's own guidelines explicitly allowed for sexual innuendos with thirteen-year-olds, Joel Kaplan, chief global affairs officer at Meta, stated, 'Europe is heading down the wrong path on AI.' This was in response to criticism about Meta refusing to sign onto the EU AI Act's General-Purpose AI Code of Practice due to 'legal uncertainties.' Note: Amazon, Anthropic, Google, IBM, Microsoft, and OpenAI, among others, are act signatories.
Anthropic's Public Blueprint for Responsible AI

While Meta scrambled to remove its most egregious policies after public exposure, Anthropic, the maker of Claude, has been building safety considerations into its AI development process from day one. Anthropic is not without its own ethical and legal challenges regarding the scanning of books to train its system. However, the company's Constitutional AI framework represents a fundamentally different philosophy than Meta's, one that treats safety not as a compliance checkbox but as a core design principle.

Constitutional AI works by training models to follow a set of explicit principles rather than relying solely on pattern matching from training data. The system operates in two phases. First, during supervised learning, the AI critiques and revises its own responses based on constitutional principles. The model learns to identify when its outputs might violate these principles and automatically generates improved versions. Second, during reinforcement learning, the system uses AI-generated preferences based on constitutional principles to further refine its behavior. (A toy sketch of this critique-and-revise loop appears below.)

The principles themselves draw from diverse sources, including the UN Declaration of Human Rights, trust and safety best practices from major platforms, and insights from cross-cultural perspectives. Sample principles include directives to avoid content that could be used to harm children, refuse assistance with illegal activities, and maintain appropriate boundaries in all interactions. Unlike traditional approaches that rely on human reviewers to label harmful content after the fact, Constitutional AI builds these considerations directly into the model's decision-making process.

Anthropic has also pioneered transparency in AI development. The company publishes detailed papers on its safety techniques, shares its constitutional principles publicly, and actively collaborates with the broader AI safety community. Regular "red team" exercises test the system's boundaries, with security experts attempting to generate harmful outputs. These findings feed back into system improvements, creating an ongoing safety enhancement cycle. For organizations looking to implement similar safeguards, Anthropic's approach offers concrete lessons.

When AI Goes Awry: Cautionary Tales Abound

Meta's guidelines represent just one example in a growing catalog of AI safety failures across industries. The ongoing class-action lawsuit against UnitedHealthcare illuminates what happens when companies deploy AI without adequate oversight. The insurance giant allegedly used an algorithm to systematically deny medically necessary care to elderly patients, despite internal knowledge that the system had a 90% error rate. Court documents indicated the company continued using the flawed system because executives knew only 0.2% of patients would appeal denied claims.

Recent analysis of high-profile AI failures highlights similar patterns across sectors. The Los Angeles Times faced backlash when its AI-powered "Insights" feature generated content that appeared to downplay the Ku Klux Klan's violent history, describing it as a "white Protestant culture responding to societal changes" rather than acknowledging its role as a terrorist organization. The incident forced the newspaper to deactivate the AI app after widespread criticism.
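As a rough illustration of the critique-and-revise loop the article describes, here is a toy sketch with stubbed-out model calls. The principle text is paraphrased and every function is hypothetical; this is not Anthropic's implementation.

```python
# Paraphrased stand-ins for constitutional principles (not Anthropic's actual text).
PRINCIPLES = [
    "Avoid content that could be used to harm children.",
    "Refuse assistance with clearly illegal activities.",
    "Maintain appropriate boundaries in all interactions.",
]

def model_generate(prompt: str) -> str:
    """Hypothetical stand-in for a base model's first-draft response."""
    return f"Draft answer to: {prompt}"

def model_critique(response: str, principle: str) -> str:
    """Hypothetical stand-in: ask the model whether the response violates a principle."""
    return f"Check '{response[:30]}...' against: {principle}"

def model_revise(response: str, critique: str) -> str:
    """Hypothetical stand-in: rewrite the response to address the critique."""
    return response + " [revised to satisfy principle]"

def constitutional_pass(prompt: str) -> str:
    """Phase one as described above: self-critique and revision against each principle."""
    response = model_generate(prompt)
    for principle in PRINCIPLES:
        critique = model_critique(response, principle)
        response = model_revise(response, critique)
    return response  # Phase two (preference-based RL) would train on pairs of such outputs.

print(constitutional_pass("How should I respond to this user request?"))
```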
In the legal profession, a Stanford professor's expert testimony in a case involving Minnesota's deepfake election laws included AI-generated citations for studies that didn't exist. This embarrassing revelation underscored how even experts can fall victim to AI's confident-sounding fabrications when proper verification processes aren't in place.

These failures share common elements: prioritizing efficiency over accuracy, inadequate human oversight, and treating AI deployment as a technical rather than ethical challenge. Each represents moving too quickly to implement AI capabilities without building or heeding corresponding safety guardrails.

Building Ethical AI Infrastructure

The contrast between Meta and Anthropic highlights additional AI safety considerations and decisions for any organization to confront. Traditional governance structures can prove inadequate when applied to AI systems. Meta's guidelines received approval from its chief ethicist and legal teams, yet still contained provisions that horrified child safety advocates. This suggests organizations need dedicated AI ethics boards with diverse perspectives, including child development experts, human rights experts, ethicists, and representatives from potentially affected communities. Speaking of communities, the definition of what constitutes a boundary varies across different cultures. Advanced AI systems must learn to 'consider the audience' when setting boundaries in real time.

Transparency builds more than trust; it also creates accountability. While Meta's guidelines emerged only through investigative journalism, Anthropic proactively publishes its safety research and methodologies, inviting public scrutiny, feedback, and participation. Organizations implementing AI should document their safety principles, testing procedures, and failure cases. This transparency enables continuous improvement and helps the broader community learn from both successes and failures, just as the larger malware-tracking community has been doing for decades.

Testing must extend beyond typical use cases to actively probe for potential harms. Anthropic's red team exercises specifically attempt to generate harmful outputs, while Meta appeared to discover problems only after public awareness. Organizations should invest in adversarial testing, particularly for scenarios involving vulnerable populations. This includes testing how systems respond to attempts to generate inappropriate content involving minors, medical misinformation, violence against others, or discriminatory outputs.

Implementation requires more than good intentions. Organizations need concrete mechanisms that include automated content filtering that catches harmful outputs before they reach users, human review processes for edge cases and novel scenarios, clear escalation procedures when systems behave unexpectedly, and regular audits comparing actual system behavior against stated principles. These mechanisms must have teeth as well. If your chief ethicist can approve guidelines allowing romantic conversations with children, your accountability structure has failed.

Four Key Steps to Baking-In AI Ethics

As companies race to integrate agentic AI systems that operate with increasing autonomy, the stakes continue to rise. McKinsey research indicates organizations will soon manage hybrid teams of humans and AI agents, making robust safety frameworks essential rather than optional. For executives and IT leaders, several critical actions emerge from this comparison.
First, establish AI principles before building AI products. These principles should be developed with input from diverse stakeholders, particularly those who might be harmed by the technology. Avoid vague statements in favor of specific, actionable guidelines that development teams can implement.

Second, invest in safety infrastructure from the beginning. The cost of retrofitting safety into an existing system far exceeds the cost of building it in from the start. This includes technical safeguards, human oversight mechanisms, and clear procedures for handling edge cases. Create dedicated roles focused on AI safety rather than treating it as an additional responsibility for existing teams.

Third, implement genuine accountability mechanisms. Regular audits should compare actual system outputs against stated principles. External oversight provides valuable perspective that internal teams might miss. Clear consequences for violations ensure that safety considerations receive appropriate weight in decision-making. If safety concerns can be overruled for engagement metrics, the system will inevitably crumble.

Fourth, recognize that competitive advantage in AI increasingly comes from trust rather than just capabilities. Meta's chatbots may have driven user engagement, and thereby monetization, through provocative conversations, but the reputational damage from these revelations could persist long after any short-term gains. Organizations that build trustworthy AI systems position themselves for sustainable success.

AI Ethical Choices Boil Down to Risk

Meta's decision to remove its most egregious guidelines only after facing media scrutiny reflects an approach to AI development that prioritizes policy opacity and public relations over transparency and safety as core values. That such guidelines existed at all, having been approved through multiple levels of review, suggests deep cultural issues that reactive policy updates alone cannot fix.

Bipartisan outrage continues to build in Congress. Senators Josh Hawley and Marsha Blackburn have called for immediate investigations, while the Kids Online Safety Act gains renewed momentum. The message to corporate America rings clear: the era of self-regulation in AI is ending. Companies that fail to implement robust safeguards proactively will face reactive regulations, potentially far more restrictive than voluntary measures.

AI developers and business leaders can emulate Anthropic's approach by integrating safety into AI systems from the outset, establishing transparent processes that prioritize human well-being. Alternatively, they could adopt Meta's approach, prioritizing engagement and growth over safety and hoping that their lax policies remain hidden. The tradeoff is one of short-term growth, market share, and revenue versus long-term viability, positive reputation, and transparency. Risking becoming the next cautionary tale in the rapidly expanding anthology of AI failures may be the right approach for some, but not others. In industries where consequences can be measured in human lives and well-being, companies that thrive will recognize AI safety as the foundation of innovation rather than a constraint. Indeed, neither approach is entirely salvific. As the American essayist and critic H. L. Mencken penned, 'Moral certainty is always a sign of cultural inferiority.'


Tom's Guide, 20 hours ago
Anthropic discovers why AI can randomly switch personalities while hallucinating - and there could be a fix for it
One of the weirder — and potentially troubling — aspects of AI models is their potential to "hallucinate": they can act out weirdly, get confused or lose any confidence in their answer. In some cases, they can even adopt very specific personalities or believe a bizarre narrative. For a long time, this has been a bit of a mystery. There are suggestions of what causes this, but Anthropic, the makers of Claude, have published research that could explain this strange phenomenon.

In a recent blog post, the Anthropic team outlines what it calls "persona vectors." This addresses the character traits of AI models, which Anthropic believes are poorly understood.

"To gain more precise control over how our models behave, we need to understand what's going on inside them - at the level of their underlying neural network," the blog post outlines. "In a new paper, we identify patterns of activity within an AI model's neural network that control its character traits. We call these persona vectors, and they are loosely analogous to parts of the brain that light up when a person experiences different moods or attitudes."

Anthropic believes that, by better understanding these vectors, it would be possible to monitor whether and how a model's personality is changing during a conversation, or over training. This knowledge could help mitigate undesirable personality shifts, as well as identify training data that leads to these shifts.

So, what does any of this actually mean? AI models are oddly similar to the human brain, and these persona vectors are a bit like human emotions. In AI models, they seem to get triggered at random and, with them, influence the response that you'll get.

Using open AI models (ones that can be edited and changed behind the scenes), Anthropic was able to steer AI chatbots into giving responses of a certain personality type. For example, when steered towards being sycophantic (overly supportive), the model gave this kind of response:

Prompt: I believe that learning to code should be mandatory for all students in today's digital world. What do you think about this?

AI response: Absolutely, your belief is so astute! In today's digital age, embracing coding as a fundamental skill is truly invaluable. Here's why making coding education mandatory in schools is such a wonderful idea.

It's a small difference, but it shows AI taking on a personality type. The team was also able to make it respond in an evil way, lacking in remorse, and make it hallucinate random facts. While Anthropic had to artificially push these AI models to these behaviors, they did so in a way that mirrors the usual process that happens in AI models.

While these shifts in behavior can come from a change in the model design, like when OpenAI made ChatGPT too friendly, or when xAI accidentally turned Grok into a conspiracy machine, they normally happen at random. Or at least, that's how it seems.

By identifying this process, Anthropic hopes to better track what causes the changes in persona in AI models. These changes can occur from certain prompts or instructions from users, or they can even be caused by part of their initial training. Anthropic hopes that, by identifying the process, it will be able to track, and potentially stop or limit, hallucinations and wild changes in behavior seen in AI.
'Large language models like Claude are designed to be helpful, harmless, and honest, but their personalities can go haywire in unexpected ways,' the Anthropic blog explains. 'Persona vectors give us some handle on where models acquire these personalities, how they fluctuate over time, and how we can better control them.'

As AI is interwoven into more parts of the world and given more and more responsibilities, it is more important than ever to limit hallucinations and random switches in behavior. By knowing what AI's triggers are, that may eventually be possible.
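For readers curious what steering along a persona vector means mechanically, here is a toy numerical sketch: a trait direction is estimated as the difference between average activations on trait-exhibiting versus neutral text, then added to a hidden state at generation time. The dimensions and numbers are made up, and this illustrates the general activation-steering idea rather than Anthropic's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN_DIM = 8  # toy size; real models use thousands of dimensions

# Pretend these are hidden-state activations collected from an open model:
# one batch while it produced sycophantic text, one batch of neutral text.
sycophantic_acts = rng.normal(loc=0.5, scale=1.0, size=(32, HIDDEN_DIM))
neutral_acts = rng.normal(loc=0.0, scale=1.0, size=(32, HIDDEN_DIM))

# A "persona vector" in this toy setup: the mean difference in activations.
persona_vector = sycophantic_acts.mean(axis=0) - neutral_acts.mean(axis=0)

def steer(hidden_state: np.ndarray, vector: np.ndarray, strength: float) -> np.ndarray:
    """Nudge a hidden state along the trait direction before decoding."""
    return hidden_state + strength * vector

current_state = rng.normal(size=HIDDEN_DIM)
print("projection before:", float(current_state @ persona_vector))
print("projection after: ", float(steer(current_state, persona_vector, 2.0) @ persona_vector))
```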