
Poetry And Deception: Secrets Of Anthropic's Claude 3.5 Haiku AI Model
Anthropic recently published two breakthrough research papers that offer surprising insights into how an AI model 'thinks.' The first builds on Anthropic's earlier work linking human-understandable concepts to the internal pathways an LLM uses to generate its outputs. The second examines how Anthropic's Claude 3.5 Haiku model handled simple tasks associated with ten model behaviors.
These two papers provide valuable information on how AI models work: by no means a complete understanding, but at least a glimpse. Let's dig into what we can learn from that glimpse, including some concerns about AI safety that may be minor but are still important.
LLMs such as Claude aren't programmed like traditional computers. Instead, they are trained with massive amounts of data. This process creates AI models that behave like black boxes, which obscures how they can produce insightful information on almost any subject. However, black-box AI isn't an architectural choice; it is simply a result of how this complex and nonlinear technology operates.
The neural networks inside an LLM use billions of interconnected nodes to transform data into useful information. Those networks contain billions of parameters, connections and computational pathways, and each parameter interacts non-linearly with the others, creating a level of complexity that is almost impossible to unravel. According to Anthropic, 'This means that we don't understand how models do most of the things they do.'
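To make that non-linear interaction concrete, here is a minimal toy sketch (an illustration of mine, not Anthropic's code): a tiny two-layer network in which the effect of nudging a single weight depends entirely on the state of the rest of the network, which is why reading meaning off individual parameters does not work at the scale of an LLM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network: output = w2 . relu(W1 @ x). All values are made up.
x = np.abs(rng.normal(size=8))         # one fixed input
W1 = np.abs(rng.normal(size=(16, 8)))  # first-layer weights (kept positive so units fire)
w2 = rng.normal(size=16)               # second-layer weights

def output(W1, w2):
    return float(w2 @ np.maximum(W1 @ x, 0.0))

# Nudge one single weight and measure its effect on the output...
W1_nudged = W1.copy()
W1_nudged[3, 5] += 0.1
effect_active = output(W1_nudged, w2) - output(W1, w2)

# ...then apply the identical nudge after other weights have silenced that unit.
W1_silenced = W1.copy()
W1_silenced[3, :] -= 10.0              # unit 3's ReLU now stays off for this input
W1_silenced_nudged = W1_silenced.copy()
W1_silenced_nudged[3, 5] += 0.1
effect_silenced = output(W1_silenced_nudged, w2) - output(W1_silenced, w2)

# The same parameter change matters in one context and does nothing in the other.
print(f"effect when the unit is active:   {effect_active:+.4f}")
print(f"effect when the unit is silenced: {effect_silenced:+.4f}")
```

Scale that context dependence up to billions of parameters and dozens of layers, and it becomes clear why the models have to be studied empirically rather than simply read.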
Anthropic follows a two-step approach to LLM research. First, it identifies features, which are interpretable building blocks that the model uses in its computations. Second, it describes the internal processes, or circuits, by which features interact to produce model outputs. Because of the model's complexity, Anthropic's new research could illuminate only a fraction of the LLM's inner workings. But what was revealed about these models seemed more like science fiction than real science.
One of Anthropic's groundbreaking research papers is titled 'On the Biology of a Large Language Model.' It describes how the scientists used attribution graphs to trace, internally, how the Claude 3.5 Haiku model transforms inputs into outputs. Several of the results surprised the researchers, including evidence that the model plans ahead when composing rhyming poetry and that the reasoning it writes out does not always reflect what is actually happening inside it.
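Anthropic's attribution graphs involve far more machinery than can be shown here (they rely on learned, human-interpretable features and trace influence across layers), but the basic move of splitting an output into per-feature contributions can be sketched in a few lines. Everything below, including the feature names and numbers, is invented for illustration and is not taken from the paper.

```python
import numpy as np

# Invented interpretable features that might fire on a simple factual prompt,
# with made-up activation strengths and made-up weights onto the final answer.
features = ["Dallas", "Texas", "state capital", "say 'Austin'"]
activations = np.array([0.9, 0.8, 0.7, 0.6])  # how strongly each feature fired
weights = np.array([0.1, 0.9, 0.8, 2.0])      # each feature's pull on the answer

# Attribution of the answer to each feature: activation times weight.
contributions = activations * weights
for name, c in sorted(zip(features, contributions), key=lambda t: -t[1]):
    print(f"{name:>15}: {c:+.2f}")

# An attribution graph repeats this bookkeeping recursively: which earlier
# features made "Texas" fire, which prompt tokens made those fire, and so on,
# until the output is linked back to the input through a chain of features.
```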
Scientists who conducted the research for 'On the Biology of a Large Language Model' concede that Claude 3.5 Haiku exhibits some concealed operations and goals not evident in its outputs. The attribution graphs revealed a number of hidden issues. These discoveries underscore the complexity of the model's internal behavior and highlight the importance of continued efforts to make models more transparent and aligned with human expectations. It is likely these issues also appear in other similar LLMs.
With respect to the red flags noted above, it should be mentioned that Anthropic continually updates its Responsible Scaling Policy, which has been in effect since September 2023. Anthropic has committed not to train or deploy models capable of causing catastrophic harm unless it has implemented safety and security measures that keep risks within acceptable limits. Anthropic also states that all of its models meet the AI Safety Level (ASL) Deployment and Security Standards, which provide a baseline of safe deployment and model security.
As LLMs have grown larger and more powerful, deployment has spread to critical applications in areas such as healthcare, finance and defense. The increase in model complexity and wider deployment has also increased pressure to achieve a better understanding of how AI works. It is critical to ensure that AI models produce fair, trustworthy, unbiased and safe outcomes.
Research is important for our understanding of LLMs, not only to improve and more fully utilize AI, but also to expose potentially dangerous processes. The Anthropic scientists have examined just a small portion of this model's complexity and hidden capabilities. This research reinforces the need for more study of AI's internal operations and security.
In my view, it is unfortunate that a complete understanding of LLMs has taken a back seat to the market's preference for high-performing, useful AI. We need to understand thoroughly how LLMs work to ensure that safety guardrails are adequate.

Related Articles

Business Insider
The future of AI will be governed by protocols no one has agreed on yet
The tech industry, much like everything else in the world, abides by certain rules. With the boom in personal computing came USB, a standard for transferring data between devices. With the rise of the internet came IP addresses, numerical labels that identify every device online. With the advent of email came SMTP, a framework for routing email across the internet. These are protocols — the invisible scaffolding of the digital realm — and with every technological shift, new ones emerge to govern how things communicate, interact, and operate.

As the world enters an era shaped by AI, it will need to draw up new ones. But AI goes beyond the usual parameters of screens and code. It forces developers to rethink fundamental questions about how technological systems interact across the virtual and physical worlds. How will humans and AI coexist? How will AI systems engage with each other? And how will we define the protocols that manage a new age of intelligent systems?

Across the industry, startups and tech giants alike are busy developing protocols to answer these questions. Some govern the present, in which humans still largely control AI models. Others are building for a future in which AI has taken over a significant share of human labor.

"Protocols are going to be this kind of standardized way of processing non-deterministic information," Antoni Gmitruk, the chief technology officer of Golf, which helps clients deploy remote servers aligned with Anthropic's Model Context Protocol, told BI. Agents, and AI in general, are "inherently non-deterministic in terms of what they do and how they behave."

When AI behavior is difficult to predict, the best response is to imagine possibilities and test them through hypothetical scenarios. Here are a few that call for clear protocols.

Scenario 1: Humans and AI, a dialogue of equals

Games are one way to determine which protocols strike the right balance of power between AI and humans. In late 2024, a group of young cryptography experts launched Freysa, an AI agent that invites human users to manipulate it. The rules are unconventional: Make Freysa fall in love with you or agree to concede its funds, and the prize is yours. The prize pool grows with each failed attempt in a standoff between human intuition and machine logic. Freysa has caught the attention of big names in the tech industry, from Elon Musk, who called one of its games "interesting," to veteran venture capitalist Marc Andreessen.

"The core technical thing we've done is enabled her to have her own private keys inside a trusted enclave," said one of the architects of Freysa, who spoke to BI on the condition of anonymity in a January interview. Secure enclaves are not new in the tech industry. They're used by companies from AWS to Microsoft as an extra layer of security to isolate sensitive data. In Freysa's case, the architect said they represent the first step toward creating a "sovereign agent." He defined that as an agent that can control its own private keys, access money, and evolve autonomously — the type of agent that will likely become ubiquitous.

"Why are we doing it at this time? We're entering a phase where AI is getting just good enough that you can see the future, which is AI basically replacing your work, my work, all our work, and becoming economically productive as autonomous entities," the architect said. In this phase, they said, Freysa helps answer a core question: "What does human involvement look like? And how do you have human co-governance over agents at scale?"
In May, The Block, a crypto news site, revealed that the company behind Freysa is Eternis AI. Eternis AI describes itself as an "applied AI lab focused on enabling digital twins for everyone, multi-agent coordination, and sovereign agent systems." The company has raised $30 million from investors, including Coinbase Ventures. Its co-founders are Srikar Varadaraj, Pratyush Ranjan Tiwari, Ken Li, and Augustinas Malinauskas.

Scenario 2: To the current architects of intelligence

Freysa establishes protocols in anticipation of a hypothetical future when humans and AI agents interact with similar levels of autonomy. The world, however, also needs to set rules for the present, where AI remains a product of human design and intention.

AI typically runs on the web and builds on existing protocols developed long before it, explained Davi Ottenheimer, a cybersecurity strategist who studies the intersection of technology, ethics, and human behavior, and is president of the security consultancy flyingpenguin. "But it adds in this new element of intelligence, which is reasoning," he said, and we don't yet have protocols for reasoning. "I'm seeing this sort of hinted at in all of the news. Oh, they scanned every book that's ever been written and never asked if they could. Well, there was no protocol that said you can't scan that, right?" he said.

There might not be protocols, but there are laws. OpenAI is facing a copyright lawsuit from the Authors Guild for training its models on data from "more than 100,000 published books" and then deleting the datasets. Meta considered buying the publishing house Simon & Schuster outright to gain access to published books. Tech giants have also resorted to tapping almost all of the consumer data available online, from the content of public Google Docs to the relics of social media sites like Myspace and Friendster, to train their AI models.

Ottenheimer compared the current dash for data to the creation of ImageNet — the visual database that propelled computer vision, built by Mechanical Turk workers who scoured the internet for content. "They did a bunch of stuff that a protocol would have eliminated," he said.

Scenario 3: How to talk to each other

As we move closer to a future where artificial general intelligence is a reality, we'll need protocols for how intelligent systems — from foundation models to agents — communicate with each other and the broader world. The leading AI companies have already launched new ones to pave the way.

Anthropic, the maker of Claude, launched the Model Context Protocol, or MCP, in November 2024. It describes it as a "universal, open standard for connecting AI systems with data sources, replacing fragmented integrations with a single protocol." In April, Google launched Agent2Agent, a protocol that will "allow AI agents to communicate with each other, securely exchange information, and coordinate actions on top of various enterprise platforms or applications." These build on existing AI protocols but address new challenges of scaling and interoperability that have become critical to AI adoption.

Managing agent behavior, Gmitruk said, is the "middle step before we unleash the full power of AGI and let them run around the world freely." When we arrive at that point, he said, agents will no longer communicate through APIs but in natural language. They'll have unique identities, jobs even, and need to be verified.
"How do we enable agents to communicate between each other, and not just being computer programs running somewhere on the server, but actually being some sort of existing entity that has its history, that has its kind of goals," Gmitruk said. It's still early to set standards for agent-to-agent communication, Gmitruk said. Earlier this year he and his team initially launched a company focused on building an authentication protocol for agents, but pivoted. "It was too early for agent-to-agent authentication," he told BI over LinkedIn. "Our overall vision is still the same -> there needs to be agent-native access to the conventional internet, but we just doubled down on MCP as this is more relevant at the stage of agents we're at." Does everything need a protocol? Definitely not. The AI boom marks a turning point, reviving debates over how knowledge is shared and monetized. McKinsey & Company calls it an "inflection point" in the fourth industrial revolution — a wave of change that it says began in the mid-2010s and spans the current era of "connectivity, advanced analytics, automation, and advanced-manufacturing technology." Moments like this raise a key question: How much innovation belongs to the public and how much to the market? Nowhere is that clearer than in the AI world's debate between the value of open-source and closed models. "I think we will see a lot of new protocols in the age of AI," Tiago Sada, the chief product officer at Tools for Humanity, the company building the technology behind Sam Altman's World. However, "I don't think everything should be a protocol." World is a protocol designed for a future in which humans will need to verify their identity at every turn. Sada said the goal of any protocol "should be like this open thing, like this open infrastructure that anyone can use," and is free from censorship or influence. At the same time, "one of the downsides of protocols is that they're sometimes slower to move," he said. "When's the last time email got a new feature? Or the internet? Protocols are open and inclusive, but they can be harder to monetize and innovate on," he said. "So in AI, yes — we'll see some things built as protocols, but a lot will still just be products."
Yahoo
Reddit Lawsuit Against Anthropic AI Has Stakes for Sports
In a new lawsuit, Reddit accuses AI company Anthropic of illegally scraping its users' data, including posts authored by sports fans who use the popular online discussion platform.

Reddit's complaint, drafted by John B. Quinn and other attorneys from Quinn Emanuel Urquhart & Sullivan, was filed on Wednesday in a California court. It contends Anthropic breached the Reddit user agreement by scraping Reddit content with its web crawler, ClaudeBot. The crawler provides training data for Anthropic's AI tool, Claude, which relies on large language models (LLMs) that distill data and language.

Other claims in the complaint include tortious interference and unjust enrichment. Scraping Reddit content is portrayed as undermining Reddit's obligations to its more than 100 million daily active unique users, including the obligation to protect their privacy. Reddit also contends Anthropic subverts its assurances to users that they control their expressions, including when deleting posts from public view.

Scraping is key to AI. Automated technology makes requests to a website, copies the results and tries to make sense of them. Anthropic, Reddit claims, finds Reddit data 'to be of the highest quality and well-suited for fine-tuning AI models' and useful for training AI. Anthropic allegedly violates users' privacy, since those users 'have no way of knowing' their data has been taken.

Reddit, valued at $6.4 billion in its initial public offering last year, has hundreds of thousands of 'subreddits,' or online communities that cover numerous shared interests. Many subreddits are sports related, including r/sports, which has 22 million fans, r/nba (17 million) and the college football-centered r/CFB (4.4 million). Some pro franchises, including the Miami Dolphins (r/miamidolphins) and Dallas Cowboys (r/cowboys), have official subreddits.

Reddit contends its unique features elevate its content and thus make it more attractive to scraping endeavors. Reddit users submit posts, which can include original commentary, links, polls and videos, and they upvote or downvote content. This voting influences whether a post appears on the subreddit's front page or is placed more obscurely. Subreddit communities also self-police, with prohibitions on personal attacks, harassment, racism and spam. These practices can generate thoughtful and detailed commentary.

Reddit estimates that ClaudeBot's scraping of Reddit has 'catapulted Anthropic into its valuation of tens of billions of dollars.' Meanwhile, Reddit says the company and its users lose out, because they 'realize no benefits from the technology that they helped create.'

Anthropic allegedly trained ClaudeBot to extract data from Reddit starting in December 2021. Anthropic CEO Dario Amodei is quoted in the complaint as praising Reddit content, especially content found in prominent subreddits. Although Anthropic indicated it had stopped scraping Reddit in July 2024, Reddit says audit logs show Anthropic 'continued to deploy its automated bots to access Reddit content' more than 100,000 times in subsequent months. Reddit also unfavorably compares Anthropic to OpenAI and Google, which are 'giants in the AI space.'
Reddit says OpenAI and Google 'entered into formal partnerships with Reddit' that permitted them to use Reddit content, but only in ways that 'protect Reddit and its users' interests and privacy.' In contrast, Anthropic is depicted as engaging in unauthorized activities.

In a statement shared with media, an Anthropic spokesperson said, 'we disagree with Reddit's claims, and we will defend ourselves vigorously.' In the weeks ahead, attorneys for Anthropic will answer Reddit's complaint and argue the company has not broken any laws.

Reddit v. Anthropic has implications beyond the posts of Reddit users. Scraping by web crawlers is a constant activity on the internet, including on message boards, blogs and other forums where sports fans and followers express viewpoints. The use of this content to train AI without users' knowledge or explicit consent is a legal topic sure to stir debate in the years ahead.
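For context on what 'making requests and copying the results' involves, here is a generic sketch of a crawler that checks a site's robots.txt before fetching a page. It is not ClaudeBot's code, which is not public; the user agent and URLs are placeholders.

```python
# Generic sketch of a crawler that honors robots.txt; not ClaudeBot's actual code.
import urllib.robotparser
from urllib.request import Request, urlopen

USER_AGENT = "example-research-bot"        # placeholder agent name
SITE = "https://www.example.com"           # placeholder site
PAGE = SITE + "/r/sports/top"              # placeholder page

robots = urllib.robotparser.RobotFileParser(SITE + "/robots.txt")
robots.read()                              # fetch and parse the site's crawl rules

if robots.can_fetch(USER_AGENT, PAGE):
    html = urlopen(Request(PAGE, headers={"User-Agent": USER_AGENT})).read()
    # In a real pipeline, text would be extracted here and added to a training corpus.
    print(f"fetched {len(html)} bytes from {PAGE}")
else:
    print("robots.txt disallows this page for our user agent; skipping")
```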
Yahoo
Apple's Siri Could Be More Like ChatGPT. But Is That What You Want?
I've noticed a vibe shift in the appetite for AI on our devices. My social feeds are flooded with disgust over what's being created by Google's AI video generator tool, Veo 3. The unsettling realistic video of fake people and voices it creates makes it clear we will have a hard time telling apart fiction from reality. In other words, the AI slop is looking less sloppy. Meanwhile, the CEO of Anthropic is warning people that AI will wipe out half of all entry-level white-collar jobs. In an interview with Axios, Dario Amodei is suggesting government needs to step in to protect us from a mass elimination of jobs that can happen very rapidly. So as we gear up for Apple's big WWDC presentation on Monday, I have a different view of headlines highlighting Apple being behind in the AI race. I wonder, what exactly is the flavor of AI that people want or need right now? And will it really matter if Apple keeps waiting longer to push out it's long promised (and long delayed) personalized Siri when people are not feeling optimistic about AI's impact on our society? In this week's episode of One More Thing, which you can watch embedded above, I go over some of the recent reporting from Bloomberg that discusses leadership changes on the Siri team, and how there are different views in what consumers want out of Siri. Should Apple approach AI in a way to make Siri into a home-grown chatbot, or just make it a better interface for controlling devices? (Maybe a bit of both.) I expect a lot of griping after WWDC about the state of Siri and Apple's AI, with comparisons to other products like ChatGPT. But I hope we can use those gripes to voice what we really want in the next path for the assistant, by sharing our thoughts and speaking with our wallet. Do you want a Siri that's better at understanding context, or one that goes further and makes decisions for you? It's a question I'll be dwelling on more as Apple gives us the next peak into the future of iOS on Monday, and perhaps a glimpse of how the next Siri is shaping up. If you're looking for more One More Thing, subscribe to our YouTube page to catch Bridget Carey breaking down the latest Apple news and issues every Friday.