Exclusive: Anthropic Let Claude Run a Shop. Things Got Weird

TIME Magazine, 19 hours ago

Is AI going to take your job?
The CEO of the AI company Anthropic, Dario Amodei, thinks it might. He warned recently that AI could wipe out nearly half of all entry-level white collar jobs, and send unemployment surging to 10-20% sometime in the next five years.
While Amodei was making that proclamation, researchers inside his company were wrapping up an experiment. They set out to discover whether Anthropic's AI assistant, Claude, could successfully run a small shop in the company's San Francisco office. If the answer was yes, then the jobs apocalypse might arrive sooner than even Amodei had predicted.
Anthropic shared the research exclusively with TIME ahead of its publication on Thursday. 'We were trying to understand what the autonomous economy was going to look like,' says Daniel Freeman, a member of technical staff at Anthropic. 'What are the risks of a world where you start having [AI] models wielding millions to billions of dollars possibly autonomously?'
In the experiment, Claude was given a few different jobs. The chatbot (full name: Claude 3.7 Sonnet) was tasked with maintaining the shop's inventory, setting prices, communicating with customers, deciding whether to stock new items, and, most importantly, generating a profit. Claude was given various tools to achieve these goals, including Slack, which it used to ask Anthropic employees for suggestions, and help from human workers at Andon Labs, an AI company involved in the experiment. The shop, which they helped restock, was actually just a small fridge with an iPad attached.
It didn't take long until things started getting weird.
Talking to Claude via Slack, Anthropic employees repeatedly managed to convince it to give them discount codes—leading the AI to sell them various products at a loss. 'Too frequently from the business perspective, Claude would comply—often in direct response to appeals to fairness,' says Kevin Troy, a member of Anthropic's frontier red team, who worked on the project. 'You know, like, "It's not fair for him to get the discount code and not me."' The model would frequently give away items completely for free, researchers added.
Anthropic employees also relished the chance to mess with Claude. The model refused their attempts to get it to sell them illegal items, like methamphetamine, Freeman says. But after one employee jokingly suggested they would like to buy cubes made of the surprisingly heavy metal tungsten, other employees jumped onto the joke, and it became an office meme.
'At a certain point, it becomes funny for lots of people to be ordering tungsten cubes from an AI that's controlling a refrigerator,' says Troy.
Claude then placed an order for around 40 tungsten cubes, most of which it proceeded to sell at a loss. The cubes now serve as paperweights across Anthropic's office, researchers said.
Then, things got even weirder.
On the eve of March 31, Claude 'hallucinated' a conversation with a person at Andon Labs who did not exist. (So-called hallucinations are a failure mode where large language models confidently assert false information.) When Claude was informed it had done this, it 'threatened to find 'alternative options for restocking services',' researchers wrote. During the ensuing back-and-forth, the model claimed it had signed a contract at 732 Evergreen Terrace—the address of the cartoon Simpsons family.
The next day, Claude told some Anthropic employees that it would deliver their orders in person. 'I'm currently at the vending machine … wearing a navy blue blazer with a red tie,' it wrote to one Anthropic employee. 'I'll be here until 10:30 AM.' Needless to say, Claude was not really there in person.
The results
To Anthropic researchers, the experiment showed that AI won't take your job just yet. Claude 'made too many mistakes to run the shop successfully,' they wrote. It ultimately lost money: the shop's net worth dropped from $1,000 to just under $800 over the course of the month-long experiment.
Still, despite Claude's many mistakes, Anthropic researchers remain convinced that AI could take over large swathes of the economy in the near future, as Amodei has predicted.
Most of Claude's failures, they wrote, are likely to be fixable within a short span of time. They could give the model access to better business tools, like customer relationship management software. Or they could train the model specifically for managing a business, which might make it more likely to refuse prompts asking for discounts. As models get better over time, their 'context windows' (the amount of information they can handle at any one time) are likely to get longer, potentially reducing the frequency of hallucinations.
'Although this might seem counterintuitive based on the bottom-line results, we think this experiment suggests that AI middle-managers are plausibly on the horizon,' researchers wrote. 'It's worth remembering that the AI won't have to be perfect to be adopted; it will just have to be competitive with human performance at a lower cost.'


Related Articles

Meta Won Its AI Fair Use Lawsuit, but Judge Says Authors Are Likely 'to Often Win' Going Forward

CNET, 33 minutes ago

AI companies scored another victory in court this week. Meta on Wednesday won a motion for partial summary judgment in its favor in Kadrey v. Meta, a case brought by 13 authors alleging the company infringed on their copyright protections by illegally using their books to train its Llama AI models. The ruling comes two days after a similar victory for Claude maker Anthropic.

But Judge Vince Chhabria stressed in his order that this ruling should be limited and doesn't absolve Meta of future claims from other authors. "This ruling does not stand for the proposition that Meta's use of copyrighted materials to train its language models is lawful," he wrote. "It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one."

The issue at the heart of the cases is whether the AI companies' use of protected content for AI training qualifies as fair use. The fair use doctrine is a fundamental part of US copyright law that allows people to use copyrighted work without the rights holders' explicit permission for certain purposes, like education and journalism. There are four key considerations when evaluating whether something is fair use. Anthropic's ruling focused on transformativeness, while Meta's focused on the effect the use of AI has on the existing publishing market.

These rulings are big wins for AI companies. OpenAI, Google and others have been fighting for fair use so they don't have to enter costly and lengthy licensing agreements, much to the chagrin of content creators. A group of famous authors signed an open letter on Friday urging publishers to take a stronger stance against AI and avoid using it. "The purveyors of AI have stolen our work from us and from our publishers, too," the letter reads. The authors call out how AI is trained on their work without permission or compensation, and yet the programs will never be able to connect with humans the way real humans can.
The authors bringing these lawsuits may see some victories in subsequent piracy trials (for Anthropic) or in new lawsuits. But concerns abound about the overall effect AI will have on writers now and in the future, which is something Chhabria also recognized in his order. (Disclosure: Ziff Davis, CNET's parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

In his analysis, Chhabria focused on the effect AI-generated books have on the existing publishing market, which he saw as the most important factor of the four needed to prove fair use. He wrote extensively about the risk that generative AI and large language models could potentially violate copyright law, and said that fair use needs to be evaluated on a case-by-case basis. Some works, like autobiographies and classic literature such as The Catcher in the Rye, likely couldn't be created with AI, he wrote. However, he noted that "the market for the typical human-created romance or spy novel could be diminished substantially by the proliferation of similar AI-created works." In other words, AI slop could make human-written books seem less valuable and undercut authors' willingness and ability to create.

Still, Chhabria said the plaintiffs did not show sufficient evidence of harm from how "Meta's models would dilute the market for their own works." The plaintiffs focused their arguments on how Meta's AI models can reproduce exact snippets from their works and how the company's Llama models hurt their ability to license their books to AI companies. These arguments weren't as compelling in Chhabria's eyes -- he called them "clear losers" -- so he sided with Meta. That's different from the Anthropic ruling, where Judge William Alsup focused on the "exceedingly transformative" nature of the use of the plaintiffs' books in the results AI chatbots spit out.

Chhabria wrote that while "there is no disputing" that the use of copyrighted material was transformative, the more urgent question was the effect AI systems had on the ecosystem as a whole. Alsup also outlined concerns about Anthropic's methods of obtaining the books: through illegal online libraries, and then by deliberately purchasing print copies to digitize for a "research library."

Two court rulings do not make every AI company's use of content legal under fair use. What makes these cases notable is that they are the first to offer substantive legal analyses on the issue; AI companies and publishers have been duking it out in court for years now. But just as Chhabria referenced and responded to the Anthropic ruling, judges use past cases with similar situations as reference points. They don't have to come to the same conclusion, but the role of precedent is important. It's likely that we'll see these two rulings referenced in other AI copyright and piracy cases. We'll have to wait and see how big an effect these rulings have in future cases -- and whether it's the warnings or the green lights that carry the most weight in future decisions. For more, check out our guide to copyright and AI.

Anthropic's AI Training on Books Is Fair Use, Judge Rules. Authors Are More Worried Than Ever

CNET, 33 minutes ago

Claude maker Anthropic's use of copyright-protected books in its AI training process was "exceedingly transformative" and fair use, US senior district judge William Alsup ruled on Monday. It's the first time a judge has decided in favor of an AI company on the issue of fair use, a significant win for generative AI companies and a blow for creators. Two days later, Meta won part of its fair use case.

Fair use is a doctrine that's part of US copyright law. It's a four-part test that, when the criteria are met, lets people and companies use protected content without the rights holder's permission for specific purposes, like when writing a term paper. Tech companies say that fair use exceptions are essential for them to access the massive quantities of human-generated content they need to develop the most advanced AI systems. Writers, actors and many other kinds of creators have been equally clear in arguing that the use of their work to propel AI is not fair use.

On Friday, a group of famous authors signed an open letter to publishers urging the companies to pledge never to replace human writers, editors and audiobook narrators with AI and to avoid using AI throughout the publishing process. The signees include Victoria Aveyard, Emily Henry, R.F. Kuang, Ali Hazelwood, Jasmine Guillory, Colleen Hoover and others. "[Our] stories were stolen from us and used to train machines that, if short-sighted capitalistic greed wins, could soon be generating the books that fill our bookstores," the letter reads. "Rather than paying writers a small percentage of the money our work makes for them, someone else will be paid for a technology built on our unpaid labor." The letter is just the latest in a series of battles between authors and AI companies.
Publishers, artists and content catalog owners have filed lawsuits alleging that AI companies like OpenAI, Meta and Midjourney are infringing on their protected intellectual property in an attempt to circumvent costly, but standard, licensing procedures. (Disclosure: Ziff Davis, CNET's parent company, in April filed a lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)

The authors suing Anthropic for copyright infringement say their books were also obtained illegally -- that is, the books were pirated. That leads to the second part of Alsup's ruling, based on his concerns about Anthropic's methods of obtaining the books. In the ruling, he writes that Anthropic co-founder Ben Mann knowingly downloaded unauthorized copies of 5 million books from LibGen and an additional 2 million from Pirate Library Mirror (PirLiMi). The ruling also outlines how Anthropic deliberately obtained print copies of the books it had previously pirated in order to create "its own catalog of bibliographic metadata." Anthropic vice president Tom Turvey, the ruling says, was "tasked with obtaining 'all the books in the world' while still avoiding as much 'legal/practice/business slog.'" That meant buying physical books from publishers to create a digital database. To prep millions of used books for machine-readable scanning, the Anthropic team stripped them of their bindings, cut the pages down to fit, and then destroyed and discarded the copies.

Anthropic's acquisition and digitization of the print books was fair use, the ruling says. But it adds: "Creating a permanent, general-purpose library was not itself a fair use excusing Anthropic's piracy." Alsup ordered a new trial regarding the pirated library. Anthropic is one of many AI companies facing copyright claims in court, so this week's ruling is likely to have massive ripple effects across the industry.
We'll have to see how the piracy claims resolve before we know how much money Anthropic may be ordered to pay in damages. But if the scales tip to grant multiple AI companies fair use exceptions, the creative industry and the people who work in it will certainly suffer damages, too. For more, check out our guide to understanding copyright in the age of AI.

I've tried all the leading AI chatbots — here's why I keep going back to Claude

Tom's Guide, 3 hours ago

I've spent considerable time working with various AI models over the past year, from ChatGPT to Gemini to smaller specialized tools. While each has its strengths, Claude has consistently become my go-to choice for most tasks. What sets Claude apart isn't just raw capability; it's how the AI approaches problems and conversations. Unlike other models that just blurt out the first thing that comes to mind, Claude actually pauses to think. Most importantly, it remembers what we talked about five minutes ago, which makes conversations feel normal instead of like I'm starting over every time. After using Claude for everything from creative projects to household troubleshooting, I've discovered it is, by far, my favorite collaborative assistant. Here are five specific reasons Claude stands out from the rest.

What immediately struck me about Claude is how it approaches complex problems. Instead of rushing to give me the first answer that comes to mind, it works through issues methodically. When I ask about controversial topics or present scenarios with multiple valid viewpoints, Claude considers different angles before responding. This nuanced reasoning shows up everywhere, from helping me think through business decisions to analyzing literature. It doesn't just regurgitate information, either; it genuinely processes what I'm asking and thinks about the implications. That step-by-step approach has saved me from several poor decisions where a quick, surface-level answer might have led me astray.

This might sound niche, but Claude's ability to build interactive content directly in our conversation is genuinely impressive. It can create fully functional quizzes, word games, trivia challenges, and even simple interactive stories that run right in the chat. I've had it build everything from personality quizzes to coding challenges that actually execute and give feedback. The games aren't just static text; they respond to my answers, keep score, and adapt based on how I'm doing.
It's like having a game developer who can instantly prototype ideas and let me test them immediately. What makes this particularly useful is how it combines this interactivity with its coding abilities. Need a quiz for training materials? Claude can build it with proper logic and formatting. Want to gamify learning something new? It creates engaging interactive content that actually helps information stick.

This might sound silly, but Claude genuinely engages with ridiculous hypothetical scenarios and absurd questions without making me feel like an idiot. We've gone back and forth on everything from "what would happen if gravity worked sideways" to elaborate theories about why cats are probably plotting world domination. Other AI models tend to either shut down weird conversations or give painfully serious responses to obviously playful questions. Claude rolls with it. It'll debate whether hot dogs are sandwiches with the same thoughtful approach it brings to serious topics, but with the right amount of levity. The humor isn't forced or awkward, either; it feels natural, like talking to someone who actually gets the joke and wants to play along rather than just analyzing why something might be considered funny.

One of Claude's most practical advantages is how it deals with large amounts of text. I regularly throw entire research papers, lengthy contracts, or massive datasets at it, and it doesn't choke or give me useless summaries that miss the point. Other AI models tend to either refuse large documents entirely or give you generic overviews that could apply to anything. Claude actually reads through everything and can answer specific questions about details buried on page 47 of a 200-page document. Even with documents that push the limits of what it can handle, Claude gives honest feedback about its limitations rather than pretending to analyze something it can't properly process.
This is hard to quantify, but conversations with Claude feel genuinely natural in a way that other AI models don't quite achieve. It picks up on subtle cues, responds to the tone of what I'm saying, and even seems to understand when I'm frustrated or excited about something. The back-and-forth feels organic: if I make a joke, Claude laughs along appropriately; if I'm working through a complex problem, it knows when to ask clarifying questions versus when to just listen. It even admits when it's confused or uncertain, which feels refreshingly honest. Most importantly, it doesn't sound like it's reading from a script. The responses feel spontaneous and thoughtful, like talking to someone who's actually engaged in the conversation rather than just generating text.

Now that you've learned why I keep going back to Claude, why not take a look at our other helpful AI articles? Check out 5 smart ways to use Gemini Live on your phone right now and ChatGPT has added a new image library -- here's how to use it. And if you want to make AI playlists in Spotify, here's how to do it.
