Meta's Llama 3.1 AI model reproduces copyrighted book text, raising legal alarms
Meta's Llama 3.1 AI model is under scrutiny after a new study found that it can reproduce large portions of copyrighted books, including Harry Potter and the Sorcerer's Stone, with surprising accuracy. The study, conducted by researchers from Stanford, Cornell, and West Virginia University, estimates that Llama 3.1 has memorized around 42% of the first Harry Potter book well enough to reproduce 50-token excerpts at least half the time.
Of the five major AI models the researchers tested on books from the Books3 dataset, Llama 3.1 70B, Meta's 70-billion-parameter model released in July 2024, was the most likely to output copyrighted text. By contrast, Llama 1 65B, released in February 2023, had memorized only 4.4% of the same book, which means verbatim retention rose nearly tenfold between the two model generations.
The model was also found to reproduce exact excerpts from other iconic works such as The Hobbit and 1984. Experts suspect this could be due to repeated exposure to the same texts during training, possibly sourced from fan sites, academic analyses, or online reviews. Adjustments to Meta's training strategy may have unintentionally worsened the memorization problem.
These findings come amid growing legal pressure on AI developers. The New York Times has already filed a lawsuit against OpenAI and Microsoft, accusing them of copyright infringement by training models like ChatGPT on proprietary articles. The lawsuit claims OpenAI's models can not only reproduce content verbatim but also mimic The Times' unique style.
For Meta, these findings may create similar legal exposure as calls for transparency and ethical AI development intensify across the industry.