
Latest news with #RichardKadrey

Big Tech's free use of copyrighted work to train AI devalues creators, crowns a new techno-elite

Times of India

25-04-2025

  • Business
  • Times of India


Of all the lawsuits that continue to snap at Meta's heels today, the most interesting one concerns how the tech giant used millions of pirated books to train its LLAMA algorithms. Plaintiffs in 'Richard Kadrey et al v. Meta' have filed a motion accusing the company of having used 'millions of books and other copyrighted works... for free and without consent from the rightsholders because it does not want to pay for them'. Even more interesting than the charge, however, is how Meta has sought to defend itself, as revealed by sundry confidential internal exchanges that have since been released into the public domain: it has accepted culpability but denied liability for copyright infringement, claiming that the 7 mn books used to train its LLM constituted 'fair use' of already-compromised material.

Fair use of intellectual work is not new. In ancient India, the Vedas were considered shruti (that which is heard) and apauruseya (not of any man, impersonal), because it was more important to ensure the unbroken continuation of an oral, and later written, tradition, whose authorship mattered less than its preservation. It was the same elsewhere, though works such as The Iliad and The Odyssey were sometimes loosely ascribed to a poet named Homer.

This continued until only a few hundred years ago, when authorship and intellectual ownership became things of worth. Not the least of the reasons for this shift was the need to establish authenticity, ownership and factual origin. Another, and perhaps the most significant, was the fact that intellectual output attracted a monetary value, irrevocably dwarfing that associated with manual and martial labour. Since that time, works of intelligence and creativity have only gained currency. Today, when success often hinges on ideas, innovation, entrepreneurship and opinions, the protection of intellectual copyright is more critical than ever before.

Two questions, therefore, arise: Why does Meta want to upend the status quo? And how does Meta believe it can get away with this? The first question is easy enough to answer. Meta has declared that LLAMA is 'highly transformative... and the use of copyrighted materials is vital to the development of the company's open-source AI models'.
And yet the Association of American Publishers claims that 'the systematic copying of textual works, word by word, into an LLM', without, among other things, critical commentary, search functionality or digital interoperability, cannot be considered 'transformative under fair use precedents'. The veracity of this last statement is easily established by examining the way in which Meta seeks to commodify these works. Meta is not interested in making its AI tool a ready-reference library for the sake of access and preservation. Instead, as recently uncovered written communication between researchers has shown, pirated texts, especially works of modern fiction, were considered 'easy to parse' for LLM training. However, according to several lawyers and former employees, things could take a bizarre turn when original works are used to produce new output IOD (instantly on demand), including unlicensed sequels, derivative literature, wholly fallacious background material, and even entirely new work in the style of other published work. All of this would turn creative output into a cheap asset, trivialise individual authorship, and, in the long run, make authentic and original intellectual pursuit superfluous.

This brings us to the second question, to which Meta has responded: by not paying for it. After initially investigating the possibility of entering into licence fee agreements with authors and publishers, it abandoned the effort because of cost, time and resource considerations. Since then, it has not only sought refuge in shifting liability for copyright infringement onto those involved in book piracy but has also invoked the power of mathematics, that reluctant arbiter of truth, to show that an individual work, however large or illustrious, could never enhance an LLM's performance by more than 0.06%, 'a meaningless change no different from noise'.
Thus, Meta sees no reason to pay individuals, since they have little of value to exchange with the company: a superb piece of casuistry that would have cheated Shakespeare out of the royalties accruing from his 37 plays and 154 sonnets on the grounds that they are statistically insignificant.

Ultimately, this may be a case of history defending its right to repeat itself. At some point, the Vedic system of open-source education based on ability gave way to a more stratified hierarchy, in which merit was sacrificed to birthright because Brahmins wished to retain the prestige and wealth associated with acquired knowledge for themselves. As the caste system took hold, jealously guarded ritual was elevated at the expense of more equitable learning. Today, Meta and others (OpenAI has even asked Donald Trump to allow intellectual theft to stop China from stealing a march on America) seek to use copyrighted material without compensation to build the go-to knowledge resources of the future, from which they hope to capture unimaginable wealth and power. Authentic and creative work will thus be made redundant and worthless, to credit and aggrandise a new techno-hierarchy. Let us pray that the courts prevail, and that history fails in its defence.

Meta Says It's Okay to Feed Copyrighted Books Into Its AI Model Because They Have No "Economic Value"

Yahoo

19-04-2025

  • Business
  • Yahoo


Meta has been accused of illegally using copyrighted material to train its AI models, and the tech giant's defense is pretty thin. In the ongoing suit Richard Kadrey et al v. Meta Platforms, led by a group of authors including Pulitzer Prize winner Andrew Sean Greer and National Book Award winner Ta-Nehisi Coates, the Mark Zuckerberg-led company has argued that its alleged scraping of over seven million books from the pirated library LibGen constituted "fair use" of the material, and was therefore not illegal. The specious defenses don't end there. As Vanity Fair spotlights in a new writeup, Meta's attorneys are also arguing that the countless books the company used to train its multibillion-dollar language models, and to springboard itself into the headspinningly buzzy AI race, are actually worthless. Meta cited an expert witness who downplayed the books' individual importance, averring that a single book adjusted its LLM's performance "by less than 0.06 percent on industry standard benchmarks, a meaningless change no different from noise." Thus, Meta says, there's no market for paying authors to use their copyrighted works, because "for there to be a market, there must be something of value to exchange," as quoted by Vanity Fair, "but none of [the authors'] works has economic value, individually, as training data." Other communications showed that Meta employees stripped the copyright pages from the downloaded books.

This is emblematic of the chicanery and two-faced logic that Meta, and the AI industry at large, deploys when pressed about all the human-created content it devours. Somehow, that stuff is simultaneously not that valuable (and we should all stop pearl-clutching about the sanctity of art, and anyway an AI writes creative prose just as well as a human now), yet also absolutely essential to building our new synthetic gods that will solve climate change, so please don't make us pay for using any of it.
That last bit is literally what OpenAI argued to the British Parliament last year: that there isn't enough stuff in the public domain to beef up its AI models, so it must be allowed to plumb the bounties of contemporary copyrighted works without paying a penny. Seemingly, this is an unspoken understanding at the top AI companies. When one Meta researcher asked whether the company's legal team had okayed using LibGen, another responded: "I didn't ask questions but this is what OpenAI does with GPT3, what Google does with PALM, and what Deepmind does with Chinchilla so we will do it to[o]," per Vanity Fair, citing internal messages from the suit. Tellingly, the unofficial policy seems to be not to speak about it at all. "In no case would we disclose publicly that we had trained on LibGen, however there is practical risk external parties could deduce our use of this dataset," an internal Meta slide deck read. The deck noted that "if there is media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, this may undermine our negotiating position with regulators on these issues."
