Google is using YouTube videos to train its AI video generator

CNBC · 12 hours ago

Google is using its expansive library of YouTube videos to train its artificial intelligence models, including Gemini and the Veo 3 video and audio generator, CNBC has learned.
The tech company is turning to its catalog of 20 billion YouTube videos to train these new-age AI tools, according to a person who was not authorized to speak publicly about the matter. Google confirmed to CNBC that it relies on its vault of YouTube videos to train its AI models, but the company said it only uses a subset of its videos for the training and that it honors specific agreements with creators and media companies.
"We've always used YouTube content to make our products better, and this hasn't changed with the advent of AI," said a YouTube spokesperson in a statement. "We also recognize the need for guardrails, which is why we've invested in robust protections that allow creators to protect their image and likeness in the AI era — something we're committed to continuing."
Such use of YouTube videos has the potential to lead to an intellectual property crisis for creators and media companies, experts said.
While YouTube says it has shared this information previously, experts who spoke with CNBC said it's not widely understood by creators and media organizations that Google is training its AI models using its video library.
YouTube didn't say how many of the 20 billion videos on its platform or which ones are used for AI training. But given the platform's scale, training on just 1% of the catalog would amount to 2.3 billion minutes of content, which experts say is more than 40 times the training data used by competing AI models.
The company shared in a blog post published in September that YouTube content could be used to "improve the product experience … including through machine learning and AI applications." Users who have uploaded content to the service have no way of opting out of letting Google train on their videos.
"It's plausible that they're taking data from a lot of creators that have spent a lot of time and energy and their own thought to put into these videos," said Luke Arrigoni, CEO of Loti, a company that works to protect digital identity for creators. "It's helping the Veo 3 model make a synthetic version, a poor facsimile, of these creators. That's not necessarily fair to them."
CNBC spoke with multiple leading creators and IP professionals, none of whom were aware, or had been informed by YouTube, that their content could be used to train Google's AI models.
The revelation that YouTube is training on its users' videos is noteworthy after Google in May announced Veo 3, one of the most advanced AI video generators on the market. In its unveiling, Google showcased cinematic-level video sequences, including a scene of an old man on a boat and another showing Pixar-like animals talking with one another. The scenes, both the visuals and the audio, were entirely AI-generated.
According to YouTube, an average of 20 million videos are uploaded to the platform each day by everyone from independent creators to nearly every major media company. Many creators say they are now concerned they may be unknowingly helping to train a system that could eventually compete with or replace them.
"It doesn't hurt their competitive advantage at all to tell people what kind of videos they train on and how many they trained on," Arrigoni said. "The only thing that it would really impact would be their relationship to creators."
Even if Veo 3's final output does not directly replicate existing work, the generated content fuels commercial tools that could compete with the creators who made the training data possible, all without credit, consent or compensation, experts said.
By uploading a video to the platform, users agree to grant YouTube a broad license to the content.
"By providing Content to the Service, you grant to YouTube a worldwide, non-exclusive, royalty-free, sublicensable and transferable license to use that Content," the terms of service read.
"We've seen a growing number of creators discover fake versions of themselves circulating across platforms — new tools like Veo 3 are only going to accelerate the trend," said Dan Neely, CEO of Vermillio, which helps individuals protect their likeness from being misused and also facilitates secure licensing of authorized content.
Neely's company has challenged AI platforms for generating content that allegedly infringes on its clients' intellectual property, both individual and corporate. Neely says that although YouTube has the right to use this content, many of the content creators who post on the platform are unaware that their videos are being used to train video-generating AI software.
Vermillio uses a proprietary tool called Trace ID to assess whether an AI-generated video has significant overlap with a human-created video. Trace ID assigns scores on a scale of zero to 100. Any score over 10 for a video with audio is considered meaningful, Neely said.
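Vermillio has not published how Trace ID computes these scores, so the following is only a generic sketch of what a zero-to-100 overlap score could look like, assuming precomputed content embeddings compared with cosine similarity; the embeddings, scaling and threshold check here are illustrative assumptions, not Vermillio's method.
```python
# Illustrative only: Vermillio has not disclosed Trace ID's internals.
# This sketch maps cosine similarity between two hypothetical content
# embeddings onto a 0-100 scale and applies the article's >10 threshold.
import numpy as np

def overlap_score(original: np.ndarray, generated: np.ndarray) -> float:
    """Scale cosine similarity of two embeddings to a 0-100 overlap score."""
    cos = np.dot(original, generated) / (np.linalg.norm(original) * np.linalg.norm(generated))
    return float(np.clip(cos, 0.0, 1.0) * 100)

# Stand-in embeddings for a creator's video and an AI-generated clip.
rng = np.random.default_rng(0)
creator_video = rng.normal(size=512)
ai_clip = 0.8 * creator_video + 0.2 * rng.normal(size=512)  # partially overlapping content

score = overlap_score(creator_video, ai_clip)
print(f"overlap score: {score:.0f}/100 ->", "meaningful" if score > 10 else "below threshold")
```
Under these assumptions, any pair scoring above 10 would be flagged for closer review, mirroring the threshold Neely describes for videos with audio.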
In one example cited by Neely, a video from YouTube creator Brodie Moss closely matched content generated by Veo 3. Trace ID assigned the original video a score of 71, with the audio alone scoring over 90.
Some creators told CNBC they welcome the opportunity to use Veo 3, even if it may have been trained on their content.
"I try to treat it as friendly competition more so than these are adversaries," said Sam Beres, a creator with 10 million subscribers on YouTube. "I'm trying to do things positively because it is the inevitable —but it's kind of an exciting inevitable."
Google includes an indemnification clause for its generative AI products, including Veo, which means that if a user faces a copyright challenge over AI-generated content, Google will take on legal responsibility and cover the associated costs.
YouTube announced a partnership with Creative Artists Agency in December to give top talent a way to identify and manage AI-generated content that features their likeness. YouTube also has a tool that lets creators request that a video be taken down if they believe it abuses their likeness.
However, Arrigoni said that the tool hasn't been reliable for his clients.
YouTube also allows creators to opt out of third-party training by select AI companies, including Amazon, Apple and Nvidia, but users are not able to stop Google from training its own models on their videos.
The Walt Disney Company and Universal filed a joint lawsuit last Wednesday against the AI image generator Midjourney, alleging copyright infringement, the first lawsuit of its kind out of Hollywood.
"The people who are losing are the artists and the creators and the teenagers whose lives are upended," said Sen. Josh Hawley, R-Mo., in May at a Senate hearing about the use of AI to replicate the likeness of humans. "We've got to give individuals powerful enforceable rights and their images in their property in their lives back again or this is just never going to stop."

Related Articles

Google (GOOGL) Is Training AI on YouTube Videos, and Creators Didn't Even Know

Business Insider · an hour ago

Tech giant Google (GOOGL) is using part of its huge library of YouTube videos (around 20 billion in total) to train AI models like Gemini and Veo 3, according to CNBC. Although the company says it only uses a portion of the videos and follows agreements with creators and media companies, this still means that billions of minutes of content are used for training. Unsurprisingly, YouTube says that it has always used content to improve its products and now has protections to help creators control how their image is used in the age of AI. However, creators can't stop Google from using their videos for its own AI models, and many weren't aware this was happening.

As a result, some experts and creators are worried. Indeed, tools like Trace ID from a company called Vermillio, which is used to detect overlaps between AI-generated videos and original ones, have found that Veo 3 has created videos very similar to existing YouTube content. One example showed that a Veo 3 video closely matched a video from creator Brodie Moss, with a score of 71 for the video and over 90 for just the audio. While some creators welcome the competition, others feel their work is being used unfairly, without credit, consent, or payment.

This news comes at a time when the entertainment world is pushing back, as Disney (DIS) and Universal (CMCSA) recently filed a lawsuit against AI company Midjourney for copyright issues. Google, meanwhile, says it will take legal responsibility if users face copyright complaints over content created with Veo 3. YouTube has also partnered with the Creative Artists Agency to help top talent manage how their image is used in AI. But some say YouTube's tools aren't reliable. In fact, U.S. lawmakers, like Senator Josh Hawley, argue that stronger rights are needed to protect people's images and creations as AI advances.

Is Google Stock a Good Buy?

Turning to Wall Street, analysts have a Strong Buy consensus rating on GOOGL stock based on 29 Buys and nine Holds assigned in the past three months. Furthermore, the average GOOGL price target of $199.11 per share implies 14.88% upside potential from current levels.

ChatGPT use linked to cognitive decline: MIT research

The Hill · an hour ago

ChatGPT can harm an individual's critical thinking over time, a new study suggests. Researchers at MIT's Media Lab asked subjects to write several SAT essays and separated subjects into three groups — using OpenAI's ChatGPT, using Google's search engine and using nothing, which they called the 'brain-only' group. Each subject's brain was monitored through electroencephalography (EEG), which measured the writer's brain activity across multiple regions of the brain.

They discovered that subjects who used ChatGPT over a few months had the lowest brain engagement and 'consistently underperformed at neural, linguistic, and behavioral levels,' according to the study. The study found that the ChatGPT group initially used the large language model, or LLM, to ask structural questions for their essay, but near the end of the study, they were more likely to copy and paste their essay. Those who used Google's search engine were found to have moderate brain engagement, but the 'brain-only' group showed the 'strongest, wide-ranging networks.'

The findings suggest that using LLMs can harm a user's cognitive function over time, especially in younger users. It comes as educators continue to navigate teaching when AI is increasingly accessible for cheating.

'What really motivated me to put it out now before waiting for a full peer review is that I am afraid in 6-8 months, there will be some policymaker who decides, "let's do GPT kindergarten." I think that would be absolutely bad and detrimental,' the study's main author Nataliya Kosmyna told TIME. 'Developing brains are at the highest risk.'

However, using AI in education doesn't appear to be slowing down. In April, President Trump signed an executive order that aims to incorporate AI into U.S. classrooms. 'The basic idea of this executive order is to ensure that we properly train the workforce of the future by ensuring that school children, young Americans, are adequately trained in AI tools, so that they can be competitive in the economy years from now into the future, as AI becomes a bigger and bigger deal,' Will Scharf, White House staff secretary, said at the time.

Study Shows LLM Conversion Rate Is 9x Better — AEO Is Coming

Forbes · 2 hours ago

Some predict that by 2028, more people will discover products and information through large language models (LLMs) like ChatGPT and Gemini than through traditional search engines. But based on research I conducted with Cornell Master's students, that shift is happening much faster. LLM-driven traffic is already starting to outperform traditional search — not in volume, but in value. Traffic from LLMs converts at nearly 9x higher rates than traditional search. This is the biggest disruption to search since the dawn of the internet. If you're a brand or publisher, now is the time to adapt your SEO playbook. Oh, there is no 'S' — it's now called Answer Engine Optimization (AEO).

Back in January, I predicted that traditional search was on its way out. Just six months later, the shift is already visible. In my UX research, I classify shoppers into three categories. It's easy to see how all these needs can now be met through a conversation with LLMs like ChatGPT, Claude, Gemini, or Perplexity. Say you're looking for an isotonic drink powder. Instead of scanning blogs, watching videos, or scrolling endlessly, you now ask ChatGPT — and it responds with direct recommendations. Ask about ketogenic-friendly options, and it will go even further — offering details on ingredients, comparisons, and alternatives.

This isn't search — it's advice. And when users follow those links or act on suggestions, they convert at dramatically higher rates compared to normal search traffic. In my studies, LLM-generated traffic behaves more like a personal recommendation than a keyword query. But here's the catch: if your brand isn't listed, you're invisible. The customer won't even consider you.

Good numbers are hard to come by. LLM traffic, like what comes from ChatGPT, doesn't always leave a clean trail — users might just copy and paste a product name and head to Amazon or another site. To get better data, we created a ChatGPT-style experience inside the site search of several e-commerce stores. In A/B tests, we compared regular keyword search with an AI-guided, conversational search experience. The difference was stunning: almost 9x higher conversion. Yes, nine times.

But it's not just conversion that's changing — the way people search is evolving, too. In the past, users typed one or two words like 'camera.' Now, when they're shown more natural and detailed responses, they respond in kind. We're seeing queries like: 'What's a compact camera for wildlife photography that fits in a carry-on?' Semrush backs this up with broader data.

In our interviews, shoppers said they felt more 'understood' and 'better about their purchase.' It didn't feel like a search engine. It felt like getting advice from a knowledgeable friend. If you scale that behavior to external LLM traffic — not just on-site — the value of that traffic already rivals what you get from SEO. For brands, this means it's time to rethink how you show up in these conversations. That's what AEO — Answer Engine Optimization — is all about. Brands need to act. If you're not being cited by LLMs, you're becoming increasingly invisible. To get picked up by an LLM, you need to understand how these models learn from content.

Masking in ML Training

LLMs are pattern-completion engines. I often use the example of 'Life is like a box of ___' in my online certificate from Cornell. The answer, of course, is chocolate. Machines learn the right answer through trial and error. This approach is called masking. To show up in an LLM's response, your content needs to become part of its masked training data (a short illustrative sketch follows at the end of this piece). LLMs look for authoritative, helpful, and authentic content. Since they predict the next word in a conversation with a user, they favor content written in a conversational or Q&A format.

For brands, a new playbook is emerging: AEO. I outlined what brands need to know. AEO is just the beginning. Two even bigger shifts are on the horizon — and both will deeply impact how brands show up in the age of AI: paid ads in LLMs, and Model Context Protocol and agents that act on behalf of the LLM. The future is already underway. Ping me on LinkedIn if you want to continue the conversation.
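
To make the masking idea described above concrete, here is a minimal sketch using the open-source Hugging Face transformers library and a small public masked language model (bert-base-uncased, chosen only for illustration); it shows a model filling in a blanked-out token, not how any commercial LLM or AEO pipeline is actually trained.
```python
# Illustrative sketch of masked-token prediction, not a production training pipeline.
# Assumes `pip install transformers torch`; the model choice is an arbitrary small
# masked language model used purely for demonstration.
from transformers import pipeline

# The fill-mask pipeline hides one token ([MASK]) and asks the model to predict it,
# which is the "masking" idea the column describes.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("Life is like a box of [MASK]."):
    # Each prediction carries a candidate token and the model's confidence score.
    print(f"{prediction['token_str']:>12}  score={prediction['score']:.3f}")
```
Running this prints the model's top-ranked completions with confidence scores, the pattern-completion behavior the column uses to argue that content phrased as natural questions and answers is more likely to surface in LLM responses.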
