Nvidia Bets Big on Synthetic Data


WIRED, 19-03-2025

Mar 19, 2025 11:27 AM

Nvidia has acquired synthetic data startup Gretel to bolster the AI training data used by the chip maker's customers and developers.

Nvidia CEO Jensen Huang addresses participants at the keynote of CES 2025 in Las Vegas, Nevada.

Nvidia has acquired synthetic data firm Gretel for nine figures, according to two people with direct knowledge of the deal.
The acquisition price exceeds Gretel's most recent valuation of $320 million, the sources say, though the exact terms of the purchase remain unknown. Gretel and its team of approximately 80 employees will be folded into Nvidia, where its technology will be deployed as part of the chip giant's growing suite of cloud-based, generative AI services for developers.
The acquisition comes as Nvidia has been rolling out synthetic data generation tools, so that developers can train their own AI models and fine-tune them for specific apps. In theory, synthetic data could create a near-infinite supply of AI training data and help solve the data scarcity problem that has been looming over the AI industry since ChatGPT went mainstream in 2022—although experts say using synthetic data in generative AI comes with its own risks.
A spokesperson for Nvidia declined to comment.
Gretel was founded in 2019 by Alex Watson, John Myers, and Ali Golshan, who also serves as CEO. The startup offers a synthetic data platform and a suite of APIs to developers who want to build generative AI models, but don't have access to enough training data or have privacy concerns around using real people's data. Gretel doesn't build and license its own frontier AI models, but fine-tunes existing open source models to add differential privacy and safety features, then packages those together to sell them. The company raised more than $67 million in venture capital funding prior to the acquisition, according to Pitchbook.
Gretel also did not immediately respond to a request for comment from WIRED.
Unlike human-generated or real-world data, synthetic data is computer-generated and designed to mimic real-world data. Proponents say this makes the data generation required to build AI models more scalable, less labor-intensive, and more accessible to smaller or less-resourced AI developers. Privacy protection is another key selling point of synthetic data, making it an appealing option for health care providers, banks, and government agencies.
Nvidia has already been offering synthetic data tools for developers for years. In 2022 it launched Omniverse Replicator, which gives developers the ability to generate custom, physically accurate, synthetic 3D data to train neural networks. Last June, Nvidia began rolling out a family of open AI models that generate synthetic training data for developers to use in building or fine-tuning LLMs. Called Nemotron-4 340B, these mini-models can be used by developers to drum up synthetic data for their own LLMs across 'health care, finance, manufacturing, retail, and every other industry.'
During his keynote presentation at Nvidia's annual developer conference this Tuesday, Nvidia cofounder and chief executive Jensen Huang spoke about the challenges the industry faces in rapidly scaling AI in a cost-effective way.
'There are three problems that we focus on,' he said. 'One, how do you solve the data problem? How and where do you create the data necessary to train the AI? Two, what's the model architecture? And then three, what are the scaling laws?' Huang went on to describe how the company is now using synthetic data generation in its robotics platforms.
Synthetic data can be used in at least a couple of different ways, says Ana-Maria Cretu, a postdoctoral researcher at the École Polytechnique Fédérale de Lausanne in Switzerland, who studies synthetic data privacy. It can take the form of tabular data, like demographic or medical data, which can solve a data scarcity issue or create a more diverse dataset.
Cretu gives an example: If a hospital wants to build an AI model to track a certain type of cancer, but is working with a small data set from 1,000 patients, synthetic data can be used to fill out the data set, eliminate biases, and anonymize data from real humans. 'This also offers some privacy protection, whenever you cannot disclose the real data to a stakeholder or software partner,' Cretu says.
But in the world of large language models, Cretu adds, synthetic data has also become something of a catchall phrase for 'How can we just increase the amount of data we have for LLMs over time?'
Experts worry that, in the not-so-distant future, AI companies won't be able to gorge as freely on human-created internet data in order to train their AI models. Last year, a report from MIT's Data Provenance Initiative showed that restrictions around open web content were increasing.
Synthetic data in theory could provide an easy solution. But a July 2024 article in Nature highlighted how AI language models could 'collapse,' or degrade significantly in quality, when they're fine-tuned over and over again with data generated by other models. Put another way, if you feed the machine nothing but its own machine-generated output, it theoretically begins to eat itself, spewing out detritus as a result.
Alexandr Wang, the chief executive of Scale AI—which leans heavily on a human workforce for labeling data used to train models—shared the findings from the Nature article on X, writing, 'While many researchers today view synthetic data as an AI philosopher's stone, there is no free lunch.' Wang said later in the thread that this is why he believes firmly in a hybrid data approach.
One of Gretel's cofounders pushed back on the Nature paper, noting in a blog post that the 'extreme scenario' of repetitive training on purely synthetic data 'is not representative of real-world AI development practices.'
Gary Marcus, a cognitive scientist and researcher who loudly criticizes AI hype, said at the time that he agrees with Wang's 'diagnosis but not his prescription.' The industry will move forward, he believes, by developing new architectures for AI models, rather than focusing on the idiosyncrasies of data sets. In an email to WIRED, Marcus observed that 'systems like [OpenAI's] o1/o3 seem to be better at domains like coding and math where you can generate, and validate, tons of synthetic data. On general purpose reasoning in open-ended domains, they have been less effective.'
Cretu believes the scientific theory around model collapse is sound. But she notes that most researchers and computer scientists are training on a mix of synthetic and real-world data. 'You might possibly be able to get around model collapse by having fresh data with every new round of training,' she says.
Concerns about model collapse haven't stopped the AI industry from hopping aboard the synthetic data train, even if they're doing so with caution. At a recent Morgan Stanley tech conference, Sam Altman reportedly touted OpenAI's ability to use its existing AI models to create more data. Anthropic CEO Dario Amodei has said he believes it may be possible to build 'an infinite data-generation engine,' one that would maintain its quality by injecting a small amount of new information during the training process (as Cretu has suggested).
Big Tech has also been turning to synthetic data. Meta has talked about how it trained Llama 3, its state-of-the-art large language model, using synthetic data, some of which was generated from Meta's previous model, Llama 2. Amazon's Bedrock platform lets developers use Anthropic's Claude to generate synthetic data. Microsoft's Phi-3 small language model was trained partly on synthetic data, though the company has warned that 'synthetic data generated by pre-trained large-language models can sometimes reduce accuracy and increase bias on down-stream tasks.' Google's DeepMind has been using synthetic data, too, but again, has highlighted the complexities of developing a pipeline for generating—and maintaining—truly private synthetic data.
'We know that all of the big tech companies are working on some aspect of synthetic data,' says Alex Bestall, the founder of Rightsify, a music licensing startup that also generates AI music and licenses its catalog for AI models. 'But human data is often a contractual requirement in our deals. They might want a dataset that is 60 percent human-generated, and 40 percent synthetic.'


