Nvidia Bets Big on Synthetic Data

WIRED, Mar 19, 2025, 11:27 AM

Nvidia has acquired synthetic data startup Gretel to bolster the AI training data used by the chip maker's customers and developers.
Photo: Nvidia CEO Jensen Huang addresses participants at the keynote of CES 2025 in Las Vegas, Nevada.
Nvidia has acquired synthetic data firm Gretel for nine figures, according to two people with direct knowledge of the deal.
The acquisition price exceeds Gretel's most recent valuation of $320 million, the sources say, though the exact terms of the purchase remain unknown. Gretel and its team of approximately 80 employees will be folded into Nvidia, where its technology will be deployed as part of the chip giant's growing suite of cloud-based, generative AI services for developers.
The acquisition comes as Nvidia has been rolling out synthetic data generation tools, so that developers can train their own AI models and fine-tune them for specific apps. In theory, synthetic data could create a near-infinite supply of AI training data and help solve the data scarcity problem that has been looming over the AI industry since ChatGPT went mainstream in 2022—although experts say using synthetic data in generative AI comes with its own risks.
A spokesperson for Nvidia declined to comment.
Gretel was founded in 2019 by Alex Watson, John Myers, and Ali Golshan, who also serves as CEO. The startup offers a synthetic data platform and a suite of APIs to developers who want to build generative AI models but don't have access to enough training data, or who have privacy concerns about using real people's data. Gretel doesn't build and license its own frontier AI models; instead, it fine-tunes existing open source models to add differential privacy and safety features, then packages those together to sell them. The company raised more than $67 million in venture capital funding prior to the acquisition, according to PitchBook.
Gretel also did not immediately respond to a request for comment from WIRED.
Unlike human-generated or real-world data, synthetic data is computer-generated and designed to mimic real-world data. Proponents say this makes the data generation required to build AI models more scalable, less labor intensive, and more accessible to smaller or less-resourced AI developers. Privacy protection is another key selling point of synthetic data, making it an appealing option for health care providers, banks, and government agencies.
Nvidia has already been offering synthetic data tools for developers for years. In 2022 it launched Omniverse Replicator, which gives developers the ability to generate custom, physically accurate, synthetic 3D data to train neural networks. Last June, Nvidia began rolling out a family of open AI models that generate synthetic training data for developers to use in building or fine-tuning LLMs. Called Nemotron-4 340B, these models can be used by developers to drum up synthetic data for their own LLMs across 'health care, finance, manufacturing, retail, and every other industry.'
During his keynote presentation at Nvidia's annual developer conference this Tuesday, Nvidia cofounder and chief executive Jensen Huang spoke about the challenges the industry faces in rapidly scaling AI in a cost-effective way.
'There are three problems that we focus on,' he said. 'One, how do you solve the data problem? How and where do you create the data necessary to train the AI? Two, what's the model architecture? And then three, what are the scaling laws?' Huang went on to describe how the company is now using synthetic data generation in its robotics platforms.
Synthetic data can be used in at least a couple of different ways, says Ana-Maria Cretu, a postdoctoral researcher at the École Polytechnique Fédérale de Lausanne in Switzerland, who studies synthetic data privacy. It can take the form of tabular data, like demographic or medical data, which can solve a data scarcity issue or create a more diverse dataset.
Cretu gives an example: If a hospital wants to build an AI model to track a certain type of cancer, but is working with a small dataset from 1,000 patients, synthetic data can be used to fill out the dataset, eliminate biases, and anonymize data from real humans. 'This also offers some privacy protection, whenever you cannot disclose the real data to a stakeholder or software partner,' Cretu says.
But in the world of large language models, Cretu adds, synthetic data has also become something of a catchall phrase for 'How can we just increase the amount of data we have for LLMs over time?'
Experts worry that, in the not-so-distant future, AI companies won't be able to gorge as freely on human-created internet data in order to train their AI models. Last year, a report from MIT's Data Provenance Initiative showed that restrictions around open web content were increasing.
Synthetic data in theory could provide an easy solution. But a July 2024 article in Nature highlighted how AI language models could 'collapse,' or degrade significantly in quality, when they're fine-tuned over and over again with data generated by other models. Put another way, if you feed the machine nothing but its own machine-generated output, it theoretically begins to eat itself, spewing out detritus as a result.
Alexandr Wang, the chief executive of Scale AI—which leans heavily on a human workforce for labeling data used to train models—shared the findings from the Nature article on X, writing, 'While many researchers today view synthetic data as an AI philosopher's stone, there is no free lunch.' Wang said later in the thread that this is why he believes firmly in a hybrid data approach.
One of Gretel's cofounders pushed back on the Nature paper, noting in a blog post that the 'extreme scenario' of repetitive training on purely synthetic data 'is not representative of real-world AI development practices.'
Gary Marcus, a cognitive scientist and researcher who loudly criticizes AI hype, said at the time that he agreed with Wang's 'diagnosis but not his prescription.' The industry will move forward, he believes, by developing new architectures for AI models, rather than focusing on the idiosyncrasies of data sets. In an email to WIRED, Marcus observed that 'systems like [OpenAI's] o1/o3 seem to be better at domains like coding and math where you can generate—and validate—tons of synthetic data. On general purpose reasoning in open-ended domains, they have been less effective.'
Cretu believes the scientific theory around model collapse is sound. But she notes that most researchers and computer scientists are training on a mix of synthetic and real-world data. 'You might possibly be able to get around model collapse by having fresh data with every new round of training,' she says.
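The degradation described in the Nature paper, and the fresh-data remedy Cretu mentions, can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's experiments and not anything from Gretel or Nvidia: a one-dimensional Gaussian stands in for a "model," the "real" data are draws from N(0, 1), and each generation refits a mean and standard deviation to a training set sampled from the previous generation's fit, optionally mixed with fresh real draws. The function `simulate` and its parameters are hypothetical names chosen for illustration.

```python
import random
import statistics


def simulate(generations: int, n: int, fresh_fraction: float, seed: int = 0) -> float:
    """Return the fitted standard deviation after repeated refitting.

    Hypothetical toy model (an assumption, not the Nature paper's setup):
    "real" data are N(0, 1) draws; each generation fits a mean and standard
    deviation to its training set, and the next training set is sampled from
    that fit, mixed with a fraction of fresh real draws.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    mu, sigma = 0.0, 1.0  # generation 0 idealized as a perfect fit to real data
    n_fresh = round(n * fresh_fraction)
    for _ in range(generations):
        real = [rng.gauss(0.0, 1.0) for _ in range(n_fresh)]
        synthetic = [rng.gauss(mu, sigma) for _ in range(n - n_fresh)]
        data = real + synthetic
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)  # maximum-likelihood estimate (divides by n)
    return sigma


if __name__ == "__main__":
    pure = simulate(generations=2000, n=20, fresh_fraction=0.0)
    mixed = simulate(generations=2000, n=20, fresh_fraction=0.5)
    print(f"purely synthetic: sigma = {pure:.3g}")
    print(f"half fresh data:  sigma = {mixed:.3g}")
```

With purely synthetic training sets, each refit tends to shrink the spread slightly, and over many generations the "model" collapses toward reproducing a single value. When half of every training set is fresh real data, the spread stays on the order of the real data's spread. A toy like this says nothing about how quickly a real LLM would degrade.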
Concerns about model collapse haven't stopped the AI industry from hopping aboard the synthetic data train, even if companies are doing so with caution. At a recent Morgan Stanley tech conference, OpenAI CEO Sam Altman reportedly touted the company's ability to use its existing AI models to create more data. Anthropic CEO Dario Amodei has said he believes it may be possible to build 'an infinite data-generation engine,' one that would maintain its quality by injecting a small amount of new information during the training process (as Cretu has suggested).
Big Tech has also been turning to synthetic data. Meta has talked about how it trained Llama 3, its state-of-the-art large language model, using synthetic data, some of which was generated from Meta's previous model, Llama 2. Amazon's Bedrock platform lets developers use Anthropic's Claude to generate synthetic data. Microsoft's Phi-3 small language model was trained partly on synthetic data, though the company has warned that 'synthetic data generated by pre-trained large-language models can sometimes reduce accuracy and increase bias on down-stream tasks.' Google DeepMind has been using synthetic data, too, but it has likewise highlighted the complexities of developing a pipeline for generating and maintaining truly private synthetic data.
'We know that all of the big tech companies are working on some aspect of synthetic data,' says Alex Bestall, the founder of Rightsify, a music licensing startup that also generates AI music and licenses its catalog for AI models. 'But human data is often a contractual requirement in our deals. They might want a dataset that is 60 percent human-generated, and 40 percent synthetic.'


