
What Happens When LLMs Run Out Of Useful Data?
By SAP Insights Team
Most of us feel like we're drowning in data. And yet, in the world of generative AI, a looming data shortage is keeping some researchers up at night.
A 2024 report from the nonprofit watchdog Epoch AI projected that large language models (LLMs) could run out of fresh, human-generated training data as soon as 2026. Earlier this year, the ubiquitous Elon Musk declared that 'the cumulative sum of human knowledge has been exhausted in AI training,' and that the doomsday scenario envisioned by some AI researchers 'happened basically last year.'
GenAI is a technology whose breakthroughs in power and sophistication have generally relied on ever-larger training datasets. Beneath the flurry of investment, adoption, and general GenAI activity, a quiet concern has surfaced: What if the fuel driving all this progress is running low?
There is one obvious solution to a looming shortage of written content: have LLMs generate more of it.
The role of synthetic data in large language models
Synthetic data is computer-generated information that has the same statistical properties and patterns as real data but doesn't include real-world records. Amazon recently had success using this method with LLM-generated pairs of questions and answers to fine-tune a customer service model. Because the task was narrow and the outputs were easily reviewed by human beings, the additional training on synthetic data helped the model get better at responding accurately to customer inquiries, even in scenarios it hadn't seen before.
Another use case for synthetic data involves businesses that use proprietary data to build bespoke LLMs, whether by training them from scratch or, more commonly, by layering retrieval-augmented generation (RAG) atop a commercial foundation model. In many such cases, the proprietary data is tightly structured, such as historical transaction records formatted like spreadsheets with dates, locations, and dollar amounts. In contexts like these, LLM-generated synthetic data is often indistinguishable from the real thing and just as effective for training.
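The following Python sketch makes "same statistical properties, no real records" concrete for tabular data. It does not use an LLM. As a deliberately simple stand-in, it learns per-column statistics from a few hypothetical transaction records: a date range, location frequencies, and a log-normal fit to dollar amounts. It then samples new rows from those statistics and drops any row that exactly copies a real record. The records, the field layout, and the assumption that columns are independent are all illustrative.

```python
import math
import random
import statistics
from collections import Counter
from datetime import date, timedelta

# Hypothetical "real" transaction records: (date, location, amount in USD).
REAL_RECORDS = [
    (date(2024, 1, 3), "Berlin", 120.50),
    (date(2024, 1, 7), "Paris", 89.99),
    (date(2024, 1, 9), "Berlin", 240.00),
    (date(2024, 1, 15), "Madrid", 56.25),
    (date(2024, 1, 21), "Paris", 310.10),
    (date(2024, 1, 28), "Berlin", 75.40),
]

def fit(records):
    """Learn simple per-column statistics (columns treated as independent)."""
    dates = [r[0] for r in records]
    locations = Counter(r[1] for r in records)
    log_amounts = [math.log(r[2]) for r in records]
    return {
        "start": min(dates),
        "span_days": (max(dates) - min(dates)).days,
        "locations": list(locations.keys()),
        "weights": list(locations.values()),
        "mu": statistics.fmean(log_amounts),
        "sigma": statistics.stdev(log_amounts),
    }

def sample(model, n, rng, exclude=frozenset()):
    """Draw n synthetic rows, never emitting an exact copy of an excluded (real) row."""
    rows = []
    while len(rows) < n:
        day = model["start"] + timedelta(days=rng.randint(0, model["span_days"]))
        loc = rng.choices(model["locations"], weights=model["weights"])[0]
        amount = round(math.exp(rng.gauss(model["mu"], model["sigma"])), 2)
        row = (day, loc, amount)
        if row not in exclude:
            rows.append(row)
    return rows

if __name__ == "__main__":
    rng = random.Random(42)
    model = fit(REAL_RECORDS)
    synthetic = sample(model, 1000, rng, exclude=set(REAL_RECORDS))
    print(synthetic[:3])
    print(Counter(loc for _, loc, _ in synthetic))
```

Real generators, LLM-based or otherwise, also capture correlations between columns, such as amounts that vary by location. This sketch ignores those correlations for brevity.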
But in less narrowly defined training scenarios, specifically the development of the big commercial foundation models that RAG relies on, the risks of training on synthetic data are real.
The most widely cited danger has the dramatic name 'model collapse.' In a 2024 study published in Nature, researchers showed that when models are repeatedly trained on synthetic data generated by other models, they gradually lose diversity and accuracy, drifting further from the true distribution of real-world data until they can no longer produce reliably useful output.
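A simple one-dimensional toy version shows the mechanism. Fit a Gaussian to some data, sample a new "synthetic" dataset from that fit, refit on the synthetic sample alone, and repeat. The Python sketch below is illustrative only. The sample size, the number of generations, and the use of the maximum-likelihood standard deviation are assumptions chosen to make the effect visible. They do not reproduce the Nature study's experimental setup.

```python
import random
import statistics

def simulate_collapse(generations=300, samples_per_gen=20, seed=0):
    """Repeatedly refit a Gaussian to samples drawn from the previous fit.

    Generation 0 is the "real" distribution N(0, 1); every later generation
    is trained only on synthetic data produced by its predecessor.
    Returns the fitted standard deviation for each generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    history = [sigma]
    for _ in range(generations):
        data = [rng.gauss(mu, sigma) for _ in range(samples_per_gen)]
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data, mu)  # maximum-likelihood estimate of spread
        history.append(sigma)
    return history

if __name__ == "__main__":
    stds = simulate_collapse()
    for gen in (0, 10, 50, 100, 300):
        print(f"generation {gen:3d}: fitted std = {stds[gen]:.6f}")
```

Each refit loses a little of the spread, and lost spread is never recovered. Over many generations the fitted distribution narrows toward a single point: a minimal picture of the loss of diversity the study describes.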
Mohan Shekar, SAP's AI and quantum adoption lead for cloud-based ERP, likens the process to 'model incest.' With every successive iteration, a model trained on its own output will tend to reinforce biases and flaws that may at first have been barely noticeable, until those minor defects become debilitating deformities.
Long before reaching these extreme states, models trained with synthetic data have also been shown to exhibit a dullness and predictability reflecting their lack of fresh input. Such models may still have their uses, especially for mundane work and applications, but as Shekar puts it, 'If you're trying to innovate—really innovate—[a synthetic-data–trained model] won't get you there. It's just remixing what you already had.'
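Shekar's "just remixing what you already had" has a simple mechanical reading. A model trained only on its own samples cannot reintroduce an item it failed to sample, so rare items drop out and stay out. In the hypothetical Python sketch below, a model's "vocabulary" is 500 item IDs with Zipf-like frequencies, an assumption made for illustration. Each generation retrains on the previous generation's output.

```python
import random
from collections import Counter

def resample_generations(vocab_size=500, samples=2000, generations=30, seed=1):
    """Track how many distinct items survive repeated self-training.

    Each generation's "model" is just the frequency table of the previous
    generation's samples, so any item that was not sampled gets probability zero.
    """
    rng = random.Random(seed)
    items = list(range(vocab_size))
    weights = [1.0 / (rank + 1) for rank in items]  # Zipf-like: long tail of rare items
    distinct_counts = []
    for _ in range(generations):
        data = rng.choices(items, weights=weights, k=samples)
        counts = Counter(data)
        distinct_counts.append(len(counts))
        # The next generation learns only from what this one produced.
        items = list(counts.keys())
        weights = list(counts.values())
    return distinct_counts

if __name__ == "__main__":
    history = resample_generations()
    print("distinct items per generation:", history[:5], "...", history[-1])
```

The number of distinct items can only stay the same or fall from one generation to the next. That is the "dullness and predictability" in miniature.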
Some in the field, including OpenAI CEO Sam Altman, have long argued that innovation in how models are trained may soon matter more than what they're trained on. The next wave of breakthroughs, the thinking goes, may come from rethinking the architecture and logic of training itself and then applying those new ideas.
Yaad Oren, head of research and innovation at SAP, is confident that such a shift is underway. Recent advances in training methods already mean 'you can shrink the amount of data needed to build a robust product,' he says.
One of those recent advances is multimodal training: building models that learn not just from text but also from video, audio, and other inputs. These models can effectively multiply one dataset by another, combining different types of information to create new datasets.
Oren gives the example of voice recognition in cars during a rainstorm. For car manufacturers trying to train an LLM to understand and follow spoken natural-language instructions from a driver, rain in the background presents a hurdle. One unwieldy solution, says Oren, would be to 'record millions of hours of people talking in the rain' to familiarize the model with the sound waves produced by a person asking for directions in a torrential downpour.
More elegant and practical, though, is to combine an existing dataset of human speech with existing datasets of 'different rain and weather sounds,' he says. The result is a model that can decipher speech across a full range of meteorological backdrops—without ever having encountered the combination firsthand.
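The speech-plus-rain approach resembles a standard audio data-augmentation step. Each clean speech clip is mixed with each noise clip at several signal-to-noise ratios (SNRs), so N speech clips and M rain clips yield N × M × (number of SNR levels) training examples. The Python sketch below uses toy signals stored as plain lists of samples. The function names, the SNR levels, and the sine-tone and uniform-noise stand-ins are hypothetical illustrations, not SAP's pipeline.

```python
import itertools
import math
import random

def power(signal):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaling the noise so the mix has the given SNR in dB."""
    noise = noise[: len(speech)]  # assumes the noise clip is at least as long as the speech
    scale = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]

def augment(speech_clips, noise_clips, snr_levels_db):
    """Cross every speech clip with every noise clip at every SNR level."""
    return [
        (i, j, snr, mix_at_snr(speech, noise, snr))
        for (i, speech), (j, noise), snr in itertools.product(
            enumerate(speech_clips), enumerate(noise_clips), snr_levels_db
        )
    ]

if __name__ == "__main__":
    rng = random.Random(7)
    # Toy stand-ins: sine tones for "speech", uniform noise for "rain" (16 kHz, 0.1 s).
    speech = [[math.sin(2 * math.pi * f * t / 16000) for t in range(1600)] for f in (220, 440)]
    rain = [[rng.uniform(-1, 1) for _ in range(1600)] for _ in range(3)]
    dataset = augment(speech, rain, snr_levels_db=[0, 5, 10])
    print(len(dataset), "training examples")  # 18 = 2 speech x 3 rain x 3 SNR levels
```

Lower SNR values mean louder rain relative to the speaker. Sweeping several levels exposes the model to everything from drizzle to downpour without recording any of those combinations.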
Even more promising is the potential impact of quantum computing on model training. 'What quantum brings in,' says Shekar, 'is a way to look at all the possible options that exist within your datasets and derive patterns, connections, and possibilities that were not visible before.'
Quantum computing could even increase the total supply of usable data by accessing the vast, underutilized oceans of so-called unstructured data, says Shekar. 'Instead of needing 50 labeled images to train a model,' he says, 'you might be able to throw in 5,000 unlabeled ones and still get a more accurate result.'
That could be a very big deal indeed. AI engineers have long had the same feelings about unstructured data that physicists have about dark matter: an exquisite blend of awe, annoyance, and yearning. If quantum computing finally unlocks it, especially in tandem with multimodal learning and other innovations, today's fears of a data drought might recede.
A version of this story appears on SAP.com