DeepSeek claims its 'reasoning' model beats OpenAI's o1 on certain benchmarks

Yahoo, 28-01-2025

Chinese AI lab DeepSeek has released an open version of DeepSeek-R1, its so-called reasoning model, that it claims performs as well as OpenAI's o1 on certain AI benchmarks.
R1 is available from the AI dev platform Hugging Face under an MIT license, meaning it can be used commercially without restrictions. According to DeepSeek, R1 beats o1 on the benchmarks AIME, MATH-500, and SWE-bench Verified. AIME draws its problems from the American Invitational Mathematics Examination, a challenging math competition, while MATH-500 is a collection of word problems. SWE-bench Verified, meanwhile, focuses on programming tasks.
Being a reasoning model, R1 effectively fact-checks itself, which helps it to avoid some of the pitfalls that normally trip up models. Reasoning models take a little longer — usually seconds to minutes longer — to arrive at solutions compared to a typical nonreasoning model. The upside is that they tend to be more reliable in domains such as physics, science, and math.
R1 contains 671 billion parameters, DeepSeek revealed in a technical report. Parameters roughly correspond to a model's problem-solving skills, and models with more parameters generally perform better than those with fewer parameters.
Indeed, 671 billion parameters is massive, but DeepSeek also released "distilled" versions of R1 ranging in size from 1.5 billion parameters to 70 billion parameters. The smallest can run on a laptop. As for the full R1, it requires beefier hardware, but it is available through DeepSeek's API at prices 90%-95% cheaper than OpenAI's o1.
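To give a rough sense of why the smallest distilled model fits on a laptop while the full R1 does not, the sketch below multiplies each parameter count quoted above by a number of bytes per parameter. The parameter counts come from the article; the storage precisions (16-bit and 4-bit) and the function name `weight_memory_gb` are assumptions chosen for illustration, not figures from DeepSeek, and the estimate covers the weights only, not the extra memory needed to run the model.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Parameter counts are taken from the article; byte widths are assumptions.

PARAM_COUNTS = {
    "R1 (full)": 671e9,
    "largest distilled": 70e9,
    "smallest distilled": 1.5e9,
}

# Assumed storage precisions (hypothetical, for illustration only).
BYTES_PER_PARAM = {"16-bit": 2.0, "4-bit": 0.5}


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Return the approximate size of the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for name, count in PARAM_COUNTS.items():
        sizes = ", ".join(
            f"{label}: {weight_memory_gb(count, width):,.1f} GB"
            for label, width in BYTES_PER_PARAM.items()
        )
        print(f"{name}: {sizes}")
```

Under these assumptions the smallest distilled model needs only a few gigabytes for its weights, while the full model needs on the order of a terabyte at 16-bit precision, which is consistent with the article's point about beefier hardware.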
Clem Delangue, the CEO of Hugging Face, said in a post on X on Monday that developers on the platform have created more than 500 "derivative" models of R1 that have racked up 2.5 million downloads combined — five times the number of downloads the official R1 has gotten.
https://twitter.com/ClementDelangue/status/1883946119723708764
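The figures Delangue cites imply a download count for the official R1 that the article does not state outright. The short sketch below works it out from the two numbers given (2.5 million combined downloads, five times the official figure); the function name is hypothetical, and because the article says "more than 500" models, the per-model average is an upper bound rather than an exact value.

```python
def implied_official_downloads(derivative_downloads: int, multiple: int) -> int:
    """Official-release downloads implied by 'derivatives have `multiple` times as many'."""
    return derivative_downloads // multiple


if __name__ == "__main__":
    derivative_total = 2_500_000  # combined downloads of derivative models (from the article)
    multiple = 5                  # "five times" the official R1's downloads (from the article)
    model_count = 500             # lower bound: the article says "more than 500"

    official = implied_official_downloads(derivative_total, multiple)
    print(f"Implied official R1 downloads: about {official:,}")
    print(f"Average per derivative model: at most about {derivative_total // model_count:,}")
```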
There is a downside to R1. Being a Chinese model, it's subject to benchmarking by China's internet regulator to ensure that its responses "embody core socialist values." R1 won't answer questions about Tiananmen Square, for example, or Taiwan's autonomy.
Many Chinese AI systems, including other reasoning models, decline to respond to topics that might raise the ire of regulators in the country, such as speculation about the Xi Jinping regime.
R1 arrives days after the outgoing Biden administration proposed harsher export rules and restrictions on AI technologies for Chinese ventures. Companies in China were already prevented from buying advanced AI chips, but if the new rules go into effect as written, companies will be faced with stricter caps on both the semiconductor tech and models needed to bootstrap sophisticated AI systems.
In a policy document last week, OpenAI urged the U.S. government to support the development of U.S. AI, lest Chinese models match or surpass U.S. models in capability. In an interview with The Information, OpenAI's VP of policy Chris Lehane singled out High-Flyer Capital Management, DeepSeek's corporate parent, as an organization of particular concern.
So far, at least three Chinese labs — DeepSeek, Alibaba, and Kimi, which is owned by Chinese unicorn Moonshot AI — have produced models that they claim rival o1. (Of note, DeepSeek was the first — it announced a preview of R1 in late November.) In a post on X, Dean Ball, an AI researcher at George Mason University, said that the trend suggests Chinese AI labs will continue to be "fast followers."
"The impressive performance of DeepSeek's distilled models [...] means that very capable reasoners will continue to proliferate widely and be runnable on local hardware," Ball wrote, "far from the eyes of any top-down control regime."
This story was originally published on January 20 and was updated on January 27 with more information.
