
SambaNova Cloud launches the fastest DeepSeek-R1 671B
Dubai, United Arab Emirates: DeepSeek-R1 671B, the best open source reasoning model on the market, is now available on SambaNova Cloud, running at 198 tokens per second per prompt. DeepSeek showed the world how to reduce the cost of training reasoning models, but fast inference on GPUs has remained a challenge. SambaNova's new RDU-based hardware architecture now demonstrates better inference performance: these speeds have been independently verified by Artificial Analysis, and you can sign up for SambaNova Cloud today to try the model in our playground.
Developers who want to use this model via the API on the SambaNova Cloud Developer Tier can join our waitlist today. We will roll out access gradually over the coming weeks as we scale out capacity for this model.
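For developers waiting on API access, integration should feel familiar: SambaNova Cloud exposes an OpenAI-compatible chat-completions interface. The sketch below only builds the request payload without sending it; the endpoint URL and model identifier are assumptions and may differ from the values shown in your account dashboard.

```python
import json
import urllib.request

# Assumed values -- check your SambaNova Cloud dashboard for the real ones.
API_URL = "https://api.sambanova.ai/v1/chat/completions"
MODEL = "DeepSeek-R1"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completion request for DeepSeek-R1."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        # Reasoning models spend extra tokens thinking, so leave headroom.
        "max_tokens": 4096,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("Why is the sky blue?", api_key="YOUR_KEY")
print(req.full_url)  # request is constructed locally; nothing is sent here
```

Sending the request with `urllib.request.urlopen(req)` (or any HTTP client) returns the usual chat-completion JSON once your waitlist access is active.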
About DeepSeek-R1 (the real deal, not distilled)
DeepSeek-R1 took the world by storm, offering stronger reasoning capabilities at a fraction of the cost of its competitors while being completely open source. This groundbreaking model, built on a Mixture of Experts (MoE) architecture with 671 billion parameters, delivers superior performance on math and reasoning tasks, even outperforming OpenAI's o1 on certain benchmarks.
SambaNova is a US-based company that runs the model on our RDU hardware in US data centers. Companies can also work with SambaNova to deploy our hardware and the DeepSeek model on-premise in their own data centers for maximum data privacy and security. This is unlike the service run by the company DeepSeek (as opposed to the model), which operates its cloud service on GPUs without providing any controls for data privacy.
Unlike the 70B distilled version of the model (also available today on the SambaNova Cloud Developer Tier), the full DeepSeek-R1 uses reasoning to far outclass the distilled versions in accuracy. As a reasoning model, R1 spends additional tokens thinking before generating an answer, which allows it to produce much more accurate and thoughtful responses. For example, it was able to reason through how to improve the efficiency of running itself (Reddit), something that is not possible without reasoning capabilities.
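Because R1 emits its thinking tokens before the final answer, applications often want to separate the two. As a minimal sketch, assuming the common convention that R1 wraps its reasoning trace in `<think>...</think>` tags, a response can be split like this:

```python
import re

def split_reasoning(response: str) -> tuple[str, str]:
    """Separate an R1-style response into (reasoning_trace, final_answer)."""
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        # No visible reasoning trace; treat the whole response as the answer.
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer

# Hypothetical response text for illustration only.
raw = "<think>2 racks times 8 chips is 16 chips.</think>You need 16 chips."
thoughts, answer = split_reasoning(raw)
print(answer)  # -> You need 16 chips.
```

Keeping the trace separate lets you log or display the model's reasoning without mixing it into the answer shown to end users.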
100X the Global Inference Compute of DeepSeek-R1
There is no shortage of demand for R1 given its performance and cost. But because DeepSeek-R1 is a reasoning model that generates more tokens at run time, developers today are compute constrained and struggle to get enough access to R1 due to the inefficiency of GPUs. That GPU inefficiency is one of the main reasons DeepSeek had to disable its own inference API service.
SambaNova RDU chips are ideally suited to large Mixture of Experts models like DeepSeek-R1, thanks to the dataflow architecture and three-tier memory design of the SN40L RDU. This design allows us to deploy these models optimally, delivering large performance gains with a single rack instead of the 40 racks of 320 GPUs that were used to power DeepSeek's inference. To learn more about the RDU and our unique architectural advantage, read our blog.
Thanks to the efficiency of our RDU chips, SambaNova expects to be serving 100X the global demand for the DeepSeek-R1 model by the end of the year. This makes the SambaNova RDU the most efficient inference platform for running reasoning models like DeepSeek-R1.
Improve Software Development with R1
Check out demos from our friends at Hugging Face and BlackBox showing how R1 makes coding significantly better. In CyberCoder, BlackBox uses R1 to significantly improve the performance of coding agents, one of the primary use cases for developers using the R1 model.
For media enquiries:
Emad Abdo
Emad@memediapro.com

Related Articles


Zawya
AI startups revolutionize coding industry, leading to sky-high valuations
Two years after the launch of ChatGPT, return on investment in generative AI has been elusive, but one area stands out: software development. So-called code generation or 'code-gen' startups are commanding sky-high valuations as corporate boardrooms look to use AI to aid, and sometimes to replace, expensive human software engineers.

Cursor, a code generation startup based in San Francisco whose tool can suggest and complete lines of code and write whole sections of code autonomously, raised $900 million at a $10 billion valuation in May from a who's who list of tech investors, including Thrive Capital, Andreessen Horowitz and Accel. Windsurf, a Mountain View-based startup behind the popular AI coding tool Codeium, attracted the attention of ChatGPT maker OpenAI, which is now in talks to acquire the company for $3 billion, sources familiar with the matter told Reuters. Its tool is known for translating plain English commands into code, sometimes called 'vibe coding,' which allows people with no knowledge of computer languages to write software. OpenAI and Windsurf declined to comment on the acquisition.

'AI has automated all the repetitive, tedious work,' said Scott Wu, CEO of code-gen startup Cognition. 'The software engineer's role has already changed dramatically. It's not about memorizing esoteric syntax anymore.'

Founders of code-gen startups and their investors believe they are in a land grab situation, with a shrinking window to gain a critical mass of users and establish their AI coding tool as the industry standard. But because most are built on AI foundation models developed elsewhere, such as by OpenAI, Anthropic, or DeepSeek, their costs per query are also growing, and none are yet profitable. They're also at risk of being disrupted by Google, Microsoft and OpenAI, which all announced new code-gen products in May; Anthropic is working on one as well, two sources familiar with the matter told Reuters.
The rapid growth of these startups is coming despite competing on big tech's home turf. Microsoft's GitHub Copilot, launched in 2021 and considered code-gen's dominant player, grew to over $500 million in revenue last year, according to a source familiar with the matter. Microsoft declined to comment on GitHub Copilot's revenue. On Microsoft's earnings call in April, the company said the product has over 15 million users.

LEARN TO CODE?

As AI revolutionizes the industry, many jobs - particularly entry-level coding positions that are more basic and involve repetition - may be eliminated. Signalfire, a VC firm that tracks tech hiring, found that new hires with less than a year of experience fell 24% in 2024, a drop it attributes to tasks once assigned to entry-level software engineers now being handled in part by AI. Google's CEO also said in April that 'well over 30%' of Google's code is now AI-generated, and Amazon CEO Andy Jassy said last year the company had saved 'the equivalent of 4,500 developer-years' by using AI. Google and Amazon declined to comment.

In May, Microsoft CEO Satya Nadella said at a conference that approximately 20 to 30% of the company's code is now AI-generated. The same month, the company announced layoffs of 6,000 workers globally, with over 40% of those being software developers in Microsoft's home state, Washington. 'We're focused on creating AI that empowers developers to be more productive, creative, and save time,' a Microsoft spokesperson said. 'This means some roles will change with the revolution of AI, but human intelligence remains at the center of the software development life cycle.'

MOUNTING LOSSES

Some 'vibe-coding' platforms already boast substantial annualized revenues. Cursor, with just 60 employees, went from zero to $100 million in recurring revenue by January 2025, less than two years since its launch.
Windsurf, founded in 2021, launched its code generation product in November 2024 and is already bringing in $50 million in annualized revenue, according to a source familiar with the company. But both startups operate with negative gross margins, meaning they spend more than they make, according to four investor sources familiar with their operations. 'The prices people are paying for coding assistants are going to get more expensive,' Quinn Slack, CEO at coding startup Sourcegraph, told Reuters.

Both Cursor and Windsurf are led by recent MIT graduates in their twenties, and exemplify the gold rush era of the AI startup scene. 'I haven't seen people working this hard since the first Internet boom,' said Martin Casado, a general partner at Andreessen Horowitz, an investor in Anysphere, the company behind Cursor. What's less clear is whether the dozen or so code-gen companies will be able to hang on to their customers as big tech moves in. 'In many cases, it's less about who's got the best technology -- it's about who is going to make the best use of that technology, and who's going to be able to sell their products better than others,' said Scott Raney, managing director at Redpoint Ventures, whose firm invested in Sourcegraph and Poolside, a software development startup that's building its own AI foundation model.

CUSTOM AI MODELS

Most of the AI coding startups currently rely on the Claude AI model from Anthropic, which crossed $3 billion in annualized revenue in May in part due to fees paid by code-gen companies. But some startups are attempting to build their own models. In May, Windsurf announced its first in-house AI models that are optimized for software engineering in a bid to control the user experience. Cursor has also hired a team of researchers to pre-train its own large frontier-level models, which could reduce how much the company pays foundation model companies, according to two sources familiar with the matter.
Startups looking to train their own AI coding models face an uphill battle, as it can easily cost millions to buy or rent the computing capacity needed to train a large language model. Replit earlier dropped plans to train its own model. Poolside, which has raised more than $600 million to make a coding-specific model, has announced a partnership with Amazon Web Services and is testing with customers, but hasn't made any product generally available yet. Another code-gen startup, Magic Dev, which has raised nearly $500 million since 2023, told investors a frontier-level coding model was coming in summer 2024 but hasn't yet launched a product. Poolside declined to comment. Magic Dev did not respond to a request for comment.


Khaleej Times
OpenAI's vision for ChatGPT: Your 'AI super assistant' and gateway to the internet
Since its 2022 debut, ChatGPT has rapidly become one of the most recognised and widely used AI tools in the world. But OpenAI isn't stopping there. A newly surfaced internal strategy document reveals that the company has far more ambitious plans: transforming ChatGPT from a helpful chatbot into your default 'interface to the internet.'

The document — heavily redacted and recently made public through the U.S. Justice Department's antitrust case against Google — outlines OpenAI's evolving roadmap for ChatGPT. At its core is a goal to build an 'AI super assistant that deeply understands you and is your interface to the internet.'

From chatbot to super assistant

While ChatGPT currently exists across familiar platforms — web, mobile, and desktop — OpenAI envisions it becoming a more embedded, omnipresent digital companion. Think of it less as a tool, and more as a proactive, intelligent sidekick that assists with everything from planning your day to summarising meetings, booking reservations, creating content, and even helping you maintain personal relationships.

'Today, ChatGPT is in our lives through existing form factors,' the document reads, according to Tom's Guide. 'But our vision for ChatGPT is to help you with all of your life, no matter where you are.'

To describe this next-gen assistant, OpenAI uses the metaphor of a 'T-shaped' entity — broad in its ability to handle everyday tasks like scheduling or note-taking, but with deep, expert-level skill in complex areas such as programming or technical writing. According to the strategy laid out in the document, the first half of 2025 is focused on building this 'super assistant.' The second half shifts toward making that assistant indispensable — by generating real, monetisable demand across users and businesses.
'In the first half of next year, we'll start evolving ChatGPT into a super-assistant: one that knows you, understands what you care about, and helps with any task that a smart, trustworthy, emotionally intelligent person with a computer could do,' the document explains.

Several key advancements are driving this evolution:
- Smarter models (like GPT-4.5 and beyond) that can reliably perform more autonomous, complex tasks
- Enhanced tool integration, enabling ChatGPT to take real-world actions like managing files or navigating software
- Multimodal interfaces, blending text, image, audio, and video to better match how humans communicate

Facing competition and infrastructure challenges

OpenAI isn't operating in a vacuum. The document sheds light on how it views rivals such as Google Gemini, Microsoft Copilot, and Meta AI. One redacted section hints that Meta could be the most significant long-term competitor, given its ability to integrate AI functionality across its vast ecosystem — without the business model conflicts that Google faces when cannibalising its core search revenue. To protect its lead, OpenAI also supports regulations that would let users choose their default AI assistant across platforms — a clear play against being boxed out by OS-level competitors.

But success isn't guaranteed. The document candidly acknowledges the mounting infrastructure demands required to serve ChatGPT's surging user base, a likely reason behind CEO Sam Altman's recent emphasis on building out OpenAI's own data centers and chip supply. 'We are leading here, but we can't rest,' the document reads. 'Growth and revenue won't line up forever.'

With this ambitious roadmap, OpenAI isn't just iterating on ChatGPT — it's aiming to redefine how we interact with the internet itself.


Channel Post MEA
SambaNova Launches Its AI Platform in AWS Marketplace
SambaNova has announced that its AI platform is now available in AWS Marketplace, a digital catalog that helps you find, buy, deploy, and manage software, data products, and professional services from thousands of vendors. This availability allows organizations to seamlessly purchase and deploy SambaNova's fast inference services alongside their existing infrastructure in AWS.

This new availability marks a significant milestone in SambaNova's mission to make private, production-grade AI more accessible to enterprises, removing traditional barriers like vendor onboarding and procurement delays. By leveraging existing AWS relationships, organizations can now begin using SambaNova's advanced inference solutions with a few simple clicks — accelerating time to value while maintaining trusted billing and infrastructure practices.

'Enterprises face significant pressure to move rapidly from AI experimentation to full-scale production, yet procurement and integration challenges often stand in the way,' said Rodrigo Liang, CEO and co-founder of SambaNova. 'By offering SambaNova's platform in AWS Marketplace, we remove those obstacles, enabling organizations to access our industry leading inference solutions instantly, using the procurement processes and a cloud environment they already trust.'

Accelerating Access to High-Performance Inference

SambaNova's listing in AWS Marketplace gives customers the ability to:
- Procure through existing AWS billing arrangements — no new vendor setup required.
- Leverage SambaNova's inference performance, running open source models like Llama 4 Maverick and DeepSeek R1 671B quickly and efficiently.
- Engage securely via private connectivity through AWS PrivateLink, for low-latency, secure integration between AWS workloads and SambaNova Cloud.

'With the SambaNova platform running in AWS Marketplace, organizations gain access to secure, high-speed inference from the largest open-source models.
Solutions like this will help businesses move from experimentation to full production with AI,' said Michele Rosen, Research Manager, Open GenAI, LLMs, and the Evolving Open Source, IDC.

This tight integration enables customers to deploy high-performance, multi-tenant inference solutions without the need to purchase or manage custom hardware — expanding SambaNova's reach into enterprise environments where time-to-value and IT friction have historically limited adoption.

Making High-Performance Inference More Accessible

With this listing in AWS Marketplace, SambaNova is meeting enterprise customers where they already are — within their trusted cloud environments and procurement frameworks. By removing onboarding friction and offering seamless integration, SambaNova makes it easier than ever for organizations to evaluate, deploy, and scale high-performance inference solutions.

'This makes it dramatically easier for customers to start using SambaNova — no new contracts, no long onboarding — just click and go,' said Liang.