Atlas Cloud Launches High-Efficiency AI Inference Platform, Outperforming DeepSeek
Developed with SGLang, Atlas Inference surpasses leading AI companies in throughput and cost, running DeepSeek V3 and R1 faster than DeepSeek itself.
NEW YORK CITY, NEW YORK / ACCESS Newswire / May 28, 2025 / Atlas Cloud, the all-in-one AI competency center for training and deploying AI models, today announced the launch of Atlas Inference, an AI inference platform that dramatically reduces GPU and server requirements, enabling faster, more cost-effective deployment of large language models (LLMs).
Atlas Inference, co-developed with SGLang, an open-source AI inference engine, maximizes GPU efficiency by processing more tokens faster on less hardware. Measured against DeepSeek's published performance results, Atlas Inference's 12-node H100 cluster outperformed DeepSeek's reference implementation of the DeepSeek-V3 model while using two-thirds as many servers. By cutting infrastructure requirements, the platform directly addresses hardware costs, which can represent up to 80% of AI operational expenses.
"We built Atlas Inference to fundamentally break down the economics of AI deployment," said Jerry Tang, Atlas CEO. "Our platform's ability to process 54,500 input tokens and 22,500 output tokens per second per node means businesses can finally make high-volume LLM services profitable instead of merely break-even. I believe this will have a significant ripple effect throughout the industry. Simply put, we're surpassing industry standards set by hyperscalers by delivering superior throughput with fewer resources."
Atlas Inference's performance also exceeds that of major players such as Amazon, NVIDIA, and Microsoft, delivering up to 2.1 times greater throughput from 12 nodes than competitors achieve with larger setups. It maintains sub-5-second first-token latency and 100-millisecond inter-token latency with more than 10,000 concurrent sessions, ensuring a consistent experience at scale. The platform's performance is driven by four key innovations:
- Prefill/Decode Disaggregation: separates the compute-intensive prefill phase from the memory-bound decode phase to optimize efficiency
- DeepExpert (DeepEP) Parallelism with Load Balancers: ensures over 90% GPU utilization
- Two-Batch Overlap Technology: increases throughput by enabling larger batches and overlapping the compute and communication phases
- DisposableTensor Memory Models: prevents crashes during long sequences for reliable operation
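The first of these ideas, prefill/decode disaggregation, can be illustrated with a toy scheduler. This is a conceptual sketch only, not Atlas's or SGLang's implementation; all class and method names here are invented for illustration. The point is that new requests pass through a compute-bound prefill stage and are then handed off to a separate memory-bound decode pool, so neither phase stalls the other:

```python
from collections import deque

class DisaggregatedScheduler:
    """Toy model of prefill/decode disaggregation (illustrative only)."""

    def __init__(self):
        # New requests: full-prompt attention, compute-bound.
        self.prefill_queue = deque()
        # In-flight requests: one token per step, memory-bound (KV-cache reads).
        self.decode_pool = deque()

    def submit(self, request_id, prompt_tokens):
        self.prefill_queue.append((request_id, prompt_tokens))

    def step(self):
        # Prefill workers process whole prompts, then hand the request
        # off to the decode pool instead of occupying a decode slot.
        prefilled = []
        while self.prefill_queue:
            req = self.prefill_queue.popleft()
            prefilled.append(req)
            self.decode_pool.append(req)
        # Decode workers batch one token step across all active requests.
        decoded = [req_id for req_id, _ in self.decode_pool]
        return prefilled, decoded

sched = DisaggregatedScheduler()
sched.submit("r1", ["Hello", "world"])
sched.submit("r2", ["Hi"])
prefilled, decoded = sched.step()
```

In a real deployment the two queues would live on physically separate GPU pools sized to their different bottlenecks (compute for prefill, memory bandwidth for decode), which is what allows the utilization gains the release describes.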
"This platform represents a significant leap forward for AI inference," said Yineng Zhang, Core Developer at SGLang. "What we built here may become the new standard for GPU utilization and latency management. We believe this will unlock capabilities previously out of reach for the majority of the industry regarding throughput and efficiency."
Combined with a lower cost per token, linear scaling behavior, and reduced emissions compared with leading vendors, Atlas Inference offers cost-efficient, scalable AI deployment.
Atlas Inference works with standard hardware and supports custom models, giving customers complete flexibility. Teams can upload fine-tuned models and keep them isolated on dedicated GPUs, making the platform ideal for organizations requiring brand-specific voice or domain expertise.
The platform is available immediately for enterprise customers and early-stage startups.
About Atlas Cloud
Atlas Cloud is your all-in-one AI competency center, powering leading AI teams with safe, simple, and scalable infrastructure for training and deploying models. Atlas Cloud also offers an on-demand GPU platform that delivers fast, serverless compute. Backed by Dell, HPE, and Supermicro, Atlas delivers near-instant access to up to 5,000 GPUs across a global SuperCloud fabric with 99% uptime and baked-in compliance. Learn more at atlascloud.ai.
SOURCE: Atlas Cloud
Related Articles


Business Insider
3 hours ago
AI Daily: OpenAI argues to keep countersuit against Musk
Catch up on the top artificial intelligence news and commentary by Wall Street analysts on publicly traded companies in the space with this daily recap compiled by The Fly. COUNTERSUIT: Microsoft (MSFT)-backed OpenAI is arguing to keep its countersuit against Tesla (TSLA) CEO Elon Musk in the trial over its for-profit shift, Reuters reports. OpenAI says the Tesla CEO's motion to dismiss the ChatGPT maker's claims has 'no grounding in facts,' and that its countersuit should be included in the expedited trial rather than put on hold. AI PLATFORM: Aurora Mobile (JG) announced the integration of the newly updated DeepSeek-R1-0528, a groundbreaking open-source reasoning AI model that rivals proprietary giants like OpenAI's o3 and Google's (GOOG, GOOGL) Gemini 2.5 Pro, into its leading enterprise-grade AI platform. This significant update, released by DeepSeek, brings enhanced reasoning capabilities and developer-friendly features, further empowering the platform to deliver cutting-edge AI solutions to enterprises worldwide. The DeepSeek-R1-0528 model brings substantial advancements in reasoning, achieving notable benchmark improvements such as AIME 2025 accuracy rising from 70% to 87.5% and LiveCodeBench coding performance increasing from 63.5% to 73.3%. These enhancements empower users to tackle complex tasks in domains like math, science, business, and programming with greater precision and efficiency. Additionally, the model's reduced hallucination rate, along with support for JSON output and function calling, ensures seamless integration into business workflows, delivering reliable and consistent results. These improvements align with the company's mission to provide secure, scalable, and enterprise-ready AI solutions.
AI INITIATIVE: Tevogen (TVGN) provided stockholders a detailed overview of its artificial intelligence initiative, aimed at integrating advanced machine learning into its ExacTcell technology to enhance target identification and preclinical processes. The company currently has two proprietary technologies, PredicTcell and AdapTcell, both with patents pending. It also highlighted strategic partnerships with Microsoft, providing AI expertise and cloud computing infrastructure, and Databricks, supplying data engineering and analytics capabilities. Tevogen plans to expand its headquarters for the team.
Yahoo
7 hours ago
DeepSeek's R1 Upgrade Nears Top-Tier LLMs
DeepSeek today rolled out DeepSeek-R1-0528, an upgraded version of its R1 large language model that it says now rivals OpenAI's o3 and Google's (NASDAQ:GOOG) Gemini 2.5 Pro. The China-based AI firm credited enhanced post-training algorithmic optimizations and a beefed-up compute pipeline for boosting reasoning accuracy from 70% to 87.5% on complex logic tasks, while cutting hallucination rates and improving vibe coding performance. DeepSeek highlighted benchmark wins in mathematics, programming, and general inference, positioning R1-0528 as a peer to leading Western models. This release follows DeepSeek's recent open-source launch of Prover-V2, a specialist reasoning engine, and comes amid a flurry of Chinese AI advancements, including Alibaba's (NYSE:BABA) Qwen 3 and Baidu's (NASDAQ:BIDU) Ernie 4.5/X1, both touting hybrid reasoning firepower. DeepSeek argues that its combination of open-development ethos and performance parity gives it a unique edge in global AI research. Investors and partners should care because DeepSeek-R1-0528's near-parity with top-tier LLMs could accelerate enterprise deployments in Asia and beyond, drive cloud-compute demand, and intensify competition in the rapidly evolving AI landscape. As Western and Chinese models vie for supremacy, benchmarks like these will shape strategic bets on talent, infrastructure, and cross-border AI collaborations. With R1-0528 available now on Hugging Face, markets will watch for adoption by startups and research labs, potential licensing deals, and further advances in DeepSeek's open-source roadmap. This article first appeared on GuruFocus.


Forbes
7 hours ago
Should You Be Worried If Your Doctor Uses ChatGPT?
Six years ago, I wrote a piece, 'Doctors Use YouTube And Google All The Time. Should You Be Worried?' In 2025, it's time to ask, 'Your doctor may be using ChatGPT. Should you be worried?' In a recent unscientific survey, technology entrepreneur Jonas Vollmer asked physicians how many used ChatGPT; 76% of the respondents answered 'yes.' According to Vollmer, a physician friend also told him, 'most doctors use ChatGPT daily. They routinely paste the full anonymized patient history (along with x-rays, etc.) into their personal ChatGPT account.' My own unofficial conversations with colleagues bear this out, with younger physicians more likely to regularly use AI than older ones. I think AI tools such as ChatGPT, Grok, Claude, and other LLMs can be very helpful for physicians after they take a good patient history and perform a properly thorough physical exam. The physician can describe patient signs and symptoms with appropriate medical precision for the AI to analyze. In particular, the AI can frequently suggest diagnoses that would not otherwise occur to the physician. For example, Vollmer noted that in a busy urgent care clinic, a patient might be taking 'alternative medicines' with unusual side effects that are not widely known in the traditional medical literature but have been discussed in recent online articles and discussion forums. Thus, ChatGPT acts as an extension of a good physician, not a replacement. As always, the physician has the final responsibility of confirming any novel hypothesis offered by the AI with their own human judgment, which might include running additional tests to confirm the diagnosis. We've already seen non-physician patients report how ChatGPT made a diagnosis for themselves or loved ones after doctors had been stumped for years. And there are multiple studies showing that AI tools like ChatGPT can be surprisingly good at diagnosis when offered patient case reports.
Of course, physicians need to be careful to adhere to all relevant medical privacy laws in their states and countries. They may even consider getting explicit consent from their patients ahead of time to run their (anonymized) data through AI. Physicians already seek second opinions from fellow doctors all the time, as long as privacy rules are met; the same guidelines should apply to consultations with AI. In many ways, this is comparable to the current state of AI in driverless cars. Driverless Waymo taxicabs in selected cities like Los Angeles perform as safely as (or more safely than) human drivers in appropriately restricted settings. Tesla owners who use the self-driving mode can rely on the AI to drive safely most of the time, although they still have to be prepared to take control of the wheel in an emergency. Robot cars are not yet ready to replace human drivers in all settings (such as icy Colorado mountain highways in wintertime), but they continue to improve rapidly. Similarly, we may soon reach the point where a physician who does not use an AI consultant to double-check their diagnoses will be considered to be practicing below the standard of care. We are not there yet, but I can see that coming in the next few years. Summary: tools like ChatGPT can be enormously helpful for physicians, provided that the doctor retains ultimate responsibility for the final diagnosis and treatment, and respects the appropriate privacy rules.