
DeepSeek paper offers new details on how it used 2,048 Nvidia chips to take on OpenAI
Chinese artificial intelligence (AI) research lab DeepSeek has released a new research paper revealing in detail for the first time how it built one of the world's most powerful open-source AI systems at a fraction of the cost of its competitors.
'Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures', co-authored by DeepSeek founder Liang Wenfeng and released on Wednesday, attributes the start-up's breakthrough in training high-performance, cost-efficient AI systems to a hardware-software co-design approach.
'DeepSeek-V3, trained on 2,048 Nvidia H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inference at scale,' the researchers wrote. DeepSeek and its hedge fund owner High-Flyer had previously stockpiled the H800, a chip Nvidia originally designed for the China market to comply with US export restrictions but which was banned from export to the country in 2023.
The start-up's training approach stemmed from the team's awareness of hardware constraints and the 'exorbitant costs' of training large language models (LLMs) – the technology behind AI chatbots such as OpenAI's ChatGPT – according to the paper.
The paper details technical optimisations that boost memory efficiency, streamline inter-chip communication, and enhance overall AI infrastructure performance – key advancements for reducing operational costs while scaling capabilities. These offer a 'practical blueprint for innovation in next-generation AI systems', the researchers said.
DeepSeek also highlighted its use of a mixture-of-experts (MoE) model architecture, a machine-learning approach that divides an AI model into separate sub-networks, or experts, each focused on a subset of the input data while working collaboratively.
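The routing idea behind a mixture-of-experts layer can be sketched as follows. This is a toy illustration with made-up dimensions and a simple top-k softmax gate; it is not DeepSeek-V3's actual implementation, whose expert count, routing scheme, and scale are far larger.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions for illustration only.
D_IN, D_HIDDEN, N_EXPERTS, TOP_K = 8, 16, 4, 2

# Each "expert" is a small two-layer network with its own weights.
experts = [
    (rng.standard_normal((D_IN, D_HIDDEN)), rng.standard_normal((D_HIDDEN, D_IN)))
    for _ in range(N_EXPERTS)
]
# A gating network scores how relevant each expert is to a given input.
gate_w = rng.standard_normal((D_IN, N_EXPERTS))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route input x to its top-k experts and mix their outputs."""
    scores = x @ gate_w
    top = np.argsort(scores)[-TOP_K:]        # indices of the chosen experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                 # softmax over the chosen experts
    out = np.zeros_like(x)
    for w, i in zip(weights, top):
        w1, w2 = experts[i]
        out += w * (np.maximum(x @ w1, 0) @ w2)  # weighted expert output
    return out

y = moe_forward(rng.standard_normal(D_IN))
```

Because only the selected experts run for each input, a model can hold many more parameters than it activates per token – the property that makes the approach attractive for cost-efficient training and inference.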
