DeepSeek paper offers new details on how it used 2,048 Nvidia chips to take on OpenAI


Chinese artificial intelligence (AI) research lab DeepSeek has released a new research paper revealing in detail for the first time how it built one of the world's most powerful open-source AI systems at a fraction of the cost of its competitors.
'Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures', co-authored by DeepSeek founder Liang Wenfeng and released on Wednesday, attributes the start-up's breakthrough in training high-performance, cost-efficient AI systems to a hardware-software co-design approach.
'DeepSeek-V3, trained on 2,048 Nvidia H800 GPUs, demonstrates how hardware-aware model co-design can effectively address these challenges, enabling cost-efficient training and inference at scale,' the researchers wrote. DeepSeek and its hedge fund owner High-Flyer had previously stockpiled the H800, which Nvidia originally designed for the China market to comply with US export restrictions, but which was itself banned from export to the country in 2023.
The start-up's training approach stemmed from the team's awareness of hardware constraints and the 'exorbitant costs' of training large language models (LLMs) – the technology behind AI chatbots such as OpenAI's ChatGPT – according to the paper.
The paper details technical optimisations that boost memory efficiency, streamline inter-chip communication, and enhance overall AI infrastructure performance – key advancements for reducing operational costs while scaling capabilities. These offer a 'practical blueprint for innovation in next-generation AI systems', the researchers said.
DeepSeek also highlighted its use of a mixture-of-experts (MoE) model architecture, a machine-learning approach that divides an AI model into separate sub-networks, or experts, each focused on a subset of the input data while working collaboratively.
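The MoE idea described above can be illustrated in a few lines of code. The sketch below is a toy, illustrative implementation only – not DeepSeek's actual architecture – showing the core mechanism: a router scores each token, and only the top-k experts (small linear maps here) process it, with their outputs combined by the gate weights. All names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class ToyMoELayer:
    """Minimal mixture-of-experts layer: a router picks the top-k
    experts per token, so only a fraction of parameters is active."""
    def __init__(self, dim, num_experts, top_k=2):
        self.top_k = top_k
        self.router = rng.normal(size=(dim, num_experts))            # gating weights
        self.experts = [rng.normal(size=(dim, dim)) for _ in range(num_experts)]

    def forward(self, x):                                            # x: (tokens, dim)
        scores = softmax(x @ self.router)                            # (tokens, num_experts)
        out = np.zeros_like(x)
        for t, row in enumerate(scores):
            top = np.argsort(row)[-self.top_k:]                      # indices of top-k experts
            gates = row[top] / row[top].sum()                        # renormalised gate weights
            for g, e in zip(gates, top):
                out[t] += g * (x[t] @ self.experts[e])               # weighted expert outputs
        return out

layer = ToyMoELayer(dim=8, num_experts=4, top_k=2)
tokens = rng.normal(size=(3, 8))
y = layer.forward(tokens)
print(y.shape)  # output has the same shape as the input: (3, 8)
```

The efficiency gain comes from sparsity: with top_k=2 of 4 experts, each token touches only half the expert parameters, which is what makes very large MoE models cheap to run per token.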

Related Articles

DeepSeek job ads call for interns to label medical data to improve AI use in hospitals

South China Morning Post · 2 days ago

Chinese artificial intelligence (AI) start-up DeepSeek, which has stayed silent on a release date for its R2 reasoning model, has begun recruiting interns to label medical data to improve the use of AI in hospitals.

According to recruitment ads posted on Boss Zhipin, one of China's largest hiring websites, DeepSeek is offering 500 yuan (US$70) per day for interns who can work four days a week labelling medical data for applications involving 'advanced auxiliary diagnosis' tools. The jobs are based in Beijing. The intern roles are not listed on DeepSeek's official WeChat hiring channel.

It is the first time DeepSeek has publicly mentioned the need for 'medical data' in data labelling. The hiring notice on Boss said applicants should have medical backgrounds and be either fourth-year undergraduates or holders of a master's degree. They also need experience with large language models (LLMs) and should be able to write Python code and prompts for large AI models. DeepSeek did not immediately reply to a request for comment.

A nurse moves a bed through a corridor at a hospital in Duan Yao autonomous county in Guangxi region, China, January 9, 2025. Photo: Reuters

The move comes as Chinese hospitals embrace open-source AI models from DeepSeek to generate diagnoses and prescriptions. As of March, at least 300 hospitals in China had started using DeepSeek's LLMs in clinical diagnostics and medical decision support.

Huawei claims better AI training method than DeepSeek using own Ascend chips

South China Morning Post · 4 days ago

Researchers working on Huawei Technologies' large language model (LLM) Pangu claimed they have improved on DeepSeek's original approach to training artificial intelligence (AI) by leveraging the US-sanctioned company's proprietary hardware.

A paper published last week by Huawei's Pangu team, which comprises 22 core contributors and 56 additional researchers, introduced the concept of Mixture of Grouped Experts (MoGE). It is an upgraded version of the Mixture of Experts (MoE) technique that has been instrumental in DeepSeek's cost-effective AI models.

While MoE offers low execution costs for large model parameters and enhanced learning capacity, it often results in inefficiencies, according to the paper. This is because of the uneven activation of so-called experts, which can hinder performance when running on multiple devices in parallel. In contrast, the improved MoGE 'groups the experts during selection and better balances the expert workload', the researchers said.

In AI training, 'experts' refer to specialised sub-models or components within a larger model, each designed to handle specific tasks or types of data. This allows the overall system to take advantage of diverse expertise to enhance performance.

Video (01:38): China a 'key market', says Nvidia CEO Huang during Beijing visit as US bans AI chips

The advancement comes at a crucial time, as Chinese AI companies are focused on enhancing model training and inference efficiency through algorithmic improvements and a synergy of hardware and software, despite US restrictions on the export of advanced AI chips like those from Nvidia.
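The load-balancing idea behind grouped expert selection can be sketched in code. The snippet below is a rough, hypothetical illustration of the concept described in the paper, not Huawei's actual MoGE algorithm: instead of taking the globally highest-scoring experts (which can pile work onto one device), experts are partitioned into groups and the top scorers are taken from each group, so every group activates the same number of experts.

```python
import numpy as np

rng = np.random.default_rng(1)

def grouped_topk(scores, num_groups, k_per_group):
    """Pick the k highest-scoring experts inside each group, so each
    group (e.g. each device) activates an equal number of experts."""
    num_experts = scores.shape[-1]
    group_size = num_experts // num_groups
    chosen = []
    for g in range(num_groups):
        seg = scores[g * group_size:(g + 1) * group_size]          # this group's scores
        top = np.argsort(seg)[-k_per_group:] + g * group_size      # local top-k, global index
        chosen.extend(top.tolist())
    return sorted(chosen)

scores = rng.random(8)                                             # router scores for 8 experts
picks = grouped_topk(scores, num_groups=4, k_per_group=1)
print(picks)  # always one expert per group, so exactly 4 picks
```

Global top-k could select all four from a single group; grouped selection guarantees each group contributes equally, which is the workload-balancing property the paper credits to MoGE.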

US venture capital firms visit China to study its AI scene as DeepSeek rekindles interest

South China Morning Post · 4 days ago

Joshua Kushner's Thrive Capital and investment firm Capital Group have in recent months visited China to learn about its artificial intelligence (AI) industry, joining a growing number of US investors rekindling interest in the country after DeepSeek's advances stunned Silicon Valley.

Senior people at Thrive met companies and funds in China to discuss AI, people familiar with the visit said. Kushner did not join the delegation, one of the people said, asking to remain anonymous while discussing a private event. At the same time, Capital Group, one of the world's largest funds, dispatched senior executives to China to find out more about the AI scene, the people said.

Joshua Kushner, the brother of US President Donald Trump's son-in-law Jared Kushner, seen with his wife Karlie Kloss. Photo: Invision/AP

The outreach underscores tentative but mounting interest in a once-overlooked Chinese AI industry that is being reassessed since DeepSeek proved a home-grown firm can design a platform on a par with the likes of OpenAI and Anthropic.
