
Latest news with #ChineseAI

DeepSeek's distilled new R1 AI model can run on a single GPU

TechCrunch

2 days ago


DeepSeek's updated R1 reasoning AI model might be getting the bulk of the AI community's attention this week. But the Chinese AI lab also released a smaller, 'distilled' version of its new R1, DeepSeek-R1-0528-Qwen3-8B, that DeepSeek claims beats comparably sized models on certain benchmarks.

The smaller updated R1, which was built using the Qwen3-8B model Alibaba launched in May as a foundation, performs better than Google's Gemini 2.5 Flash on AIME 2025, a collection of challenging math questions. DeepSeek-R1-0528-Qwen3-8B also nearly matches Microsoft's recently released Phi 4 reasoning plus model on another math skills test, HMMT.

So-called distilled models like DeepSeek-R1-0528-Qwen3-8B are generally less capable than their full-sized counterparts. On the plus side, they're far less computationally demanding. According to the cloud platform NodeShift, Qwen3-8B requires a GPU with 40GB-80GB of RAM to run (e.g., an Nvidia H100). The full-sized new R1 needs around a dozen 80GB GPUs.

DeepSeek trained DeepSeek-R1-0528-Qwen3-8B by taking text generated by the updated R1 and using it to fine-tune Qwen3-8B. On a dedicated webpage for the model on the AI dev platform Hugging Face, DeepSeek describes DeepSeek-R1-0528-Qwen3-8B as 'for both academic research on reasoning models and industrial development focused on small-scale models.'

DeepSeek-R1-0528-Qwen3-8B is available under a permissive MIT license, meaning it can be used commercially without restriction. Several hosts, including LM Studio, already offer the model through an API.
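
The distillation recipe described above (collect text generated by the larger model, then fine-tune the smaller one on it) can be illustrated with a minimal sketch of the data-collection step. This is not DeepSeek's actual pipeline: `teacher_generate` is a hypothetical stub standing in for a call to the large teacher model, and the `prompt`/`completion` record layout is an assumed, commonly used format for supervised fine-tuning data. The fine-tuning step itself is omitted.

```python
import json


def teacher_generate(prompt: str) -> str:
    """Hypothetical stand-in for querying the large teacher model."""
    return f"(teacher output for: {prompt})"


def build_distillation_set(prompts, generate=teacher_generate):
    """Pair each prompt with the text the teacher generated for it."""
    records = []
    for prompt in prompts:
        completion = generate(prompt)
        if completion.strip():  # drop empty generations
            records.append({"prompt": prompt, "completion": completion})
    return records


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(record) for record in records)


# Illustrative prompts only; a real run would use a large, curated prompt set.
prompts = ["What is 17 * 23?", "Is 91 prime?"]
dataset = build_distillation_set(prompts)
print(to_jsonl(dataset))
```

The resulting file would then serve as the training set for fine-tuning the smaller student model, which learns to imitate the teacher's outputs.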

DeepSeek's R1 Update Boosts Coding Capabilities

Forbes

2 days ago


DeepSeek has rolled out an update to its R1 model, ushering in a new era of coding assistance at a much more affordable cost. While users have yet to uncover every enhancement, the newly fortified programming capabilities stand out as potentially transformative.

Novice and experienced programmers alike can now instruct DeepSeek to build simple, interactive video games and run them in Python. And for those without Python or Pygame installed locally, DeepSeek can translate its output into HTML5, enabling anyone to launch and test games directly in a web browser, with no environment configuration required. This flexibility not only accelerates prototyping but also reduces technical barriers, making game development accessible to a wider spectrum of users.

What sets DeepSeek apart from competing models such as Claude 3.7 Sonnet and OpenAI's o3 is its cost structure. By offering these advanced coding features free of charge, DeepSeek positions itself as an ideal solution for educational institutions, nonprofit organizations, and individual creators operating on tight budgets. Students and community groups that previously lacked the resources to subscribe to premium AI services can now explore interactive programming projects without financial constraints.

Beyond game development, DeepSeek's enhanced coding engine can scaffold website building from scratch. Users can prompt DeepSeek to fetch and leverage publicly available datasets, say, a GitHub repository containing 19th-century British novels, and transform raw text into dynamic web applications. In a single workflow, DeepSeek can generate code that ingests the dataset, constructs word clouds, performs sentiment analysis, and displays interactive visualizations. Users can also engage in multiple rounds of prompts to ask DeepSeek to improve the website with more specificity. This end-to-end functionality has the potential to streamline data journalism, digital humanities research, and business intelligence initiatives, simplifying the tasks between data extraction and front-end development.

The implications extend far beyond academic research and entertainment. From financial analysts automating statistical models to healthcare professionals building real-time dashboards, DeepSeek's zero-cost coding assistance lowers the threshold for data-driven decision-making. Organizations can prototype analytics tools and spin up web-based reports more efficiently.

DeepSeek's R1 update, especially the enhanced coding skills, may help democratize software creation. By integrating powerful code generation with an open-source model, DeepSeek opens an avenue for innovators to experiment, iterate, and launch applications at minimal cost.
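
To make the text-analysis workflow mentioned above more concrete, here is a minimal sketch of the kind of code such a prompt might yield: word counts of the sort that feed a word cloud, plus a naive lexicon-based sentiment score. It is not output from DeepSeek. The stopword list, the `POSITIVE` and `NEGATIVE` word sets, and the sample sentence are small, made-up placeholders, and a real project would likely use a dedicated NLP library and the actual novel texts.

```python
import re
from collections import Counter

# Tiny illustrative word lists (assumptions, not a real lexicon).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "was", "is"}
POSITIVE = {"joy", "love", "happy", "delight"}
NEGATIVE = {"grief", "misery", "sad", "hate"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, top_n=5):
    """Return the top_n most common non-stopword tokens with their counts."""
    words = [w for w in tokenize(text) if w not in STOPWORDS]
    return Counter(words).most_common(top_n)


def sentiment_score(text):
    """Score in [-1, 1]: (positive - negative) / (positive + negative)."""
    words = tokenize(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


# Made-up sample text standing in for a passage from a novel.
sample = "Joy and love filled the house, but grief and misery waited outside the house."
print(word_frequencies(sample, top_n=3))
print(sentiment_score(sample))
```

The frequency pairs could be passed to a visualization layer to size the words in a cloud, while the score gives a rough per-passage sentiment signal.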
