

Microsoft's New Compact 1-Bit LLM Needs Just 400MB of Memory

Yahoo

22-04-2025



Microsoft's new large language model (LLM) puts significantly less strain on hardware than other LLMs, and it's free to experiment with. The 1-bit LLM (1.58-bit, to be more precise) uses only the values -1, 0, and 1 for its weights, which could make it practical to run LLMs on small devices, such as smartphones. Microsoft published BitNet b1.58 2B4T on Hugging Face, a collaboration platform for the AI community.

'We introduce BitNet b1.58 2B4T, the first open-source, native 1-bit Large Language Model (LLM) at the 2-billion parameter scale,' the Microsoft researchers wrote. 'Trained on a corpus of 4 trillion tokens, the model has been rigorously evaluated across benchmarks covering language understanding, mathematical reasoning, coding proficiency, and conversational ability.'

The keys to b1.58 2B4T are the performance and efficiency it provides. Where other LLMs typically store their weights (parameters) in 16-bit or 32-bit floating-point formats, BitNet expresses them using just the three values -1, 0, and 1. Although this isn't the first BitNet of its kind, its size makes it unique: as TechRepublic points out, this is the first 1-bit LLM at the 2-billion-parameter scale.

An important goal when developing LLMs for less-powerful hardware is to reduce the model's memory needs. In the case of b1.58 2B4T, it requires only 400MB, a dramatic drop from previous record holders such as Gemma 3 1B, which uses 1.4GB. 'The core contribution of this work is to demonstrate that a native 1-bit LLM, when trained effectively at scale, can achieve performance comparable to leading open-weight, full-precision models of similar size across a wide range of tasks,' the researchers wrote in the report. One thing to keep in mind is that BitNet b1.58 2B4T only delivers these efficiency gains on Microsoft's own inference framework, bitnet.cpp, rather than on traditional frameworks.

Training the LLM involves three phases. The first, pre-training, is broken into several steps of its own and (in the researchers' testing) involves 'synthetically generated mathematical data' along with data from large web crawls, educational web pages, and other 'publicly available' text. The next phase is supervised fine-tuning (SFT); the researchers used WildChat for conversational training. The last phase, direct preference optimization (DPO), is meant to improve the AI's conversational skills and align it with the target audience's preferences.
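The "1.58-bit" label comes from the information content of a three-valued weight: log2(3) ≈ 1.58 bits. The sketch below is a minimal illustration of the absmean-style ternary quantization described in the BitNet papers, plus the back-of-the-envelope memory arithmetic behind the 400MB figure; the function names and packing assumptions here are illustrative, not Microsoft's actual implementation.

```python
import math
import numpy as np

def ternary_quantize(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a float weight matrix to {-1, 0, +1} with a per-tensor scale.

    Follows the absmean recipe described in the BitNet b1.58 paper:
    divide by the mean absolute value, then round each weight to the
    nearest of -1, 0, +1. (Illustrative sketch, not Microsoft's code.)
    """
    scale = np.abs(w).mean() + 1e-8          # per-tensor scaling factor
    q = np.clip(np.round(w / scale), -1, 1)  # ternary weights
    return q.astype(np.int8), float(scale)

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 4)).astype(np.float32)
    q, s = ternary_quantize(w)
    print(q)                 # entries are only -1, 0, or 1
    print(dequantize(q, s))  # coarse reconstruction of the original weights

    # Why "1.58-bit": a three-state weight carries log2(3) bits of information.
    print(math.log2(3))      # ~1.585

    # Rough weight-memory arithmetic for a 2-billion-parameter model:
    params = 2e9
    print(params * 2 / 1e9)         # FP16 storage: ~4.0 GB
    print(params * 1.58 / 8 / 1e9)  # packed ternary storage: ~0.4 GB
```

Run as-is, the printed ternary matrix contains only -1, 0, and 1, and the final two numbers show why dropping from 16-bit to roughly 1.58-bit weights shrinks a 2-billion-parameter model from about 4GB to the neighborhood of the 400MB cited above.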
