Google launches Gemma 3n, multimodal Open Source AI model that runs on just 2GB RAM without internet
Google has announced the full launch of its latest on-device AI model, Gemma 3n, which was first previewed in May 2025. The model brings advanced multimodal capabilities, including audio, image, video and text processing, to smartphones and edge devices with limited memory and no internet connection. With this release, developers can deploy AI features that used to require powerful cloud infrastructure directly on phones and other low-power devices.

At the heart of Gemma 3n is a new architecture called MatFormer, short for Matryoshka Transformer. Google explains that, much like Russian nesting dolls, the model includes smaller, fully functional sub-models inside larger ones. This design makes it easy for developers to scale performance based on available hardware. Gemma 3n is available in two versions: E2B, which operates on as little as 2GB of memory, and E4B, which requires around 3GB. Despite having 5 to 8 billion raw parameters, both models use resources like much smaller models. This efficiency comes from innovations such as Per-Layer Embeddings (PLE), which shift part of the workload from the phone's graphics processor to its central processor, freeing up valuable memory.

Gemma 3n also introduces KV Cache Sharing, which significantly speeds up how quickly the model processes long audio and video inputs. Google says this improves response times by up to two times, making real-time applications like voice assistants or video analysis much faster and more practical on mobile devices.
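To see why a multi-billion-parameter model can fit such a small memory budget, a rough back-of-the-envelope calculation helps. The parameter split below is purely illustrative (not Google's published breakdown): it assumes the E2B variant has about 5 billion raw parameters, a large share of which are per-layer embeddings that PLE keeps off the accelerator.

```python
def footprint_gb(params: float, bytes_per_param: float) -> float:
    """Approximate accelerator memory (GB) needed to hold the weights."""
    return params * bytes_per_param / 1024**3

# Hypothetical split for the E2B variant (illustrative, not official):
# ~5B raw parameters, ~3B of them per-layer embeddings that PLE
# offloads from the graphics processor to CPU memory.
total_params = 5e9
ple_params = 3e9
resident_params = total_params - ple_params

naive = footprint_gb(total_params, 1.0)       # all weights resident, 1 byte each
with_ple = footprint_gb(resident_params, 1.0)  # only non-embedding weights resident

print(f"all weights resident: {naive:.1f} GB")   # ~4.7 GB
print(f"with PLE offloading:  {with_ple:.1f} GB")  # ~1.9 GB
```

Under these assumed numbers, offloading the embedding parameters is what brings the resident footprint under the 2GB figure quoted for E2B.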
For speech-based features, Gemma 3n includes a built-in audio encoder adapted from Google's Universal Speech Model. This allows it to perform tasks like speech-to-text and language translation directly on a phone. Early tests have shown especially strong results when translating between English and European languages such as Spanish, French, Italian, and Portuguese.

The visual side of Gemma 3n is powered by MobileNet-V5, Google's new lightweight vision encoder. It can handle video streams at up to 60 frames per second on devices like the Google Pixel, enabling smooth real-time video analysis. Despite being smaller and faster, it outperforms previous vision models in both speed and accuracy.

Developers can access Gemma 3n via popular tools like Hugging Face Transformers, Ollama, MLX, and others. Google has also launched the "Gemma 3n Impact Challenge," inviting developers to create applications using the model's offline capabilities, with winners sharing a $150,000 prize pool.

Because the model can operate entirely offline, it doesn't need an internet connection to work. This opens the door for AI-powered apps in remote areas or privacy-sensitive situations where cloud-based models aren't viable. With support for over 140 languages and the ability to understand multimodal content in 35 of them, Gemma 3n sets a new standard for efficient, accessible on-device AI.
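As one concrete route through the tooling mentioned above, the model can be loaded with Hugging Face Transformers. The sketch below only assembles a multimodal chat prompt in the message format Transformers' chat pipelines accept; the actual model call is left commented out because it downloads several gigabytes of weights, and the `google/gemma-3n-E2B-it` checkpoint name is an assumption based on Gemma's naming scheme, so verify it on the Hugging Face Hub before use.

```python
# Assumed checkpoint id (check the Hugging Face Hub before relying on it).
MODEL_ID = "google/gemma-3n-E2B-it"

def build_messages(image_path: str, question: str) -> list[dict]:
    """Assemble one multimodal chat turn: an image plus a text question."""
    return [{
        "role": "user",
        "content": [
            {"type": "image", "url": image_path},
            {"type": "text", "text": question},
        ],
    }]

messages = build_messages("photo.jpg", "What is in this picture?")

# Uncomment to run for real (downloads multi-GB weights):
# from transformers import pipeline
# pipe = pipeline("image-text-to-text", model=MODEL_ID)
# print(pipe(text=messages)[0]["generated_text"])
```

The same message structure works for text-only prompts by dropping the image entry, which is useful when targeting the smaller E2B variant on constrained devices.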