Google Unveils Gemma 3n: Advanced Offline AI Model for Phones with Just 2GB RAM
Google has officially launched Gemma 3n, its latest on-device AI model that's designed to run seamlessly even on smartphones with as little as 2GB of memory—and it doesn't need an internet connection to function. First teased in May 2025, the model is now available for developers worldwide.
Gemma 3n stands out by supporting multimodal input, including text, audio, image, and video, all processed directly on low-power hardware such as smartphones and other edge devices. Real-time AI features that previously relied on cloud computing can now run locally.
At its core is MatFormer, short for Matryoshka Transformer, Google's architecture inspired by Russian nesting dolls. According to the company, the design nests smaller, fully functional sub-models inside a larger one, so performance can scale to match the capability of the device.
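To make the nesting idea concrete, here is a minimal sketch of a Matryoshka-style feed-forward layer. It is an illustration of the concept rather than Google's actual implementation, and all dimensions and names are invented: the smaller model is simply a prefix slice of the larger model's weights.

```python
import torch
import torch.nn as nn

# Conceptual MatFormer-style sketch (not Google's code): the feed-forward
# weights of the "big" model contain a smaller model as a prefix slice,
# so one checkpoint can run at reduced width on weaker devices.
class NestedFeedForward(nn.Module):
    def __init__(self, d_model=512, d_ff_full=2048, d_ff_small=1024):
        super().__init__()
        self.up = nn.Linear(d_model, d_ff_full)   # full-width projection
        self.down = nn.Linear(d_ff_full, d_model)
        self.d_ff_small = d_ff_small              # width of the nested sub-model

    def forward(self, x, use_small=False):
        if use_small:
            # Use only the first d_ff_small rows/columns of the full weights:
            # this *is* the smaller nested model, no separate checkpoint needed.
            h = torch.nn.functional.gelu(
                x @ self.up.weight[: self.d_ff_small].T + self.up.bias[: self.d_ff_small]
            )
            return h @ self.down.weight[:, : self.d_ff_small].T + self.down.bias
        h = torch.nn.functional.gelu(self.up(x))
        return self.down(h)

x = torch.randn(1, 8, 512)
ffn = NestedFeedForward()
print(ffn(x, use_small=True).shape, ffn(x, use_small=False).shape)
```

Running the same weights at a reduced width is what allows a single download to serve both stronger and weaker devices.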
Gemma 3n is being offered in two variants:
E2B, optimized for devices with 2GB of RAM
E4B, designed for devices with 3GB of RAM
Although the variants carry roughly 5 billion (E2B) and 8 billion (E4B) raw parameters, both are optimized to run with the memory footprint of much smaller models. A key innovation here is Per-Layer Embeddings (PLE), which keeps a large share of those parameters in CPU memory rather than the device's limited GPU or accelerator memory, conserving memory while maintaining speed.
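The gist of that trick can be sketched as follows. This is a simplified illustration only, with invented class, sizes, and method names, not Google's code: the per-layer embedding tables stay in ordinary CPU RAM, and only the small vectors looked up for the current tokens are copied to the accelerator.

```python
import torch

# Rough PLE-style sketch (illustrative only, invented sizes): per-layer
# embedding tables live in CPU RAM; only the looked-up vectors for the
# current tokens move to the accelerator, keeping its memory use small.
class PerLayerEmbeddings:
    def __init__(self, vocab_size=32000, ple_dim=256, n_layers=24):
        # These tables hold the bulk of the extra parameters and never leave the CPU.
        self.tables = [torch.nn.Embedding(vocab_size, ple_dim) for _ in range(n_layers)]

    def lookup(self, layer_idx, token_ids, device="cpu"):
        # CPU-side gather, then transfer only a small (batch, seq, ple_dim) tensor.
        vectors = self.tables[layer_idx](token_ids.to("cpu"))
        return vectors.to(device)

ple = PerLayerEmbeddings()
tokens = torch.randint(0, 32000, (1, 16))
print(ple.lookup(layer_idx=0, token_ids=tokens).shape)  # torch.Size([1, 16, 256])
```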
In addition, KV Cache Sharing speeds up the ingestion of lengthy audio and video inputs. Google says the technique roughly doubles how quickly long prompts are processed, making the model more responsive for applications like voice assistants and live video analysis on the go.
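The benefit of sharing a key/value cache can be illustrated with a generic sketch like the one below. It is not Gemma 3n's exact layer layout, and all shapes and names are invented: one layer projects the long prefix into keys and values once, and several other layers attend over that same cache instead of recomputing their own.

```python
import torch
import torch.nn.functional as F

# Generic KV-sharing sketch (not Gemma 3n's exact scheme, shapes invented):
# a "provider" layer computes keys/values for the long prefix once, and
# "consumer" layers reuse that cache, so the expensive prefill work is shared.
d = 64
prefix = torch.randn(1, 1000, d)        # long audio/video prefix, already embedded
provider_k = torch.nn.Linear(d, d)
provider_v = torch.nn.Linear(d, d)

shared_cache = (provider_k(prefix), provider_v(prefix))   # computed once

def consumer_attention(query, cache):
    k, v = cache
    scores = query @ k.transpose(-2, -1) / d ** 0.5
    return F.softmax(scores, dim=-1) @ v

query = torch.randn(1, 1, d)            # one new decoding step
for _ in range(3):                      # several layers reuse the same K/V
    out = consumer_attention(query, shared_cache)
print(out.shape)                        # torch.Size([1, 1, 64])
```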
For audio, Gemma 3n integrates an audio encoder based on Google's Universal Speech Model, enabling on-device speech-to-text and speech translation. Tests show particularly strong results when translating between English and European languages such as Spanish, French, Italian, and Portuguese.
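As a rough idea of how a transcription call might look through Hugging Face Transformers, the snippet below is a sketch under several assumptions: the Hub id google/gemma-3n-E2B-it, the AutoModelForImageTextToText mapping for this checkpoint, and audio support in its chat template; the audio file name is hypothetical.

```python
from transformers import AutoProcessor, AutoModelForImageTextToText

# Sketch only: the model id, message format, and audio support via the chat
# template are assumptions based on the multimodal Gemma 3n release.
model_id = "google/gemma-3n-E2B-it"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, torch_dtype="auto")

messages = [{
    "role": "user",
    "content": [
        {"type": "audio", "audio": "meeting_clip.wav"},   # hypothetical local file
        {"type": "text", "text": "Transcribe this recording."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(
    output[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)[0])
```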
On the visual front, MobileNet-V5, Google's latest lightweight vision encoder, powers Gemma 3n's image and video analysis features. It supports real-time video streams up to 60 frames per second, with better accuracy and speed than previous models—all while consuming less power.
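A hedged sketch of exercising the vision side through the Transformers pipeline API might look like this; the Hub id, the image-text-to-text task name for this checkpoint, and the image URL are all assumptions rather than documented specifics.

```python
from transformers import pipeline

# Sketch only: the Hub id and the "image-text-to-text" pipeline task are
# assumptions about how the multimodal checkpoint is exposed in transformers.
pipe = pipeline("image-text-to-text", model="google/gemma-3n-E2B-it")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/street_scene.jpg"},  # hypothetical image
        {"type": "text", "text": "Describe what is happening in this picture."},
    ],
}]
out = pipe(text=messages, max_new_tokens=64)
print(out[0]["generated_text"])
```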
To encourage innovation, Google is offering access to the model through tools such as Hugging Face Transformers, Ollama, and MLX. It has also launched the Gemma 3n Impact Challenge, where developers can compete for a share of a $150,000 prize pool by building practical offline AI applications.
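For a quick local test, the Ollama route might look roughly like the sketch below; the gemma3n:e2b tag and the model's availability in the Ollama library are assumptions, and it requires the Python client plus a running local Ollama server.

```python
import ollama  # pip install ollama; assumes a local Ollama server is running

# Sketch only: the "gemma3n:e2b" tag is an assumption about how the E2B
# variant is published in the Ollama library.
response = ollama.chat(
    model="gemma3n:e2b",
    messages=[{"role": "user", "content": "Summarize why on-device AI helps privacy."}],
)
print(response["message"]["content"])
```

Once the weights have been pulled (for example with `ollama pull gemma3n:e2b`), the exchange runs locally without calling a cloud model.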
What truly sets Gemma 3n apart is its ability to run entirely offline, a game-changer for privacy-focused applications and for regions with limited internet access. It supports multimodal content understanding in 35 languages and text in over 140 languages overall.
With Gemma 3n, Google is setting a new benchmark for what AI can achieve on mobile and edge devices, without needing the cloud.