
Latest news with #MatFormer

Meet Gemma 3n: Google's lightweight AI model that works offline with just 2GB RAM

Time of India

3 hours ago



Google has officially rolled out Gemma 3n, its latest on-device AI model, first teased back in May 2025. What makes this launch exciting is that Gemma 3n brings full-scale multimodal processing (audio, video, image, and text) straight to smartphones and edge devices, all without needing constant internet or heavy cloud support. It's a big step forward for developers looking to bring powerful AI features to low-power devices running on limited memory.

At the core of Gemma 3n is a new architecture called MatFormer, short for Matryoshka Transformer. Think Russian nesting dolls: smaller, fully functional models tucked inside bigger ones. This clever setup lets developers scale AI performance based on the device's capability. You get two versions: E2B runs on just 2GB of RAM, and E4B works with around 3GB. Despite packing 5 to 8 billion raw parameters, both versions behave like much smaller models when it comes to resource use. That's thanks to smart design choices like Per-Layer Embeddings (PLE), which shift some of the load from the GPU to the CPU, helping save memory. It also features KV Cache Sharing, which speeds up processing of long audio and video inputs by nearly 2x, perfect for real-time use cases like voice assistants and mobile video analysis.

Gemma 3n isn't just light on memory; it's stacked with serious capabilities. For speech-based features, it uses an audio encoder adapted from Google's Universal Speech Model, which means it can handle speech-to-text and even language translation directly on your phone. It's already showing solid results, especially when translating between English and European languages like Spanish, French, Italian, and Portuguese.
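The nesting-doll idea described above can be pictured with a toy feed-forward layer: the smaller model is literally the leading slice of the bigger model's weight matrices, so one set of weights yields several usable model sizes. This is a purely illustrative sketch of the MatFormer concept, not Google's implementation; all sizes and names here are invented.

```python
import random

random.seed(0)

# Toy "large" feed-forward layer: hidden size 4, inner (FFN) size 8.
HIDDEN, FFN_LARGE, FFN_SMALL = 4, 8, 4
w_in = [[random.gauss(0, 1) for _ in range(FFN_LARGE)] for _ in range(HIDDEN)]
w_out = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(FFN_LARGE)]

def ffn(x, ffn_dim):
    # Use only the first `ffn_dim` inner units: in a MatFormer-style
    # model, this leading slice is itself a trained, usable sub-model.
    h = [max(0.0, sum(x[i] * w_in[i][j] for i in range((HIDDEN))))
         for j in range(ffn_dim)]
    return [sum(h[j] * w_out[j][k] for j in range(ffn_dim))
            for k in range(HIDDEN)]

x = [random.gauss(0, 1) for _ in range(HIDDEN)]
y_full = ffn(x, FFN_LARGE)    # the big "doll"
y_nested = ffn(x, FFN_SMALL)  # the smaller one: same weights, fewer units
print(len(y_full), len(y_nested))  # 4 4
```

Both calls produce an output of the same shape; the nested variant simply does less work, which is the property that lets E2B and E4B share one training run.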
On the visual front, it's powered by Google's new MobileNet-V5, a lightweight but powerful vision encoder that can process video at up to 60fps on phones like the Pixel. That means smooth, real-time video analysis without breaking a sweat. And it's not just fast; it's also more accurate than older models.

Developers can plug into Gemma 3n using popular tools like Hugging Face Transformers, Ollama, MLX, and more. Google has also kicked off the Gemma 3n Impact Challenge, offering a $150,000 prize pool for apps that showcase the model's offline magic.

The best part? Gemma 3n runs entirely offline. No cloud, no connection, just pure on-device AI. With support for over 140 languages and the ability to understand content in 35, it's a game-changer for building AI apps where connectivity is patchy or privacy is a priority.

Want to try Gemma 3n for yourself? Here's how you can get started:

  • Experiment instantly: Head over to Google AI Studio, where you can play around with Gemma 3n in just a few clicks. You can even deploy it directly to Cloud Run from there.
  • Download the model: Prefer working locally? You'll find the model weights available on Hugging Face and Kaggle.
  • Dive into the docs: Google's got solid documentation to help you integrate Gemma into your workflow. Start with inference, fine-tuning, or build from scratch.
  • Use your favorite tools: Whether you're into Ollama, MLX, Docker, or Google's AI Edge Gallery, Gemma 3n fits right in.
  • Bring your own dev stack: Already using Hugging Face Transformers, TRL, NVIDIA NeMo, Unsloth, or LMStudio? You're covered.
  • Deploy it your way: Push to production with options like Google GenAI API, Vertex AI, SGLang, vLLM, or even the NVIDIA API Catalog.
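Since the model ships in the two memory tiers described earlier (E2B at roughly 2GB of RAM, E4B at roughly 3GB), a setup script working through the steps above might pick a variant from free device memory. The thresholds below come from the article's figures; the helper itself is invented for illustration and is not an official API.

```python
def pick_gemma_3n_variant(free_ram_gb: float) -> str:
    # Illustrative only: E2B targets ~2GB of RAM, E4B ~3GB,
    # per the article. Real deployments should check vendor docs.
    if free_ram_gb >= 3.0:
        return "E4B"
    if free_ram_gb >= 2.0:
        return "E2B"
    raise ValueError("Gemma 3n needs roughly 2GB of free RAM or more")

print(pick_gemma_3n_variant(2.5))  # E2B
print(pick_gemma_3n_variant(4.0))  # E4B
```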

Google Unveils Gemma 3n: Advanced Offline AI Model for Phones with Just 2GB RAM

Hans India

10 hours ago



Google has officially launched Gemma 3n, its latest on-device AI model that's designed to run seamlessly even on smartphones with as little as 2GB of memory, and it doesn't need an internet connection to function. First teased in May 2025, the model is now available for developers worldwide.

Gemma 3n stands out by supporting multimodal input, including text, audio, image, and video, all processed directly on low-power devices like smartphones and edge hardware. This allows real-time, AI-driven features previously reliant on cloud computing to now be executed locally.

At its core is MatFormer, short for Matryoshka Transformer, Google's innovative architecture that mirrors the structure of Russian nesting dolls. According to the company, this design enables each model to contain smaller, independent models, allowing performance to scale according to device capability. Gemma 3n is being offered in two variants:

  • E2B, optimized for devices with 2GB of RAM
  • E4B, designed for devices with 3GB of RAM

Despite comprising 5 to 8 billion parameters, both versions are optimized for efficient operation. A key innovation here is Per-Layer Embeddings (PLE), which help shift processing tasks from the device's GPU to the CPU, conserving memory while maintaining speed. In addition, KV Cache Sharing allows for much faster processing of lengthy audio and video files. Google says this enhancement doubles the model's responsiveness, making it ideal for applications like voice assistants and live video analysis on the go.

For audio capabilities, Gemma 3n integrates a modified version of Google's Universal Speech Model. This enables it to support features like speech-to-text and language translation on-device. Tests show particularly strong results for English to European languages, including Spanish, French, Italian, and Portuguese. On the visual front, MobileNet-V5, Google's latest lightweight vision encoder, powers Gemma 3n's image and video analysis features.
It supports real-time video streams up to 60 frames per second, with better accuracy and speed than previous models, all while consuming less power.

To encourage innovation, Google is offering access to the model via tools such as Hugging Face Transformers, Ollama, MLX, and more. It also launched the Gemma 3n Impact Challenge, where developers can compete for a share of a $150,000 prize pool by building practical offline AI applications.

What truly sets Gemma 3n apart is its ability to run entirely offline, which is a game-changer for privacy-focused applications or regions with limited internet access. It supports content understanding in 35 languages and includes support for over 140 languages overall. With Gemma 3n, Google is setting a new benchmark for what AI can achieve on mobile and edge devices, without needing the cloud.
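The KV Cache Sharing speed-up described above comes down to a familiar idea: compute the keys and values for a long audio or video prefix once, then reuse them at every later decoding step instead of re-encoding the whole prefix. The sketch below is a generic key/value cache in miniature, with toy stand-in projections; it illustrates the caching principle only, not Gemma 3n's actual sharing scheme.

```python
class KVCache:
    """Keep per-token keys/values once computed; never recompute them."""

    def __init__(self):
        self.keys, self.values = [], []
        self.projections = 0  # count of (expensive) projection ops

    def extend(self, embeddings):
        for e in embeddings:
            # Toy stand-ins for the real key/value projections.
            self.keys.append(e * 2.0)
            self.values.append(e + 1.0)
            self.projections += 1

# A long audio/video prefix: 500 frames' worth of (toy) embeddings.
cache = KVCache()
cache.extend([float(i) for i in range(500)])  # pay for the prefix once

for step in range(10):           # 10 decoding steps...
    cache.extend([float(step)])  # ...each adds one projection, not 500

print(cache.projections)  # 510
```

Without a cache, each of those 10 steps would re-project the full 500-frame prefix; with it, the prefix cost is paid exactly once, which is where the "up to 2x" responsiveness claims for long inputs come from.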

Google launches Gemma 3n, multimodal Open Source AI model that runs on just 2GB RAM without internet

India Today

12 hours ago



Google has announced the full launch of its latest on-device AI model, Gemma 3n, which was first announced in May 2025. The AI model brings advanced multimodal capabilities, including audio, image, video and text processing, to smartphones and edge devices with limited memory and no internet connection. With this release, developers can now deploy AI features that used to require powerful cloud infrastructure directly on phones and low-power devices.

At the heart of Gemma 3n is a new architecture called MatFormer, short for Matryoshka Transformer. Google explains that much like Russian nesting dolls, the model includes smaller, fully-functional sub-models inside larger ones. This design makes it easy for developers to scale performance based on available hardware. For example, Gemma 3n is available in two versions: E2B, which operates on as little as 2GB of memory, and E4B, which requires about 3GB.

Despite having 5 to 8 billion raw parameters, both models perform like much smaller models in terms of resource use. This efficiency comes from innovations like Per-Layer Embeddings (PLE), which shift some of the workload from the phone's graphics processor to its central processor, freeing up valuable memory.

Gemma 3n also introduces KV Cache Sharing, which significantly speeds up how quickly the model processes long audio and video inputs. Google says this improves response times by up to two times, making real-time applications like voice assistants or video analysis much faster and more practical on mobile devices.

For speech-based features, Gemma 3n includes a built-in audio encoder adapted from Google's Universal Speech Model. This allows it to perform tasks like speech-to-text and language translation directly on a phone. Early tests have shown especially strong results when translating between English and European languages like Spanish, French, Italian, and Portuguese.

The visual side of Gemma 3n is powered by MobileNet-V5, Google's new lightweight vision encoder.
This system can handle video streams up to 60 frames per second on devices like the Google Pixel, enabling smooth real-time video analysis. Despite being smaller and faster, it outperforms previous vision models in both speed and accuracy.

Developers can access Gemma 3n via popular tools like Hugging Face Transformers, Ollama, MLX, and others. Google has also launched the "Gemma 3n Impact Challenge," inviting developers to create applications using the model's offline capabilities. Winners will share a $150,000 prize pool.

Importantly, the model can operate entirely offline, meaning it doesn't need an internet connection to work. This opens the door for AI-powered apps in remote areas or privacy-sensitive situations where cloud-based models aren't viable. With support for over 140 languages and the ability to understand content in 35, Gemma 3n sets a new standard for efficient, accessible on-device AI.
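The Per-Layer Embeddings trick mentioned earlier can be pictured as keeping each layer's embedding table in ordinary CPU RAM and copying over only the rows for the tokens currently being processed, so the accelerator never has to hold the full tables. The sketch below is an invented, drastically shrunken illustration of that memory split, not Google's actual PLE mechanism.

```python
# Per-layer embedding tables stay in host (CPU) memory. Shapes here
# are tiny and invented: 4 layers, 1,000-token vocab, 8-dim embeddings.
NUM_LAYERS, VOCAB, EMB_DIM = 4, 1000, 8
cpu_tables = [
    [[float(layer + tok)] * EMB_DIM for tok in range(VOCAB)]
    for layer in range(NUM_LAYERS)
]

def fetch_ple(layer, token_ids):
    # Only the rows for the current tokens cross to the accelerator
    # side; the full VOCAB x EMB_DIM tables never leave CPU RAM,
    # which is what saves accelerator memory.
    return [cpu_tables[layer][t] for t in token_ids]

rows = fetch_ple(0, [3, 17, 256])
print(len(rows), len(rows[0]))  # 3 8
```

For a 3-token step, the accelerator holds 3 x 8 values instead of 1,000 x 8 per layer; scaled up to real vocabulary sizes, that trade of a small per-step copy for a much smaller resident footprint is the essence of the reported memory savings.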
