Latest news with #Gemma3


Geeky Gadgets
27-05-2025
- Business
- Geeky Gadgets
From 2GB to 1TB: How to Maximize AI on Any Local Desktop Setup
What if your local desktop could rival the power of a supercomputer? As AI continues its meteoric rise, the ability to run complex models locally—on setups ranging from modest 2GB systems to machines with a staggering 1TB of memory—is no longer a distant dream. But here's the catch: not all hardware is created equal, and choosing the wrong configuration could leave you stuck with sluggish performance or wasted potential. From lightweight models like Gemma3 to the resource-hungry Deepseek R1, the gap between what your hardware can handle and what your AI ambitions demand is wider than ever. So, how do you navigate this rapidly evolving landscape and make the most of your setup? This comprehensive comparison by Dave unpacks the hidden trade-offs of running AI locally, from the surprising efficiency of entry-level systems to the jaw-dropping capabilities of high-end configurations. You'll discover how memory, GPUs, and CPUs shape the performance of AI workloads, and why token generation speed could be the metric that transforms your workflow. Whether you're a curious hobbyist or a professional looking to optimize large-scale deployments, this deep dive will help you decode the hardware puzzle and unlock the full potential of local desktop AI. After all, the future of AI isn't just in the cloud—it's sitting right on your desk.

Optimizing AI on Desktops

Why Run AI Models Locally?
Running AI models on local hardware offers several distinct advantages over cloud-based solutions. It provides greater control over data, ensuring privacy and security, while also reducing the long-term costs associated with cloud subscriptions. Additionally, local deployment eliminates latency issues, allowing faster processing for time-sensitive tasks. However, the success of local AI deployment depends heavily on aligning your hardware's specifications with the demands of the AI models you intend to use. For instance, lightweight models like Gemma3 can operate effectively on systems with minimal resources, making them ideal for basic applications. In contrast, advanced models such as Deepseek R1 require robust setups equipped with substantial memory and processing power to function efficiently. Understanding these requirements is essential for achieving optimal performance.

The Role of Memory in AI Performance
Memory capacity plays a pivotal role in determining the performance of AI models. Tests conducted on systems ranging from 2GB to 1TB of memory reveal significant trade-offs between cost, speed, and scalability. Here's how different setups compare:
- 2GB systems: Suitable for lightweight tasks such as license plate recognition or basic image classification, but they struggle with larger, more complex models due to limited memory bandwidth.
- 8GB systems: Capable of handling mid-sized models, these setups offer moderate performance but experience slower token generation speeds, particularly with larger datasets.
- 128GB and above: High-memory configurations excel at running advanced models, offering faster processing speeds and greater scalability for demanding workloads.
One critical metric to consider is token generation speed, which improves significantly with higher memory configurations. Systems with more memory are better equipped to process large datasets and execute complex models, making them indispensable for tasks such as natural language processing, image generation, and predictive analytics.

Video: Local Desktop AI Compared: 2GB to 1024GB (watch on YouTube).

Hardware Configurations: Matching Systems to Workloads
Different hardware configurations cater to varying AI workloads, and selecting the right setup is crucial for achieving efficient performance. Below is a breakdown of how various configurations perform:
- Low-end systems: Devices like the Jetson Orin Nano (2GB RAM) are limited to lightweight models and basic applications, such as object detection or simple automation tasks.
- Mid-range GPUs: Options such as the Tesla P40 (8GB RAM) and RTX 6000 ADA (48GB RAM) strike a balance between cost and performance. These systems can handle larger models with moderate efficiency, making them suitable for small to medium-scale AI projects.
- High-end systems: Machines like the Apple M2 Mac Pro (128GB RAM) and the 512GB Mac M4 are designed for advanced models like Deepseek R1. These setups provide the memory and processing power needed for large-scale AI workloads, including deep learning and complex simulations.

CPU-only setups, while less common, can also support massive models when paired with extensive memory. For example, systems equipped with 1TB of RAM can handle computationally intensive tasks, though they may lack the speed and efficiency of GPU-accelerated configurations. This highlights the importance of matching hardware capabilities to the specific computational demands of your AI tasks.

AI Models: Size and Complexity Matter
The size and complexity of AI models are key factors influencing their hardware requirements. Smaller models, such as Gemma3 with 1 billion parameters, are well-suited for low-memory setups and can perform tasks like text summarization or basic image recognition. These models are ideal for users with limited hardware resources or those seeking cost-effective solutions. In contrast, larger models like Deepseek R1, which scales up to 671 billion parameters, demand high-memory systems and advanced GPUs or CPUs to function efficiently. These models are designed for tasks requiring significant computational power, such as advanced natural language understanding, generative AI, and large-scale data analysis. The disparity in hardware requirements underscores the importance of tailoring your setup to the specific needs of your AI applications.

Key Performance Insights
Testing AI models across various hardware configurations has revealed several critical insights that can guide your decision-making:
- Memory capacity: Higher memory directly correlates with improved processing speed and scalability, making it a crucial factor for running complex models.
- Unified memory architecture: Found in Apple systems, this feature enhances AI workloads by allowing seamless access to shared memory resources, improving overall efficiency.
- Consumer-grade hardware: While affordable, these systems often struggle with large-scale models due to limitations in memory and processing power, making them less suitable for demanding applications.

These findings emphasize the need to carefully evaluate your hardware options based on the size, complexity, and computational demands of your AI tasks.

Optimizing Local AI Deployment
To achieve efficient and cost-effective AI performance on local desktop hardware, consider the following strategies:
- Ensure your hardware configuration matches the size and complexity of the AI models you plan to run. This alignment is critical for avoiding performance bottlenecks.
- Use tools like Ollama to simplify the process of downloading, configuring, and running AI models locally. These tools can streamline deployment and reduce setup time (a minimal example appears at the end of this article).
- Invest in high-memory systems if your workload involves large-scale models or extensive data processing. While the upfront cost may be higher, the long-term benefits in performance and scalability are significant.

By following these recommendations, you can maximize the performance of your local AI deployments while staying within budget and ensuring efficient resource utilization.

Challenges and Future Developments
Despite recent advancements, consumer hardware still faces limitations when supporting the largest AI models. Memory constraints, processing speed, and scalability remain significant challenges, particularly for users with budget-friendly setups. However, ongoing developments in GPUs, CPUs, and memory architectures are expected to address these issues, paving the way for more powerful and accessible AI systems. Emerging technologies, such as quantum computing and next-generation GPUs, hold the potential to transform local AI deployment. These advancements promise to deliver unprecedented processing power and efficiency, allowing broader adoption of AI across industries and applications.

Media Credit: Dave's Garage
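
For readers who want to try the workflow Dave describes, the sketch below shows the Ollama route in Python. It is a minimal illustration only: it assumes the Ollama runtime and its Python client are installed, and that a small Gemma 3 tag (here "gemma3:1b") has already been pulled; adjust the model tag to whatever your hardware can hold.

```python
# Minimal sketch of the local workflow described above, using the Ollama Python
# client. Assumptions: Ollama is installed and running locally, and the model
# tag has been pulled beforehand (e.g. `ollama pull gemma3:1b`).
import ollama

response = ollama.chat(
    model="gemma3:1b",  # swap in a larger tag if your memory allows
    messages=[{"role": "user", "content": "Summarise why local inference reduces latency."}],
)
print(response["message"]["content"])

# The Ollama API also reports generation statistics; eval_count and
# eval_duration (nanoseconds) give a rough tokens-per-second figure,
# the metric the comparison above keeps coming back to.
tokens_per_sec = response["eval_count"] / (response["eval_duration"] / 1e9)
print(f"~{tokens_per_sec:.1f} tokens/s")
```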

Business Standard
22-05-2025
- Business
- Business Standard
Gemma 3n: All about Google's open model for on-device AI on phones, laptops
At its annual Google I/O conference, Google unveiled Gemma 3n, a new addition to its Gemma 3 series of open AI models. The company said that the model is designed to run efficiently on everyday devices like smartphones, laptops, and tablets. Gemma 3n shares its architecture with the upcoming generation of Gemini Nano, the lightweight AI model that already powers several on-device AI features on Android devices, such as voice recorder summaries on Pixel smartphones.

Gemma 3n model: Details
Google says Gemma 3n makes use of a new technique called Per-Layer Embeddings (PLE), which allows the model to consume much less RAM than similarly sized models. Although the model comes in 5 billion and 8 billion parameter versions (5B and 8B), this memory optimisation brings its RAM usage closer to that of a 2B or 4B model. In practical terms, this means Gemma 3n can run with just 2GB to 3GB of RAM, making it viable for a much wider range of devices.

Gemma 3n model: Key capabilities
- Audio input: The model can process sound-based data, enabling applications like speech recognition, language translation, and audio analysis.
- Multimodal input: With support for visual, text, and audio inputs, the model can handle complex tasks that involve combining different types of data.
- Broad language support: Google said that the model is trained in over 140 languages.
- 32K token context window: Gemma 3n supports input sequences of up to 32,000 tokens, allowing it to handle large chunks of data in one go—useful for summarising long documents or performing multi-step reasoning.
- PLE caching: The model's internal components (embeddings) can be stored temporarily in fast local storage (such as the device's SSD), helping reduce the RAM needed during repeated use.
- Conditional parameter loading: If a task doesn't require audio or visual capabilities, the model can skip loading those parts, saving memory and speeding up performance.

Gemma 3n model: Availability
As part of the Gemma open model family, Gemma 3n is provided with accessible weights and licensed for commercial use, allowing developers to tune, adapt, and deploy it across a variety of applications. Gemma 3n is now available as a preview in Google AI Studio.
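
For developers who want to poke at the preview programmatically rather than through the AI Studio UI, a call along the following lines should work. This is a hedged sketch only: it assumes the google-generativeai Python SDK and a preview model ID such as "gemma-3n-e4b-it", neither of which is specified in the article; use whatever ID AI Studio actually lists for your account.

```python
# Minimal sketch: querying a Gemma 3n preview through the Gemini API.
# Assumptions (not from the article): the google-generativeai SDK is installed
# (`pip install google-generativeai`) and "gemma-3n-e4b-it" is a valid preview
# model ID for your project; substitute the ID shown in Google AI Studio.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemma-3n-e4b-it")  # hypothetical preview ID
response = model.generate_content(
    "Summarise the key capabilities of an on-device multimodal model in three bullet points."
)
print(response.text)
```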


Techday NZ
22-05-2025
- Business
- Techday NZ
Gemma 3n AI model brings real-time multimodal power to mobiles
Gemma 3n, a new artificial intelligence model architected for mobile and on-device computing, has been introduced as an early preview for developers. Developed in partnership with mobile hardware manufacturers, Gemma 3n is designed to support real-time, multimodal AI experiences on phones, tablets, and laptops. The model extends the capabilities of the Gemma 3 family by focusing on performance and privacy in mobile scenarios.

The new architecture was developed in collaboration with companies such as Qualcomm Technologies, MediaTek, and Samsung System LSI. The objective is to optimise the model for fast, responsive AI that can operate directly on device, rather than relying on cloud computing. This marks an extension of the Gemma initiative towards enabling AI applications in everyday devices, utilising a shared foundation that will underpin future releases across platforms like Android and Chrome.

According to information provided, Gemma 3n is also the core of the next generation of Gemini Nano, which is scheduled for broader release later in the year, bringing expanded AI features to Google apps and the wider on-device ecosystem. Developers can begin working with Gemma 3n today as part of the early preview, helping them to build and experiment with local AI functionalities ahead of general availability.

The model has performed strongly in chatbot benchmark rankings. One chart included in the announcement ranks AI models by Chatbot Arena Elo scores, with Gemma 3n noted as ranking highly amongst both popular proprietary and open models. Another chart demonstrates the model's mix-and-match performance with respect to model size.

Gemma 3n benefits from Google DeepMind's Per-Layer Embeddings (PLE) innovation, which leads to substantial reductions in RAM requirements. The model is available in 5 billion and 8 billion parameter versions, but, according to the release, it can operate with a memory footprint comparable to much smaller models—2 billion and 4 billion parameters—enabling operation with as little as 2GB to 3GB of dynamic memory. This allows the use of larger AI models on mobile devices or via cloud streaming, where memory overhead is often a constraint.

The company states, "Gemma 3n leverages a Google DeepMind innovation called Per-Layer Embeddings (PLE) that delivers a significant reduction in RAM usage. While the raw parameter count is 5B and 8B, this innovation allows you to run larger models on mobile devices or live-stream from the cloud, with a memory overhead comparable to a 2B and 4B model, meaning the models can operate with a dynamic memory footprint of just 2GB and 3GB."

Additional technical features of Gemma 3n include optimisations that allow the model to respond approximately 1.5 times faster on mobile devices compared to previous Gemma versions, with improved output quality and lower memory usage. The announcement highlights innovations such as Per-Layer Embeddings, KVC sharing, and advanced activation quantisation as contributing to these improvements.

The model also supports what the company calls "many-in-1 flexibility." Utilising a 4B active memory footprint, Gemma 3n incorporates a nested 2B active memory footprint submodel through the MatFormer training process. This design allows developers to balance performance and quality needs without operating separate models, composing submodels on the fly to match a specific application's requirements. Upcoming technical documentation is expected to elaborate on this mix-and-match capability.
Security and privacy are also prioritised. The development team states that local execution "enables features that respect user privacy and function reliably, even without an internet connection."

Gemma 3n brings enhanced multimodal comprehension, supporting the integration and understanding of audio, text, images, and video. Its audio functionality supports high-quality automatic speech recognition and multilingual translation. Furthermore, the model can accept inputs in multiple modalities simultaneously, enabling the parsing of complex multimodal interactions.

The company describes the expansion in audio capabilities: "Its audio capabilities enable the model to perform high-quality Automatic Speech Recognition (transcription) and Translation (speech to translated text). Additionally, the model accepts interleaved inputs across modalities, enabling understanding of complex multimodal interactions." A public release of these features is planned for the near future.

Gemma 3n features improved performance in multiple languages, with notable gains in Japanese, German, Korean, Spanish, and French. This is reflected in benchmark scores such as a 50.1% result on WMT24++ (ChrF), a multilingual evaluation metric.

The team behind Gemma 3n views the model as a catalyst for "intelligent, on-the-go applications." They note that developers will be able to "build live, interactive experiences that understand and respond to real-time visual and auditory cues from the user's environment," and design advanced applications capable of real-time speech transcription, translation, and multimodal contextual text generation, all executed privately on the device.

The company also outlined its commitment to responsible development. "Our commitment to responsible AI development is paramount. Gemma 3n, like all Gemma models, underwent rigorous safety evaluations, data governance, and fine-tuning alignment with our safety policies. We approach open models with careful risk assessment, continually refining our practices as the AI landscape evolves."

Developers have two initial routes for experimentation: exploring Gemma 3n via a cloud interface in Google AI Studio using browser-based access, or integrating the model locally through Google AI Edge's suite of developer tools. These options enable immediate testing of Gemma 3n's text and image processing capabilities.

The announcement states: "Gemma 3n marks the next step in democratizing access to cutting-edge, efficient AI. We're incredibly excited to see what you'll build as we make this technology progressively available, starting with today's preview."
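
To put the quoted 2GB to 3GB dynamic footprint into perspective, the back-of-envelope sketch below (not from the announcement) estimates how much memory raw weights alone would need at common precisions; the gap between these naive figures and the footprint Google claims is roughly what PLE, caching, and conditional loading are buying.

```python
# Back-of-envelope illustration only: naive weight-memory estimates for
# different parameter counts and precisions. Real usage also depends on
# activations, KV cache, and how PLE offloads embeddings, which this ignores.

def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory needed just to hold the weights, in gigabytes."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

for params in (2, 4, 5, 8):
    for bits in (16, 8, 4):
        print(f"{params}B params @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```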


Techday NZ
21-05-2025
- Business
- Techday NZ
Red Hat & Google Cloud extend partnership for AI innovation
Red Hat and Google Cloud have agreed to extend their partnership to focus on advancing artificial intelligence (AI) for enterprises, specifically with new developments in open and agentic AI solutions.

The collaboration will bring together Red Hat's open source technologies and Google Cloud's infrastructure, along with Google's Gemma family of open AI models. This initiative aims to offer cost-effective AI inference and greater hardware choices for businesses deploying generative AI at scale.

Brian Stevens, Senior Vice President and Chief Technology Officer – AI, Red Hat, said, "With this extended collaboration, Red Hat and Google Cloud are committed to driving groundbreaking AI innovations with our combined expertise and platforms. Bringing the power of vLLM and Red Hat open source technologies to Google Cloud and Google's Gemma equips developers with the resources they need to build more accurate, high-performing AI solutions, powered by optimized inference capabilities."

The latest phase of the alliance will see the companies launch the llm-d open source project, with Google acting as a founding contributor. This project is intended to facilitate scalable and efficient AI inference across diverse computing environments. Red Hat is introducing the project as a response to enterprise challenges, such as the growing complexity of AI ecosystems and the need for distributed computing strategies.

The companies have also announced that support for vLLM, an open source inference server used to speed up generative AI outputs, will be enabled on Google Cloud's Tensor Processing Units (TPUs) and GPU-based virtual machines. Google Cloud's TPUs, which are already a part of Google's own AI infrastructure, will now be accessible to developers using vLLM, allowing for improved performance and resource efficiency for fast and accurate inference.

Red Hat will be among the earliest testers of Google's new open model Gemma 3, and it will provide 'Day 0' support for vLLM on Gemma 3 model distributions. This is part of Red Hat's broader efforts as a commercial contributor to the vLLM project, focusing on more cost-effective and responsive platforms for generative AI applications.

The collaboration also includes the availability of Red Hat AI Inference Server on Google Cloud. This enterprise distribution of vLLM helps companies scale and optimise AI model inference within hybrid cloud environments. The integration with Google Cloud enables enterprises to deploy generative AI models that are ready for production and can deliver cost and responsiveness efficiencies at scale.

Supporting community-driven AI development, Red Hat will join Google as a contributor to the Agent2Agent (A2A) protocol, an application-level protocol designed to enable communication between agents or end-users across different platforms and cloud environments. Through the A2A ecosystem, Red Hat aims to promote new ways to accelerate innovation and enhance the effectiveness of AI workflows through agentic AI.

Mark Lohmeyer, Vice President and General Manager, AI and Computing Infrastructure, Google Cloud, commented, "The deepening of our collaboration with Red Hat is driven by our shared commitment to foster open innovation and bring the full potential of AI to our customers. As we enter a new age of AI inference, together we are paving the way for organisations to more effectively scale AI inference and enable agentic AI with the necessary cost-efficiency and high performance."
The llm-d project builds upon the established vLLM community, aiming to create a foundation for generative AI inference that can adapt to the demands of large-scale enterprises while facilitating innovation and cost management. The intention is to enable AI workload scalability across different resource types and enhance workload efficiency. These initiatives highlight the companies' collective effort to offer business users production-ready, scalable, and efficient AI solutions powered by open source technologies and robust infrastructure options.
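
As a rough illustration of what 'Day 0' vLLM support for Gemma means in practice, the sketch below runs offline batch inference with the open source vLLM engine. The model ID "google/gemma-3-1b-it" and the assumption of an available GPU are mine, not the announcement's; swap in whichever Gemma checkpoint you have access to.

```python
# Minimal sketch (assumptions, not from the announcement): offline batch
# inference for a Gemma 3 checkpoint with the open source vLLM engine.
# Assumes vLLM is installed (`pip install vllm`) and a supported GPU is present.
from vllm import LLM, SamplingParams

llm = LLM(model="google/gemma-3-1b-it")  # assumed Hugging Face model ID
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["Explain what an inference server does, in two sentences."],
    params,
)
for output in outputs:
    print(output.outputs[0].text)
```
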
Yahoo
01-05-2025
- Business
- Yahoo
Study accuses LM Arena of helping top AI labs game its benchmark
A new paper from AI lab Cohere, Stanford, MIT, and Ai2 accuses LM Arena, the organization behind the popular crowdsourced AI benchmark Chatbot Arena, of helping a select group of AI companies achieve better leaderboard scores at the expense of rivals.

According to the authors, LM Arena allowed some industry-leading AI companies like Meta, OpenAI, Google, and Amazon to privately test several variants of AI models, then not publish the scores of the lowest performers. This made it easier for these companies to achieve a top spot on the platform's leaderboard, though the opportunity was not afforded to every firm, the authors say.

"Only a handful of [companies] were told that this private testing was available, and the amount of private testing that some [companies] received is just so much more than others," said Cohere's VP of AI research and co-author of the study, Sara Hooker, in an interview with TechCrunch. "This is gamification."

Created in 2023 as an academic research project out of UC Berkeley, Chatbot Arena has become a go-to benchmark for AI companies. It works by putting answers from two different AI models side-by-side in a "battle," and asking users to choose the best one. It's not uncommon to see unreleased models competing in the arena under a pseudonym. Votes over time contribute to a model's score — and, consequently, its placement on the Chatbot Arena leaderboard.

While many commercial actors participate in Chatbot Arena, LM Arena has long maintained that its benchmark is an impartial and fair one. However, that's not what the paper's authors say they uncovered.

One AI company, Meta, was able to privately test 27 model variants on Chatbot Arena between January and March leading up to the tech giant's Llama 4 release, the authors allege. At launch, Meta only publicly revealed the score of a single model — a model that happened to rank near the top of the Chatbot Arena leaderboard.

In an email to TechCrunch, LM Arena Co-Founder and UC Berkeley Professor Ion Stoica said that the study was full of "inaccuracies" and "questionable analysis." "We are committed to fair, community-driven evaluations, and invite all model providers to submit more models for testing and to improve their performance on human preference," said LM Arena in a statement provided to TechCrunch. "If a model provider chooses to submit more tests than another model provider, this does not mean the second model provider is treated unfairly."

Armand Joulin, a principal researcher at Google DeepMind, also noted in a post on X that some of the study's numbers were inaccurate, claiming Google only sent one Gemma 3 AI model to LM Arena for pre-release testing. Hooker responded to Joulin on X, promising the authors would make a correction.

The paper's authors started conducting their research in November 2024 after learning that some AI companies were possibly being given preferential access to Chatbot Arena. In total, they measured more than 2.8 million Chatbot Arena battles over a five-month stretch.

The authors say they found evidence that LM Arena allowed certain AI companies, including Meta, OpenAI, and Google, to collect more data from Chatbot Arena by having their models appear in a higher number of model "battles." This increased sampling rate gave these companies an unfair advantage, the authors allege. Using additional data from LM Arena could improve a model's performance on Arena Hard, another benchmark LM Arena maintains, by 112%.
However, LM Arena said in a post on X that Arena Hard performance does not directly correlate to Chatbot Arena performance. Hooker said it's unclear how certain AI companies might've received priority access, but that it's incumbent on LM Arena to increase its transparency regardless.

In a post on X, LM Arena said that several of the claims in the paper don't reflect reality. The organization pointed to a blog post it published earlier this week indicating that models from non-major labs appear in more Chatbot Arena battles than the study suggests.

One important limitation of the study is that it relied on "self-identification" to determine which AI models were in private testing on Chatbot Arena. The authors prompted AI models several times about their company of origin, and relied on the models' answers to classify them — a method that isn't foolproof. However, Hooker said that when the authors reached out to LM Arena to share their preliminary findings, the organization didn't dispute them.

TechCrunch reached out to Meta, Google, OpenAI, and Amazon — all of which were mentioned in the study — for comment. None immediately responded.

In the paper, the authors call on LM Arena to implement a number of changes aimed at making Chatbot Arena more "fair." For example, the authors say, LM Arena could set a clear and transparent limit on the number of private tests AI labs can conduct, and publicly disclose scores from these tests.

In a post on X, LM Arena rejected these suggestions, claiming it has published information on pre-release testing since March 2024. The benchmarking organization also said it "makes no sense to show scores for pre-release models which are not publicly available," because the AI community cannot test the models for themselves.

The researchers also say LM Arena could adjust Chatbot Arena's sampling rate to ensure that all models in the arena appear in the same number of battles. LM Arena has been receptive to this recommendation publicly, and indicated that it'll create a new sampling algorithm.

The paper comes weeks after Meta was caught gaming benchmarks in Chatbot Arena around the launch of its above-mentioned Llama 4 models. Meta optimized one of the Llama 4 models for 'conversationality,' which helped it achieve an impressive score on Chatbot Arena's leaderboard. But the company never released the optimized model — and the vanilla version ended up performing much worse on Chatbot Arena. At the time, LM Arena said Meta should have been more transparent in its approach to benchmarking.

Earlier this month, LM Arena announced it was launching a company, with plans to raise capital from investors. The study increases scrutiny of private benchmark organizations, and of whether they can be trusted to assess AI models without corporate influence clouding the process.

This article originally appeared on TechCrunch.
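
For context on how pairwise votes of the kind described above become a leaderboard, the sketch below implements a minimal Elo-style rating update for head-to-head battles. It is illustrative only; LM Arena's production rating pipeline is more involved than this.

```python
# Illustrative only: a minimal Elo-style update for pairwise "battles", the kind
# of rating scheme Chatbot Arena-style leaderboards are built on. This is not
# LM Arena's actual implementation.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one battle."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    rating_a += k * (score_a - exp_a)
    rating_b += k * ((1.0 - score_a) - (1.0 - exp_a))
    return rating_a, rating_b

# Example: two models start at 1000; model A wins three of four battles.
a, b = 1000.0, 1000.0
for a_won in (True, True, False, True):
    a, b = update(a, b, a_won)
print(round(a), round(b))
```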