Latest news with #IronwoodTPU

Google Cloud Gets More Serious About Infrastructure At Next 2025

Forbes | 24-04-2025

Google Cloud was especially bold in its competitive positioning against AWS at the Google Cloud Next 2025 conference. Here, Mark Lohmeyer, vice president and general manager of AI and computing infrastructure at Google Cloud, presents head-to-head comparisons.

This month's Google Cloud Next 2025 event was an excellent reference point for how far Google Cloud has come since CEO Thomas Kurian took the helm of the business at the start of 2019. Back then, Google Cloud had about $6 billion in revenue and was losing a ton of money; six years later, it's nearing a $50 billion annual run rate, and it's profitable. I remember that when Kurian started, early odds were that Google would get out of the cloud service business altogether — yet here we are.

Typically for this conference, there was so much announced that I can't cover it all here. (Among the many progress stats that Kurian cited onstage: the business shipped more than 3,000 product advances in 2024.) For deeper dives into specific areas, see the articles from my colleagues Matt Kimball on the new Ironwood TPU chip, Jason Andersen on Google's approach to selling enterprise AI (especially agents) and Melody Brue on the company's approach to the connected future of AI in the workplace. Our colleague Robert Kramer also wrote an excellent preview of the event that still makes good background reading. What I want to focus on here are Next 25's most interesting developments in connectivity, infrastructure and AI. (Note: Google is an advisory client of my firm, Moor Insights & Strategy.)

Kurian placed a strong focus on connectivity, specifically with the company's new Cloud WAN and Cloud Interconnect offerings. Cloud WAN makes the most of Google's network, which the company rightly calls "planet-scale," to deliver performance that is faster than the public internet (40% faster, according to the company) and significantly cheaper than enterprise WANs (with a claimed 40% lower TCO). Meanwhile, Cloud Interconnect is built to connect your own enterprise network to Google's — or even to your network hosted by a different CSP — with high availability and low latency. Interestingly, in the analyst readout at the conference, Kurian started off with networking, which highlights its importance to Google. This makes sense: enterprises are all bought into hybrid multicloud, and there is a growing need to connect all those datacenters, whether they sit in public or private clouds.

This went hand in hand with a lot of discussion about new infrastructure. For context, all of the hyperscalers have announced extra-large capex investments in infrastructure for this year, with Google weighing in at $75 billion. The presentations at Next 25 showed where a good chunk of that money is going. I'll talk more below about the infrastructure investments specific to AI, starting with the Ironwood TPU chip and AI Hypercomputer. For now I want to note that the infrastructure plays also include networking offload, new storage options, a new CPU and more. It's a long list, all aimed at supporting Google Cloud's strategy of combining hardware and software to enable bigger outputs — especially in AI — at a low price. Make special note of that low-price element, which is unusual for Google. I'll come back to that in a minute.

Strategically, I think that Google is recognizing that infrastructure as a service is an onramp to PaaS and SaaS revenue. If you can get people signed on for your IaaS — because, say, you have competitive compute and storage and a planet-scale network that you're allowing them to piggyback on — that opens the door to a bigger selection of your offerings at the platform level. And while we're at it, why not a PaaS or SaaS approach to handling a bigger slice of your enterprise AI needs? It's a solid move from Google, and I'm intrigued to see how it plays out competitively, especially given that Azure seemed to get serious about IaaS in the past couple of years.

It's also notable that Next 25 is the first time I can remember Google Cloud going after AWS on the infrastructure front. As shown in the image accompanying this article, Google touts its Arm-based Axion CPU as outperforming the competing Arm-based processor from AWS, Graviton. In the Mark Lohmeyer breakout session, there was a lot of specific discussion of AWS Trainium chips, too. I'm a fan of stiff competition, so it's refreshing to see Google getting more aggressive with this. It's about time.

Considering all the years I spent in the semiconductor industry, it's no surprise that my ears perked up at the announcement of Google's seventh-generation Ironwood tensor processing unit, which comes out later this year. (I wish Google had been more specific about when we can expect it, but so far it's just "later in 2025.") Google was a pioneer in this area, and this TPU is miles ahead of its predecessors in performance, energy efficiency, interconnect and so on. My colleague Matt Kimball has analyzed Ironwood in detail, so I won't repeat his work here. I will note briefly that Google's Pathways machine-learning runtime can manage distributed workloads across thousands of TPUs, and that Ironwood comes in scale-up pods of 256 chips or 9,216 chips. It also natively supports the vLLM library for inference. vLLM is a widely accepted abstraction layer that enterprises can comfortably code to while preserving optionality, and it should allow users to run inference on Ironwood with an appealing price-to-performance profile — yet another instance of combining hardware and software to enable more output at a manageable price.

Next 25 was also the enterprise coming-out party for the Gemini 2.5 model, which as I write this is the best AI model in the world according to Hugging Face's Chatbot Arena LLM Leaderboard. The event showcased some impressive visual physics simulations using the model. (Google also put together a modification of The Wizard of Oz for display on the inner surface of The Sphere in Las Vegas. I can be pretty jaded about that kind of thing, but in this case I was genuinely impressed.) I haven't been a big consumer of Google's generative AI products in the past, even though I am a paying customer for Workspace and Gemini. But based on what I saw at the event and what I'm hearing from people in my network about Gemini 2.5, I'm going to give it another try.

For now, let's focus on what Google claims for the Gemini 2.0 Flash model, which allows control over how much the model reasons to balance performance and cost. In fact, Google says that Gemini 2.0 Flash achieves intelligence per dollar that's 24x better than GPT-4o and 5x better than DeepSeek-R1. Again, I want to emphasize how unusual the "per dollar" part is for Google messaging.
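To make the "control over how much the model reasons" idea concrete, here is a minimal sketch using Google's google-genai Python SDK. Treat it as illustrative rather than the setup behind Google's benchmark figures: the thinking_budget parameter is documented for the newer 2.5-generation Flash models, so the model name below is an assumption, and the budgets and defaults may differ from what Google ships.

```python
# Illustrative sketch, not Google's benchmark setup. Assumes the google-genai
# SDK (pip install google-genai) and a GEMINI_API_KEY environment variable.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

# thinking_budget caps how many tokens the model may spend reasoning before
# answering; a smaller budget trades some answer quality for cost and latency.
response = client.models.generate_content(
    model="gemini-2.5-flash",  # assumption: a Flash-class model with this knob
    contents="A train leaves at 9:40 and arrives at 13:05. How long is the trip?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=256),
    ),
)
print(response.text)
```

Dialing the budget up or down per request is exactly the kind of performance-versus-cost lever the "intelligence per dollar" framing is about.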
Assuming the comparison figures are accurate, Google Cloud achieves this intelligence-per-dollar advantage by running its own (very smart) models on its new AI Hypercomputer system, which benefits from tailored hardware (including TPUs), software and machine-learning frameworks. AI Hypercomputer is designed to allow easy adaptation of hardware so it can make the most of new advances in chips. On a related note, Google says that it will be one of the first adopters of Nvidia's GB200 GPUs. At the keynote, there was also a video of Nvidia CEO Jensen Huang in which he praised the partnership between the two companies and said, "No company is better at every single layer of computing than Google." In my view, Google is doing a neat balancing act to reassure the market that it loves Nvidia — while also creating its own wares to deliver better price per outcome.

Touting itself for delivering the best intelligence at the lowest cost was not something I expected from Google Cloud. But as I reflect on it, it makes sense. Huang has a point: even though it's a fairly distant third place in the CSP market, Google really is good at every layer of the computing stack. It has the homegrown chips. The performance of its homegrown AI models is outstanding. It understands the (open) software needed to deliver AI for enterprise uses. And it's only getting stronger in infrastructure, as Next 25 emphasized. Now it wants to take this a step further by using Google Distributed Cloud to bring all of that goodness on-premises. Imagine running high-performing Gemini models, Agentspace and so on in your own air-gapped environment to support your enterprise tools and needs.

In comparison to this, I thought that the announcements at Next 25 about AI agents were perfectly nice, but not any kind of strategic change or differentiator for the company — at least not yet. To be sure, Google is building out its agent capabilities both internally and with APIs. Its Vertex AI and Agentspace offerings are designed to make it dead-simple for customers to pick models from a massive library, connect to just about any data source and choose from a gallery of agents or roll their own. On top of that, Google's new Agent2Agent open protocol promises to improve agent interoperability, even if the agents are on different frameworks. (For a concrete sense of what that could look like, see the sketch at the end of this article.) And as I said during the event, the team deserves credit for its simplicity in communicating about AI agents.

So please don't get me wrong: all of this agentic stuff is good. My reservation is that I'm still not convinced that I see any clear differences among the horizontal agents offered by Google, AWS or Microsoft. And it's still very early days for agentic AI. I suspect we'll see a lot more changes in this area in the coming year or two. I just haven't seen anything yet that I would describe as an agentic watershed for any of the big CSPs — or as exciting for Google Cloud as the bigger strategic positioning in AI that I'm describing here.

At the event, Kurian said that companies work with Google Cloud because it has an open, multicloud platform that is fully optimized to help them implement AI. I think that its path forward reflects those strengths. I really like the idea of combining Cloud WAN plus Cloud Interconnect — plus running Gemini on-prem (on high-performing Dell infrastructure) as a managed service. In fact, this may be the embodiment of the true hybrid multicloud vision that I've been talking about for the past 10 years. Why is this so important today?

Well, stop me if you've heard me say this before, but something like 70% to 80% of all enterprise data lives on-prem, and the vast majority of it isn't moving to the cloud anytime soon. It doesn't matter if you think it should or if I think it should or if every SaaS vendor in the world thinks it should. What does matter is that for reasons of control, perceived security risks, costs and so on, it's just not moving. Yet enterprises still need to activate all that data to get value out of it, and some of the biggest levers available to do that are generative AI and, more and more each day, agentic AI. Google Cloud is in a position to deliver this specific solution — in all its many permutations — for enterprise customers across many industries. It has the hardware, the software and the know-how, and under the direction of Thomas Kurian and his team, it has a track record for smart execution. That's no guarantee of more success against AWS, Microsoft, Oracle and others, but I'll be fascinated to see how it plays out.
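As promised above, here is a hedged sketch of what Agent2Agent interoperability can look like on the wire. The well-known agent-card path and the JSON-RPC 2.0 framing match Google's A2A announcement; the specific method name, message fields and agent URL below are my assumptions for illustration, not the published spec.

```python
# Hedged illustration of an A2A-style handshake; field and method names are
# assumptions for illustration, not the published specification.
import json
import urllib.request

AGENT_BASE = "https://agent.example.com"  # hypothetical remote agent

# 1. Discover the remote agent's capabilities via its public "agent card".
with urllib.request.urlopen(f"{AGENT_BASE}/.well-known/agent.json") as resp:
    card = json.load(resp)
print(card.get("name"), card.get("skills"))

# 2. Send it a task as a JSON-RPC 2.0 request over plain HTTP.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tasks/send",  # assumed method name
    "params": {"message": {"role": "user",
                           "parts": [{"text": "Summarize Q1 sales"}]}},
}
req = urllib.request.Request(
    card.get("url", AGENT_BASE),
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))
```

The point of the protocol is that nothing above depends on which framework either agent was built with; discovery and task exchange happen over ordinary HTTP and JSON.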

Google Cloud's Ironwood TPU Forges Better Enterprise AI

Forbes | 15-04-2025

Artificial intelligence infrastructure has emerged as the critical battleground for cloud computing dominance. At this year's Google Cloud Next conference, the company demonstrated its intensified commitment to AI infrastructure, unveiling strategic investments, such as the Ironwood Tensor Processing Unit (TPU), designed to transform enterprise AI deployment across industries.

"We're investing in the full stack of AI innovation," stated Sundar Pichai, CEO of Google and Alphabet, who outlined plans to allocate $75 billion in capital expenditure toward this vision. This substantial commitment reflects the scale of investment required to maintain competitive positioning in the rapidly evolving AI infrastructure market. Innovating in AI requires courage and deep pockets.

Google Cloud articulated a full-stack strategy focused on developing AI-optimized infrastructure spanning three integrated layers: purpose-built hardware, foundation models, and tooling for building and orchestrating multi-agent systems. During the keynote presentation, Google Cloud introduced Ironwood, its seventh-generation TPU, representing a significant advancement in AI computational architecture.

Cloud computing infrastructure started as a way to replace and optimize on-premises data centers. Today, cloud providers are adding purpose-built infrastructure to support the new computing requirements that AI introduces. TPUs are specialized processors developed by Google specifically to accelerate AI and machine learning workloads, with particular optimization for deep learning operations. TPUs deliver superior performance per dollar compared to general-purpose GPUs or CPUs across numerous machine learning use cases, resulting in reduced infrastructure costs or increased computational capability within existing budget constraints.

Google Cloud AI Hypercomputer Architecture

Ironwood TPUs represent a cornerstone component of Google Cloud's AI Hypercomputer architecture, which integrates optimized hardware and software components for high-demand AI workloads. The AI Hypercomputer platform constitutes a supercomputing system that combines performance-optimized silicon, open software frameworks, machine learning libraries, and flexible consumption models designed to enhance efficiency throughout the AI lifecycle, from training and tuning to inference and serving.

According to Google's technical specifications, these specialized AI processors deliver computational performance that is 3,600 times more powerful and 29 times more energy efficient than the original TPUs launched in 2013. Ironwood also demonstrates a 4-5x performance improvement across multiple operational functions compared with the previous-generation Trillium (v6) TPU architecture. Ironwood implements advanced liquid cooling and proprietary high-bandwidth Inter-Chip Interconnect (ICI) technology to create scalable computational units called "pods" that integrate up to 9,216 chips. At maximum pod configuration, Ironwood delivers 24 times the computational capacity of El Capitan, currently ranked as the world's largest supercomputer.

To maximize this infrastructure's utility, Google Cloud has developed Pathways, a machine learning runtime created by Google DeepMind that enables efficient distributed computing across multiple TPU chips.
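Pathways itself is Google-internal, but the programming model it backs will feel familiar to anyone writing JAX for TPUs. Below is a minimal, generic JAX sketch of sharding one computation across all attached accelerator devices; it illustrates the distributed, single-controller style involved, with arbitrary shapes and mesh layout of my choosing, and it is not Google's Pathways API.

```python
# Generic JAX sharding sketch (my illustration, not the Pathways API).
# Runs on any JAX backend; on a TPU slice, jax.devices() spans the slice.
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
from jax.experimental import mesh_utils

# Arrange all attached devices into a 1-D mesh with a named "data" axis.
devices = mesh_utils.create_device_mesh((jax.device_count(),))
mesh = Mesh(devices, axis_names=("data",))

# Shard a batch along its first dimension; each device holds one slice.
batch = jax.device_put(
    jnp.ones((jax.device_count() * 8, 1024)),
    NamedSharding(mesh, P("data", None)),
)

@jax.jit
def forward(x):
    # Toy computation; under jit, XLA partitions the work and inserts the
    # cross-device communication implied by the input sharding.
    return jnp.tanh(x @ x.T)

print(forward(batch).shape)
```

The same program runs unchanged on one chip or many; scaling it to pod-sized meshes is the layer that Pathways, as described above, handles for Google Cloud customers.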
Pathways on Google Cloud simplifies scaling beyond individual Ironwood pods, allowing for the orchestration of hundreds of thousands of Ironwood chips for next-generation AI computational requirements. Google uses Pathways internally to train advanced models such as Gemini and now extends these same distributed computation capabilities to Google Cloud customers.

While the industry has witnessed a proliferation of smaller, specialized AI models, significant AI chip innovation remains essential to deliver the performance required to support advanced reasoning and multimodal models. According to Amin Vahdat, VP/GM of ML, Systems & Cloud AI at Google Cloud, "Ironwood is designed to gracefully manage the complex computation and communication demands of 'thinking models,' which encompass Large Language Models (LLMs), Mixture of Experts (MoEs) and advanced reasoning tasks." This architecture addresses the market requirement for modular, scalable systems that deliver improved performance and accuracy while optimizing both cost efficiency and energy utilization.

For enterprises implementing large-scale AI initiatives, Google's hardware advancements translate to quantifiable benefits across three dimensions: performance, cost efficiency and energy use. Organizations are past the phase of interesting AI proof-of-concept trials that never make it to production-grade systems. 2025 is the year that organizations expect to deploy use cases with quantifiable business value while laying the foundation for what's next. Google Cloud's enhanced AI infrastructure enables practical enterprise applications today while supporting use cases previously constrained by computational economics or performance limitations. Consider the impact of AI, today and tomorrow, across industries.

As competition intensifies among cloud infrastructure providers, Google's substantial investment in AI represents a strategic assessment that enterprise computing will increasingly prioritize AI-driven workloads, and that organizations will select platforms offering the optimal combination of performance, cost efficiency, and energy sustainability. The only constant in the AI market will be change. Business leaders must be comfortable with continuously adapting strategies to leverage AI advancements. For CIOs and technology leaders developing their AI implementation roadmaps, Google Cloud's hardware innovations, such as the Ironwood TPU, present technical and economic justifications to reevaluate their infrastructure strategy as AI becomes increasingly central to operational excellence and competitive differentiation.
