
Google unveils Ironwood TPU for AI inference tasks
Google has announced Ironwood, its seventh-generation Tensor Processing Unit (TPU) and the first designed specifically for AI inference tasks.
According to Amin Vahdat, Vice President & General Manager of ML, Systems, and Cloud AI at Google, "Ironwood is our most powerful, capable and energy efficient TPU yet. And it's purpose-built to power thinking, inferential AI models at scale." With Ironwood, Google aims to shift from responsive AI models that provide information for people to interpret toward proactive systems that collaboratively generate insights and interpretations.
The development of Ironwood marks a shift towards the "age of inference" in AI, in which systems actively generate and deliver insights rather than just data. Ironwood is built to handle the heavy computational and communication demands of generative AI: it scales up to 9,216 liquid-cooled chips linked by an Inter-Chip Interconnect (ICI) network and draws nearly 10 megawatts (MW) at that scale.
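For a rough sense of scale, the quoted figures work out to about one kilowatt per chip at the pod level: 10 MW ÷ 9,216 chips ≈ 1.1 kW. This is a back-of-the-envelope estimate from the numbers above, and it folds in interconnect, networking and cooling overhead rather than the chip alone.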
Ironwood is part of Google Cloud's AI Hypercomputer architecture, which pairs optimized hardware with integrated software for demanding AI workloads. Developers can use Google's Pathways software stack to efficiently harness the combined computing power of many Ironwood TPUs.
Amin Vahdat elaborates on the capabilities of Ironwood, stating, "Ironwood is designed to gracefully manage the complex computation and communication demands of 'thinking models', which encompass Large Language Models (LLMs), Mixture of Experts (MoEs) and advanced reasoning tasks. These models require massive parallel processing and efficient memory access."
Google Cloud customers can choose Ironwood configurations of either 256 or 9,216 chips, depending on their AI workload requirements. At its maximum configuration, an Ironwood pod delivers 42.5 Exaflops, far exceeding the 1.7 Exaflops of El Capitan, currently the world's largest supercomputer.
This represents a substantial leap in AI capability, enabling the most demanding workloads such as large dense LLMs and MoE models with advanced reasoning capabilities. Each Ironwood chip delivers a peak of 4,614 TFLOPs of compute, backed by a memory and network architecture designed to keep data flowing at that performance level.
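As a quick consistency check, the per-chip and pod-level figures quoted above line up:
9,216 chips × 4,614 TFLOPs per chip ≈ 42.5 million TFLOPs, i.e. roughly 42.5 Exaflops per full pod.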
Ironwood also includes an enhanced SparseCore, a specialized accelerator for processing the ultra-large embeddings common in advanced ranking and recommendation workloads. This capability extends beyond traditional AI applications into financial and scientific domains.
Pathways, Google's ML runtime, enables efficient distributed computing across multiple TPU chips, making large-scale AI computation more accessible. It allows hundreds of thousands of Ironwood chips to be combined, pushing the boundaries of generative AI computation.
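To make the idea of spreading one computation across many chips concrete, here is a minimal JAX sketch. It uses standard JAX sharding APIs as an illustrative stand-in, not Pathways-specific code, but it shows the single-controller pattern that a runtime like Pathways scales up to far larger numbers of chips.

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1-D device mesh over every chip visible to this host
# (TPU chips on a TPU VM, otherwise CPU devices).
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))

# Shard the batch dimension of the input across the mesh; the weights
# stay replicated on every chip.
batch_sharding = NamedSharding(mesh, P("data"))
x = jax.device_put(jnp.ones((len(devices) * 128, 1024)), batch_sharding)
w = jnp.ones((1024, 4096))

@jax.jit
def forward(x, w):
    # Each chip computes its slice of the batch; XLA inserts any
    # cross-chip communication that is needed.
    return jnp.dot(x, w)

y = forward(x, w)
print(y.shape, y.sharding)

Run on a TPU host, the same script spreads the batch across all local TPU chips without any change to the model code; scaling the same pattern to thousands of chips is the job of the runtime and interconnect.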
Ironwood also makes major gains in power efficiency, delivering twice the performance per watt of Google's previous sixth-generation TPU, Trillium, and nearly 30 times the power efficiency of Google's first Cloud TPU from 2018. Other improvements include High Bandwidth Memory (HBM) capacity of 192 GB per chip and HBM bandwidth of 7.2 terabytes per second (TBps) per chip, both crucial for memory-intensive AI workloads.
ICI bandwidth has been increased to 1.2 TBps bidirectional, enabling faster inter-chip communication for efficient AI training and inference.
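A back-of-the-envelope calculation from the quoted per-chip figures (not a Google-published number) shows why memory bandwidth matters as much as raw compute in the age of inference:
4,614 TFLOPs ÷ 7.2 TBps ≈ 640 floating-point operations per byte fetched from HBM.
Workloads whose arithmetic intensity falls below that threshold, as is typical when serving large models token by token, are limited by memory bandwidth rather than compute.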
Through these advances, Google aims to keep pace with the growing computational demands of AI research and applications, maintaining high performance and low latency while improving power efficiency. As Amin Vahdat concludes, "Ironwood represents a unique breakthrough in the age of inference with increased computation power, memory capacity, ICI networking advancements and reliability." Leading AI models such as Gemini 2.5 and AlphaFold already run on TPUs, and Google anticipates further advances once Ironwood becomes broadly available.