Google unveils the Ironwood TPU, its most powerful AI accelerator yet, delivering massive improvements in inference performance and efficiency.
Last week, Google pulled back the curtain on its latest custom AI accelerator, the Ironwood TPU, promising a significant performance leap for the increasingly demanding world of AI. Announced at Google Cloud Next '25, Ironwood is the seventh generation of Google’s TPUs, engineered specifically for modern AI workloads, particularly inference.
Understanding TPUs
Before diving into Ironwood, it’s helpful to understand what TPUs are. Tensor Processing Units are specialized chips developed by Google specifically to accelerate machine learning workloads. Unlike general-purpose CPUs, or GPUs that grew out of graphics into broad parallel processors, TPUs are built around the matrix and tensor operations at the heart of neural networks. Historically, Google has offered different TPU variants, often distinguishing between ‘e’ series chips (focused on efficiency and inference, i.e., running pre-trained models) and ‘p’ series chips (focused on raw performance for training large models).
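To make that concrete, here is a minimal JAX sketch of the kind of operation a TPU’s matrix units exist to accelerate: a dense layer, i.e., a matrix multiply followed by a nonlinearity. The shapes are purely illustrative and not tied to any particular model; the same code falls back to CPU or GPU if no TPU is attached.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
kx, kw = jax.random.split(key)
x = jax.random.normal(kx, (8, 1024), dtype=jnp.bfloat16)     # batch of activations
w = jax.random.normal(kw, (1024, 4096), dtype=jnp.bfloat16)  # weight matrix

@jax.jit  # XLA compiles this; on a TPU the matmul maps onto the matrix units
def dense_layer(x, w):
    # A single dense layer: matrix multiply plus ReLU -- the tensor operation
    # TPUs are built to accelerate.
    return jax.nn.relu(jnp.dot(x, w))

print(dense_layer(x, w).shape)  # (8, 4096)
```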
Introducing Ironwood
The new Ironwood TPU is Google’s most ambitious AI accelerator to date. It’s the company’s first TPU specifically designed for the demands of inference-heavy ‘reasoning models’. Ironwood brings substantial improvements across all key performance metrics compared to its predecessors, including:
|  | TPU v5e | TPU v5p | TPU v6e | TPU v7e (Ironwood) |
| --- | --- | --- | --- | --- |
| BF16 Compute | 197 TFLOPs | 459 TFLOPs | 918 TFLOPs | 2.3 PFLOPs* |
| INT8/FP8 Compute | 394 TOPs/TFLOPs* | 918 TOPs/TFLOPs* | 1836 TOPs/TFLOPs | 4.6 POPs/PFLOPs |
| HBM Bandwidth | 0.8 TB/s | 2.8 TB/s | 1.6 TB/s | 7.4 TB/s |
| HBM Capacity | 16 GB | 95 GB | 32 GB | 192 GB |
| Inter-Chip Interconnect Bandwidth (per link) | 400 Gbps | 800 Gbps | 800 Gbps | 1200 Gbps |
| Interconnect Topology | 2D Torus | 3D Torus | 2D Torus | 3D Torus |
| TPU Pod Size | 256 | 8,960 | 256 | 9,216 |
| Spare Cores | No | No | Yes | Yes |
Note: Numbers marked with “*” are unofficial calculated numbers.
Most notably, Ironwood features:
- Massive computational power: Each chip delivers 4.6 petaFLOPS of FP8 performance, putting it in the same performance class as NVIDIA’s Blackwell B200
- Increased memory capacity: 192GB of High Bandwidth Memory (HBM) per chip
- Dramatically improved memory bandwidth: 7.37 TB/s per chip, 4.5x more than Trillium, enabling faster data access for memory-constrained AI inference (a quick back-of-envelope illustration follows this list)
- Enhanced interconnect capabilities: 1.2 Tbps bidirectional bandwidth per link, a 1.5x improvement over Trillium, facilitating more efficient communication between chips
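Here is that back-of-envelope illustration: a simple roofline-style estimate, in plain Python, of the arithmetic intensity (FLOPs per byte read from HBM) at which each chip shifts from memory-bound to compute-bound. It uses only the spec-sheet figures quoted above, so treat it as an approximation rather than a measurement.

```python
# Roofline "ridge point" estimate from published per-chip specs (not measured results).
ironwood_fp8_flops = 4.6e15   # 4.6 PFLOPs FP8 per Ironwood chip
ironwood_hbm_bw    = 7.37e12  # 7.37 TB/s HBM bandwidth per chip

trillium_fp8_flops = 1.836e15 # TPU v6e (Trillium): 1836 TFLOPs FP8
trillium_hbm_bw    = 1.6e12   # TPU v6e: 1.6 TB/s

for name, flops, bw in [("Ironwood", ironwood_fp8_flops, ironwood_hbm_bw),
                        ("Trillium", trillium_fp8_flops, trillium_hbm_bw)]:
    # Below this many FLOPs per byte, the chip is waiting on HBM rather than compute.
    print(f"{name}: compute-bound above ~{flops / bw:.0f} FLOPs/byte")
# Ironwood: ~624 FLOPs/byte, Trillium: ~1148 FLOPs/byte
```

Low-batch LLM decoding typically sits well below either threshold, so per-chip decode throughput scales almost directly with HBM bandwidth; that is arguably why the 4.5x bandwidth jump, rather than the raw FLOPs, is the headline inference number.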
Speculation: Is Ironwood the Missing v6p?
Interestingly, Google appears to have skipped the expected TPU v6p generation and moved directly to the v7e Ironwood, which suggests the chip may originally have been intended as the v6p training part. However, with model sizes expanding rapidly and the need to compete with offerings like NVIDIA’s GB200 NVL72, Google likely repositioned it as the v7e Ironwood. The massive 9,216-chip pod size and the use of a 3D Torus interconnect in a chip designated as an “e” series part (typically the more economical variant) lend weight to this theory.
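For a sense of that scale, multiplying the per-chip figures by the full pod size is straightforward; this is spec-sheet arithmetic only, not a measured system-level number.

```python
# Pod-scale totals implied by the per-chip specs above (spec-sheet arithmetic only).
pod_chips = 9216
fp8_pflops_per_chip = 4.6
hbm_gb_per_chip = 192

print(f"Full-pod FP8 compute: ~{pod_chips * fp8_pflops_per_chip / 1000:.1f} exaFLOPs")  # ~42.4 exaFLOPs
print(f"Full-pod HBM capacity: ~{pod_chips * hbm_gb_per_chip / 1024:,.0f} TB")          # ~1,728 TB
```

Tens of exaFLOPs of FP8 compute and well over a petabyte of HBM in a single pod helps explain why Google positions Ironwood against rack-scale systems such as the GB200 NVL72.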
The Road Ahead
Google has announced that Ironwood TPUs will be available later this year through Google Cloud. TPU technology already powers some of Google’s most advanced AI systems, including Gemini 2.5 and AlphaFold.
As these powerful new accelerators become available to developers and researchers, they are likely to enable breakthroughs in AI capabilities, particularly for large-scale inference workloads that require both massive computational power and sophisticated reasoning capabilities.