Google Ironwood TPU: A Massive Leap in AI Inference Performance

by Divyansh Jain

Google unveils the Ironwood TPU, its most powerful AI accelerator yet, delivering massive improvements in inference performance and efficiency.

Last week, Google pulled back the curtain on its latest custom AI accelerator, the Ironwood TPU, promising a significant performance jump for the increasingly demanding world of AI. Announced at Google Cloud Next '25, Ironwood is the seventh generation of Google's TPUs, engineered specifically to handle modern AI workloads, particularly in the realm of inference.

Ironwood TPU

Understanding TPUs

Before diving into Ironwood, it's helpful to understand what TPUs are. Tensor Processing Units are specialized chips developed by Google to accelerate machine learning workloads. Unlike general-purpose CPUs, or GPUs whose parallel architecture was originally built for graphics, TPUs are purpose-built for the matrix and tensor operations at the heart of neural networks. Historically, Google has offered two variants of each TPU generation: an ‘e’ series focused on efficiency and inference (running pre-trained models) and a ‘p’ series focused on raw performance for training large models.
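
To make that concrete, here is a minimal, hypothetical JAX sketch (not Google's code) of the kind of tensor workload a TPU accelerates: a bfloat16 matrix multiply compiled through XLA, which maps directly onto the chip's matrix units.

```python
import jax
import jax.numpy as jnp

# A toy example of the tensor math TPUs are built for: a bfloat16
# matrix multiply, compiled by XLA via jax.jit. On a TPU backend this
# lowers onto the chip's matrix-multiply units.
@jax.jit
def dense_layer(x, w):
    return jnp.dot(x, w)

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (1024, 4096), dtype=jnp.bfloat16)
w = jax.random.normal(key, (4096, 4096), dtype=jnp.bfloat16)

y = dense_layer(x, w)    # runs on a TPU if one is available to JAX
print(y.shape, y.dtype)  # (1024, 4096) bfloat16
```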

Introducing Ironwood 

The new Ironwood TPU is Google’s most ambitious AI accelerator to date. It’s the company’s first TPU specifically designed for the demands of inference-heavy ‘reasoning models’. Ironwood brings substantial improvements across all key performance metrics compared to its predecessors, including:

                                              TPU v5e             TPU v5p             TPU v6e             TPU v7e
BF16 Compute                                  197 TFLOPs          459 TFLOPs          918 TFLOPs          2.3 PFLOPs*
INT8/FP8 Compute                              394 TOPs/TFLOPs*    918 TOPs/TFLOPs*    1,836 TOPs/TFLOPs   4.6 POPs/PFLOPs
HBM Bandwidth                                 0.8 TB/s            2.8 TB/s            1.6 TB/s            7.4 TB/s
HBM Capacity                                  16 GB               95 GB               32 GB               192 GB
Inter-Chip Interconnect Bandwidth (per link)  400 Gbps            800 Gbps            800 Gbps            1,200 Gbps
Interconnect Topology                         2D Torus            3D Torus            2D Torus            3D Torus
TPU Pod Size (chips)                          256                 8,960               256                 9,216
Spare Cores                                   No                  No                  Yes                 Yes

Note: Numbers marked with an asterisk (*) are unofficial, calculated estimates.
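
As a quick sanity check on those asterisked v7e figures, assuming (as with prior TPU generations) that BF16 peak is half the FP8 rate, the numbers hang together, and the pod-level total lands near the roughly 42.5 exaFLOPS figure Google has quoted:

```python
# Back-of-the-envelope check on the asterisked v7e numbers (our
# calculations, not official Google specs).
fp8_pflops_per_chip = 4.6                        # stated FP8 peak per chip
bf16_pflops_per_chip = fp8_pflops_per_chip / 2   # assumes BF16 = half the FP8 rate
pod_chips = 9216

pod_fp8_exaflops = fp8_pflops_per_chip * pod_chips / 1000
print(bf16_pflops_per_chip)  # 2.3 PFLOPs, matching the table
print(pod_fp8_exaflops)      # ~42.4 ExaFLOPS across a full 9,216-chip pod
```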

Most notably, Ironwood features:

  • Massive computational power: Each chip delivers 4.6 petaFLOPS of FP8 performance, putting it in the same performance class as NVIDIA’s Blackwell B200
  • Increased memory capacity: 192GB of High Bandwidth Memory (HBM) per chip
  • Dramatically improved memory bandwidth: 7.37 TB/s per chip, 4.5x more than Trillium, enabling faster data access for memory-constrained AI inference (illustrated in the sketch after this list)
  • Enhanced interconnect capabilities: 1.2 Tbps bidirectional bandwidth per link, a 1.5x improvement over Trillium, facilitating more efficient communication between chips
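
Why bandwidth matters so much here: at low batch sizes, LLM decoding is memory-bound, since every generated token has to stream the model's weights out of HBM. A rough, illustrative estimate (hypothetical worst case where weights fill the chip's HBM, not a benchmark):

```python
# Illustrative memory-bound decode estimate (not a benchmark). At low
# batch sizes, each generated token must read the model weights from
# HBM once, so HBM bandwidth caps tokens/second.
hbm_capacity_gb = 192      # Ironwood HBM per chip (from the table)
hbm_bandwidth_gbs = 7370   # 7.37 TB/s expressed in GB/s

# Worst case: weights occupy all of HBM.
seconds_per_token = hbm_capacity_gb / hbm_bandwidth_gbs
print(f"{seconds_per_token * 1e3:.1f} ms/token, "
      f"~{1 / seconds_per_token:.0f} tokens/s per chip")
# ~26.1 ms/token, ~38 tokens/s in the fully bandwidth-bound limit
```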

Speculation: Is Ironwood the Missing v6p?

Interestingly, Google appears to have skipped the expected TPU v6p generation entirely, moving straight to the v7e Ironwood. This suggests the chip may originally have been intended as the v6p training part. However, with model sizes expanding rapidly and pressure to compete with offerings like NVIDIA's GB200 NVL72, Google likely repositioned it as the v7e Ironwood. The massive 9,216-chip pod size and the use of a 3D Torus interconnect in a chip designated ‘e’ series (typically the more economical variant) lend weight to this theory.

The Road Ahead

Google has announced that Ironwood TPUs will be available later this year through Google Cloud. The technology is already powering some of Google’s most advanced AI systems, including Gemini 2.5 and AlphaFold.

As these powerful new accelerators become available to developers and researchers, they are likely to enable breakthroughs in AI capabilities, particularly for large-scale inference workloads that require both massive computational power and sophisticated reasoning capabilities.
