
Google Cloud Next 2025 — Ironwood TPU and the Infrastructure Arms Race

Osmond van Hemert

Google Cloud Next 2025 wrapped up yesterday in Las Vegas, and if you were paying attention, the message was unmistakable: the cloud wars have become AI infrastructure wars. The star of the show was Ironwood, Google’s 7th-generation TPU, and it represents a significant leap in what cloud providers are willing to build — and spend — to win the AI compute race.

Having attended Cloud Next events (virtually and in person) since the early days, I can say this one felt different. The entire conference was organized around a single thesis: AI workloads are the future of cloud computing, and everything else is secondary.

Ironwood: What Makes It Different

Google’s Ironwood TPU is purpose-built for large-scale AI inference and training. The numbers are impressive — Google claims a 4x improvement in performance-per-watt compared to TPU v5e, with significantly larger high-bandwidth memory pools that allow hosting bigger model shards per chip.

What strikes me most is the architectural decision to optimize heavily for inference workloads alongside training. Previous TPU generations were primarily training-focused, with inference being handled by separate infrastructure. Ironwood unifies this, which makes economic sense when you consider that inference costs are rapidly becoming the dominant expense for organizations running production AI systems.
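
To put rough numbers on that, here's a back-of-envelope sketch. Every figure in it is a hypothetical placeholder rather than real pricing; the point is the shape of the curve, not the values.

```python
# Back-of-envelope: when does cumulative inference spend pass a one-time
# training cost? All numbers are hypothetical placeholders for illustration.

TRAINING_COST = 5_000_000        # one-time training run, USD (hypothetical)
COST_PER_1M_TOKENS = 2.50        # blended inference price, USD (hypothetical)
TOKENS_PER_DAY = 20_000_000_000  # production traffic, tokens/day (hypothetical)

daily_inference = TOKENS_PER_DAY / 1_000_000 * COST_PER_1M_TOKENS
breakeven_days = TRAINING_COST / daily_inference

print(f"Daily inference spend: ${daily_inference:,.0f}")
print(f"Inference overtakes training after {breakeven_days:.0f} days")
```

At those placeholder rates, inference passes the entire training bill in about three months, and it keeps compounding from there.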

The pod configurations scale up to 9,216 chips in a single cluster, connected via Google’s custom inter-chip interconnect (ICI). For context, that’s enough compute to run multiple copies of the largest foundation models simultaneously. Google is clearly building this for their own Gemini infrastructure first, but the fact that they’re making it available through Google Cloud is telling — they want enterprise customers locked into their AI compute stack.

Gemini 2.5 Pro and the Developer Story

The other major announcement was Gemini 2.5 Pro, which Google positions as their most capable model for coding and complex reasoning tasks. They demonstrated it handling multi-file code refactoring, long-context document analysis, and agentic workflows that chain multiple tool calls together.

What caught my attention was the emphasis on the 1-million-token context window in production. We’ve heard about long context windows before, but Google showed real enterprise use cases — feeding entire codebases into the model for analysis, processing lengthy legal documents, and maintaining coherent conversations across massive amounts of reference material.
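
If you want to try that pattern yourself, here's a minimal sketch using the google-genai Python SDK. The project path is a placeholder, and it's worth checking the current docs for the exact model id before relying on it.

```python
# Minimal sketch: feed a small codebase into Gemini's long context window
# using the google-genai SDK (pip install google-genai). The repo path is
# a placeholder; verify the model id against current documentation.
from pathlib import Path

from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Concatenate source files into one long prompt. With a 1M-token window,
# a moderately sized repo fits without chunking or retrieval tricks.
repo = Path("./my-service")  # hypothetical project directory
sources = "\n\n".join(
    f"# FILE: {p}\n{p.read_text(errors='ignore')}"
    for p in sorted(repo.rglob("*.py"))
)

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=f"Map the module dependencies in this codebase:\n\n{sources}",
)
print(response.text)
```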

From a developer tools perspective, Google also announced tighter integration between Gemini and their Cloud development suite. Firebase got AI-powered features, Cloud Run got streamlined model deployment, and BigQuery can now use Gemini for natural language data exploration. The platform play is becoming very cohesive.
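
The new exploration features live mostly in the console, but BigQuery has exposed LLM calls through SQL for a while via ML.GENERATE_TEXT, which gives a feel for the integration. The sketch below assumes a remote Gemini model has already been registered as `my_dataset.gemini_model`; every identifier in it is a placeholder.

```python
# Sketch: calling Gemini from BigQuery SQL via ML.GENERATE_TEXT. Assumes a
# remote model over a Vertex AI Gemini endpoint already exists; the dataset,
# model, and table names are all placeholders.
from google.cloud import bigquery

client = bigquery.Client()

query = """
SELECT ml_generate_text_llm_result AS answer
FROM ML.GENERATE_TEXT(
  MODEL `my_dataset.gemini_model`,
  (
    SELECT CONCAT(
      'In one sentence, describe the revenue trend in this data: ',
      TO_JSON_STRING(ARRAY_AGG(STRUCT(month, revenue)))
    ) AS prompt
    FROM `my_dataset.monthly_revenue`
  ),
  STRUCT(0.2 AS temperature, TRUE AS flatten_json_output)
)
"""

for row in client.query(query).result():
    print(row.answer)
```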

The Multi-Cloud Reality Check

Here’s what I think gets lost in the excitement of these announcements: most enterprises I’ve worked with over the past few years aren’t all-in on a single cloud. They’re running workloads across AWS, Azure, and GCP, often with some on-premises infrastructure still in the mix.

Google’s strategy with Ironwood and the broader AI platform is clearly designed to change that calculus. If your AI inference runs best on TPUs, and your TPUs only exist in Google Cloud, you’ve got a strong incentive to centralize. It’s the same playbook AWS ran with custom Graviton instances — build hardware that only works in your cloud and make it compelling enough that migration becomes attractive.

The counter-argument is Kubernetes and the open ecosystem. Google themselves built Kubernetes to be cloud-agnostic, and tools like GKE Enterprise are designed to work across environments. But AI workloads don’t move easily between hardware architectures. A model optimized for TPU inference doesn’t just port to NVIDIA GPUs or AWS Trainium without significant engineering effort.
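
To make that concrete, here's a minimal JAX sketch of why. Weight layouts are declared against a device mesh, and XLA compiles the program for that specific backend and topology; the shapes and axis names below are illustrative, not a real model.

```python
# Minimal JAX sketch: weights are laid out against a device mesh, so the
# compiled program is entangled with the hardware underneath. Shapes and
# axis names are illustrative only.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Whatever accelerators are attached: TPU cores on a TPU VM, else GPU/CPU.
mesh = Mesh(np.array(jax.devices()), axis_names=("model",))

# Shard a weight matrix column-wise across the "model" axis.
w = jax.device_put(
    jnp.ones((4096, 4096)),
    NamedSharding(mesh, P(None, "model")),
)

@jax.jit  # XLA compiles for this specific backend and layout
def forward(x):
    return x @ w

print(forward(jnp.ones((8, 4096))).sharding)
```

The numerics port anywhere JAX runs, but the performance doesn't: mesh shapes, sharding specs, and compiler tuning all have to be redone for a different accelerator, and that's before you touch quantization formats or the serving stack.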

What About the Competition?

AWS has been building its own AI chips — Trainium2 instances are now generally available, and they've been aggressive on pricing. Microsoft has its deep NVIDIA partnership plus custom Maia chips in development for Azure. But Google has a unique advantage: they've been building custom AI hardware longer than anyone. TPU v1 shipped internally in 2015. That's a decade of silicon design iteration.

The question isn’t whether these chips are good — they are. The question is whether Google can translate hardware leadership into cloud market share. Historically, having the best technology hasn’t been enough in the cloud market. AWS won with breadth of services and developer mindshare. Azure won with enterprise relationships and Microsoft 365 integration.

My Take

What I find most compelling about Cloud Next 2025 isn’t any single announcement — it’s the coherence of the vision. Google is betting that AI infrastructure will be the deciding factor in the next phase of cloud competition, and they’re building every layer of the stack: custom silicon, optimized networking, integrated ML frameworks, and application-layer AI services.

For those of us building systems that need to scale, the practical takeaway is this: it’s worth evaluating TPU-based inference seriously, especially if you’re running large language models in production. The cost-performance improvements from Ironwood could meaningfully change your infrastructure economics.
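
A useful first step is to normalize every candidate to a single metric, like dollars per million tokens served. Here's a small harness sketch; the throughput and hourly prices are placeholders to be replaced with your own benchmark numbers and list prices.

```python
# Evaluation harness sketch: normalize competing inference options to
# dollars per million tokens. All figures below are placeholders; plug in
# your own measured throughput and the provider's current list prices.

def cost_per_million_tokens(tokens_per_sec: float, usd_per_hour: float) -> float:
    tokens_per_hour = tokens_per_sec * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000

candidates = {
    "tpu-option": {"tokens_per_sec": 9000.0, "usd_per_hour": 12.0},  # hypothetical
    "gpu-option": {"tokens_per_sec": 6000.0, "usd_per_hour": 10.0},  # hypothetical
}

for name, c in candidates.items():
    print(f"{name}: ${cost_per_million_tokens(**c):.3f} per 1M tokens")
```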

But don’t lock yourself in without an exit strategy. The cloud landscape shifts fast, and today’s best option might not be tomorrow’s. Architect for portability where you can, optimize for performance where you must.

The AI infrastructure war is just getting started, and as engineers, we’re the ones who get to decide where the workloads actually run.
