
Nvidia GTC 2025 — Blackwell Ultra and the Infrastructure Race for AI

·1022 words·5 mins
Osmond van Hemert
Cloud Platform Watch - This article is part of a series.

Jensen Huang took the stage at GTC this week in San Jose, and as usual, the keynote was a masterclass in product roadmap theatre. But behind the leather jacket and the carefully choreographed reveals, there’s a genuinely important story about where computing infrastructure is heading. Nvidia isn’t just selling GPUs anymore — they’re defining the architecture of AI-era data centres.

Blackwell Ultra and the Compute Trajectory

The headline announcement is the Blackwell Ultra GPU, which succeeds the Blackwell architecture that started shipping to cloud providers late last year. The numbers are impressive on paper: significantly higher memory bandwidth, larger HBM3e capacity, and improved interconnect speeds for multi-GPU configurations.

But the number that matters most for practitioners is the inference throughput improvement. Nvidia is claiming substantial gains in tokens-per-second for large language model inference, which directly translates to lower cost per query for anyone running AI services at scale. If you’re deploying models in production, the economics of each hardware generation determine what’s viable.
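
To make that concrete, here's a back-of-the-envelope sketch of how throughput feeds into cost per token. The hourly rate and tokens-per-second figures below are placeholder assumptions, not published Blackwell Ultra numbers; substitute your own cloud pricing and measured throughput.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """Cost of generating one million tokens on one GPU at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

# Hypothetical $10/hour instance serving 5,000 tokens/s:
print(f"${cost_per_million_tokens(10.0, 5_000):.2f} per 1M tokens")   # ~$0.56

# Doubling throughput on the same hardware halves the cost per token:
print(f"${cost_per_million_tokens(10.0, 10_000):.2f} per 1M tokens")  # ~$0.28
```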

More interesting to me are the NVLink interconnect improvements. The bottleneck for large model training and inference isn’t just raw compute — it’s how fast you can move data between GPUs. Blackwell Ultra’s NVLink improvements mean you can scale to larger clusters without the interconnect becoming the limiting factor as quickly. For training runs that span thousands of GPUs, this is where the real gains come from.
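
As a rough illustration of why link speed matters, consider how long it takes just to move data between GPUs at different per-link bandwidths. The data size and bandwidth figures below are illustrative assumptions, not Blackwell Ultra specifications.

```python
def transfer_seconds(gigabytes: float, link_gb_per_s: float) -> float:
    """Time to move a tensor of a given size over a single GPU-to-GPU link."""
    return gigabytes / link_gb_per_s

# Moving 100 GB of weights/activations at different hypothetical link bandwidths:
for bandwidth_gb_s in (900, 1800, 3600):
    ms = transfer_seconds(100, bandwidth_gb_s) * 1000
    print(f"{bandwidth_gb_s:>5} GB/s -> {ms:.0f} ms per transfer")
```

At the slowest of those made-up bandwidths the transfer alone takes over 100 ms, which is time the compute units spend waiting; that is the sense in which the interconnect, not raw FLOPs, becomes the ceiling on cluster scaling.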

Vera Rubin: Looking Two Steps Ahead

In typical Nvidia fashion, Jensen didn’t just announce the current generation — he previewed the generation after next: the Vera Rubin architecture, expected in 2026. Named after the astronomer whose observations provided key evidence for dark matter, the Rubin GPU paired with the Vera CPU represents Nvidia’s move toward tighter CPU-GPU integration.

This matters because the trend in AI workloads is moving toward more heterogeneous compute. Not everything in an AI pipeline benefits from GPU acceleration — data preprocessing, tokenization, and orchestration logic often run more efficiently on CPUs. Having a tightly integrated CPU-GPU system with high-bandwidth shared memory could simplify the software stack significantly.

For those of us building inference pipelines today, the implication is clear: the hardware is going to keep getting faster and more efficient, which means the software architecture decisions we make should optimise for flexibility rather than squeezing every last bit of performance from current hardware. What’s GPU-memory-bound today might not be in eighteen months.

DGX Cloud and the Democratisation Question

The other significant announcement is the expansion of DGX Cloud, Nvidia’s cloud-hosted AI supercomputing service. Partnerships with major cloud providers mean that teams without the capital (or power infrastructure) to buy racks of Blackwell GPUs can still access them on demand.

This is important for the broader developer ecosystem. The cost barrier to training or fine-tuning large models has been a significant filter on who gets to participate in AI development. Cloud access to cutting-edge hardware doesn’t eliminate the cost entirely — it’s still expensive — but it changes the economics from “multi-million dollar capital expenditure” to “operational expense you can scale up and down.”

I’ve been watching this dynamic play out in several projects where teams start with cloud GPU instances for experimentation, then evaluate whether on-premises hardware makes sense for production workloads with predictable demand patterns. The break-even calculation varies enormously based on utilisation rates, and Nvidia’s rapid hardware cadence makes the buy-versus-rent decision even more complex — do you buy Blackwell today knowing Rubin is eighteen months away?
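
Here's a toy version of that break-even calculation for a single GPU server over one year. Every figure is an illustrative assumption (pricing, power draw, tariff, amortisation period), and it deliberately ignores staffing, cooling overhead, and resale value.

```python
HOURS_PER_YEAR = 8760

def cloud_cost_per_year(hourly_rate: float, utilisation: float) -> float:
    """Pay-as-you-go: you only pay for the hours you actually run."""
    return hourly_rate * HOURS_PER_YEAR * utilisation

def onprem_cost_per_year(capex: float, amortisation_years: float,
                         power_kw: float, kwh_price: float) -> float:
    """Owned hardware: amortised purchase price plus electricity for every hour,
    busy or idle."""
    return capex / amortisation_years + power_kw * kwh_price * HOURS_PER_YEAR

for utilisation in (0.15, 0.4, 0.7, 0.95):
    rent = cloud_cost_per_year(hourly_rate=10.0, utilisation=utilisation)
    own = onprem_cost_per_year(capex=150_000, amortisation_years=3,
                               power_kw=1.0, kwh_price=0.12)
    print(f"utilisation {utilisation:.0%}: cloud ${rent:,.0f} vs on-prem ${own:,.0f}")
```

On these made-up numbers the crossover sits somewhere around 60% utilisation; with real pricing the point moves, but the shape of the question stays the same.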

The Software Stack Is the Moat

What often gets overlooked in the GTC hardware spectacle is that Nvidia’s real competitive advantage is the software ecosystem. CUDA has been the dominant GPU programming framework for over a decade, and the ecosystem of libraries built on top of it — cuDNN, TensorRT, NCCL, Triton Inference Server — creates enormous switching costs.

GTC 2025 continued this strategy with announcements around NIM (Nvidia Inference Microservices) and expanded framework support. The NIM containers package optimised models with the right runtime configurations, making it significantly easier to deploy models in production without deep GPU programming expertise.
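
Because NIM endpoints speak an OpenAI-compatible API, existing client code mostly just needs a new base URL. Here's a minimal sketch assuming a container is already running locally; the port and model identifier are placeholders, so check the specific container's documentation.

```python
from openai import OpenAI

# Point a standard OpenAI client at the locally hosted NIM endpoint.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local NIM port
    api_key="not-needed-locally",         # placeholder; local deployments may not check it
)

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",   # placeholder model identifier
    messages=[{"role": "user", "content": "Summarise what NVLink does in one sentence."}],
    max_tokens=80,
)
print(response.choices[0].message.content)
```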

For developers, this is a double-edged sword. The abstraction layers make it easier to get started and achieve good performance, but they also deepen the dependency on Nvidia’s stack. AMD’s ROCm and Intel’s oneAPI are making progress, but the gap in the software ecosystem remains the real barrier to GPU competition — not the hardware specs.

The Energy Elephant in the Room

One topic that received less attention than it deserves is power consumption. These new GPU systems draw enormous amounts of electricity, and the cooling requirements are pushing data centres toward liquid cooling solutions. Nvidia showcased some of this infrastructure, but the fundamental question remains: as AI compute demand grows exponentially, where does the power come from?

For developers and infrastructure teams, this has practical implications. Cloud providers are already seeing capacity constraints in certain regions, and pricing reflects the underlying power costs. If you’re planning AI infrastructure deployments, energy availability and cost should factor into your architecture decisions alongside the usual performance and latency considerations.
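
A quick sanity check on the scale involved: even the electricity bill for a single dense rack is substantial. The power draw and tariff below are assumptions for illustration only.

```python
# Assumed sustained draw for a liquid-cooled GPU rack and an assumed industrial tariff.
rack_kw = 120
usd_per_kwh = 0.12
hours_per_year = 8760

annual_electricity = rack_kw * usd_per_kwh * hours_per_year
print(f"~${annual_electricity:,.0f} per rack per year, before cooling overhead")  # ~$126,000
```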

My Take

I’ve attended GTC presentations (remotely, at least) for years now, and the trajectory is remarkable. Nvidia has executed a strategy of controlling the full stack — hardware, interconnects, system software, and increasingly the application frameworks — that gives them a position in AI infrastructure similar to what Intel had in enterprise computing in the 2000s.

For most developers, the practical takeaway from GTC 2025 is that AI inference is going to get cheaper and faster, which expands the range of applications where it makes economic sense. If you’ve been holding off on integrating AI capabilities into your products because of cost concerns, revisit those calculations. The cost curve is dropping faster than most people expected.

The competitive landscape for AI hardware will evolve — AMD, Intel, and custom silicon from the hyperscalers will eventually provide real alternatives. But for the next couple of years at least, Nvidia’s ecosystem dominance means their roadmap is effectively the industry’s roadmap. Plan accordingly.

Part of my Infrastructure Notes series, examining the systems and platforms that underpin modern software development.
