Platform Engineering in 2025 — A Year-End Retrospective · Osmond van Hemert — Senior Software Engineer

It’s Christmas Day, the office is quiet, and there’s something fitting about using the downtime to reflect on where platform engineering stands as we close out 2025. This has been a transformative year for how organizations think about developer experience, infrastructure abstraction, and the relationship between platform teams and the developers they serve.

The Rise of the Internal Developer Platform

If 2024 was the year everyone talked about internal developer platforms (IDPs), 2025 was the year many organizations actually built them. The concept isn’t new — giving developers self-service access to infrastructure through curated abstractions has been a goal since the early days of DevOps. But the tooling has finally caught up with the ambition.

Backstage, Spotify’s open-source developer portal, continued its march toward ubiquity. The plugin ecosystem expanded significantly this year, and I’ve seen it adopted at organizations ranging from 50-person startups to Fortune 500 enterprises. The value proposition is clear: a single pane of glass for service catalogs, documentation, CI/CD pipelines, and infrastructure provisioning.

But Backstage alone isn’t a platform; it’s a portal. The real work happens in the layers beneath: Crossplane for infrastructure-as-code through Kubernetes custom resources, Argo CD for GitOps deployments, and, increasingly, purpose-built platform orchestrators like Kratix, which compose platform capabilities from declarative “Promises.”

What I find encouraging is the shift from “let’s just give everyone Terraform access” to “let’s provide golden paths that embed our organization’s best practices.” Platform engineering done well means developers don’t need to understand the intricacies of VPC peering or IAM role chaining: they declare what they need, and the platform handles the how. The same declarative interfaces are also a natural foundation for the agent-based system architectures now emerging, since an agent can request infrastructure through a golden path just as a developer would.
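
To make “declare what you need, and the platform handles the how” concrete, here is a minimal Python sketch of a golden-path template. Everything in it is hypothetical: the `ServiceRequest` fields, the tier names, the default sizes, and the `DatabaseClaim` kind are illustrative stand-ins, and the output is a simplified resource list rather than real Kubernetes or Crossplane manifests. In a real platform this expansion would typically live in a Backstage software template, a Crossplane Composition, or a Kratix Promise.

```python
from dataclasses import dataclass


# Hypothetical developer-facing request: only the "what".
@dataclass(frozen=True)
class ServiceRequest:
    name: str
    team: str
    tier: str = "standard"  # hypothetical tiers: "standard" or "critical"
    needs_database: bool = False


# Hypothetical organization defaults owned by the platform team: the "how".
TIER_DEFAULTS = {
    "standard": {"replicas": 2, "cpu": "250m", "memory": "256Mi"},
    "critical": {"replicas": 3, "cpu": "500m", "memory": "512Mi"},
}


def render_golden_path(req: ServiceRequest) -> list[dict]:
    """Expand a small request into the (simplified) resources the platform manages."""
    if req.tier not in TIER_DEFAULTS:
        raise ValueError(f"unknown tier {req.tier!r}; expected one of {sorted(TIER_DEFAULTS)}")
    d = TIER_DEFAULTS[req.tier]
    labels = {"app": req.name, "team": req.team}
    resources = [
        {"kind": "Deployment", "name": req.name, "labels": labels,
         "replicas": d["replicas"],
         "resources": {"requests": {"cpu": d["cpu"], "memory": d["memory"]}}},
        {"kind": "Service", "name": req.name, "labels": labels, "port": 80},
        # Baseline policy baked in, so no team has to remember to add it.
        {"kind": "NetworkPolicy", "name": f"{req.name}-default-deny", "labels": labels},
    ]
    if req.needs_database:
        # Hypothetical claim kind that a lower layer (e.g. Crossplane) would fulfil.
        resources.append({"kind": "DatabaseClaim", "name": f"{req.name}-db",
                          "labels": labels, "backups": True})
    return resources


demo = render_golden_path(
    ServiceRequest("checkout", team="payments", tier="critical", needs_database=True)
)
print([r["kind"] for r in demo])  # ['Deployment', 'Service', 'NetworkPolicy', 'DatabaseClaim']
```

The developer touches four fields; replica counts, resource requests, labels, and a default-deny network policy all come from the platform team. Changing a value in `TIER_DEFAULTS` changes every service on its next render, which is the core argument for golden paths over hand-written manifests.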

Kubernetes: Still the Foundation, Less the Focus

Kubernetes itself has become almost invisible in well-run platform teams, and that’s exactly where it should be. The conversations this year haven’t been about Kubernetes — they’ve been about what runs on Kubernetes and how developers interact with it. The 1.32 release cycle demonstrated the platform’s continuing maturity, with stability and developer experience improvements as the focus.

The managed Kubernetes offerings from AWS (EKS), Google (GKE), and Azure (AKS) have matured to the point where the operational overhead of the control plane is negligible. The remaining complexity lives in networking (service meshes, ingress controllers), observability (the OpenTelemetry ecosystem), and multi-tenancy patterns.

Gateway API, whose core features have been generally available since its v1.0 release in late 2023, is steadily replacing the aging Ingress resource. If you’re still writing Ingress manifests, now is the time to migrate. Gateway API’s expressiveness and its clear separation between infrastructure-provider and application-developer roles make it a substantial improvement.
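
For simple cases the migration is mostly mechanical. The sketch below, written in Python against plain dictionaries, translates a basic `networking.k8s.io/v1` Ingress into `gateway.networking.k8s.io/v1` HTTPRoutes that attach to a shared Gateway owned by the platform team. It is an illustration under stated assumptions, not a migration tool: it ignores TLS, `defaultBackend`, `ingressClassName`, and controller-specific annotations, and the Gateway name and namespace are hypothetical parameters. The kubernetes-sigs `ingress2gateway` project is the place to look for real-world conversions.

```python
# Ingress pathType -> Gateway API path match type (ImplementationSpecific has no direct equivalent).
PATH_TYPES = {"Prefix": "PathPrefix", "Exact": "Exact"}


def ingress_to_httproutes(ingress: dict, gateway_name: str, gateway_namespace: str) -> list[dict]:
    """Translate a simple Ingress into one HTTPRoute per host rule (sketch only)."""
    meta = ingress["metadata"]
    routes = []
    for i, rule in enumerate(ingress["spec"].get("rules", [])):
        route_rules = []
        for p in rule.get("http", {}).get("paths", []):
            path_type = p.get("pathType")
            if path_type not in PATH_TYPES:
                raise ValueError(f"pathType {path_type!r} has no direct Gateway API equivalent")
            svc = p["backend"]["service"]
            if "number" not in svc["port"]:
                raise ValueError("named Service ports must be resolved to numbers first")
            route_rules.append({
                "matches": [{"path": {"type": PATH_TYPES[path_type], "value": p["path"]}}],
                "backendRefs": [{"name": svc["name"], "port": svc["port"]["number"]}],
            })
        spec = {
            # Attach to a Gateway the platform team owns; app teams never edit it.
            "parentRefs": [{"name": gateway_name, "namespace": gateway_namespace}],
            "rules": route_rules,
        }
        if "host" in rule:
            spec["hostnames"] = [rule["host"]]
        routes.append({
            "apiVersion": "gateway.networking.k8s.io/v1",
            "kind": "HTTPRoute",
            "metadata": {"name": f"{meta['name']}-{i}",
                         "namespace": meta.get("namespace", "default")},
            "spec": spec,
        })
    return routes


example = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {"name": "shop", "namespace": "shop"},
    "spec": {"rules": [{
        "host": "shop.example.com",
        "http": {"paths": [{
            "path": "/",
            "pathType": "Prefix",
            "backend": {"service": {"name": "shop-web", "port": {"number": 80}}},
        }]},
    }]},
}
route = ingress_to_httproutes(example, gateway_name="shared-gateway", gateway_namespace="infra")[0]
print(route["spec"]["parentRefs"])  # [{'name': 'shared-gateway', 'namespace': 'infra'}]
```

The role split shows up in `parentRefs`: application teams own the HTTPRoute, while the Gateway (listeners, ports, certificates) stays with the platform team. For a route in another namespace to attach, the Gateway listener’s `allowedRoutes` must permit that namespace.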

I’ve spent considerable time this year helping teams adopt Cilium for eBPF-based networking, and the results have been impressive. The performance gains over traditional iptables-based networking are meaningful, particularly in clusters with many Services, and the observability features (being able to see L7 traffic flows through Hubble without sidecars) have simplified debugging significantly.
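
Part of the performance story is how service lookups scale. The following is a deliberately simplified Python model, not a description of either datapath’s real implementation: the iptables-style function walks an ordered list of rules the way packets traverse kube-proxy’s service chain, while the eBPF-style function does a single keyed lookup, the way Cilium resolves services from BPF hash maps. It ignores connection tracking, endpoint selection, and kube-proxy’s IPVS and nftables modes.

```python
from typing import Optional

Frontend = tuple[str, int]  # (ClusterIP, port)


def iptables_style_lookup(rules: list[tuple[Frontend, str]], dst: Frontend) -> tuple[Optional[str], int]:
    """Evaluate rules in order, like a chain; return (backend, rules_evaluated)."""
    evaluated = 0
    for frontend, backend in rules:
        evaluated += 1
        if frontend == dst:
            return backend, evaluated
    return None, evaluated


def ebpf_style_lookup(table: dict[Frontend, str], dst: Frontend) -> tuple[Optional[str], int]:
    """One keyed lookup, like a BPF hash map; cost does not grow with service count."""
    return table.get(dst), 1


# 5,000 hypothetical Services with unique ClusterIPs.
services = [((f"10.96.{i // 256}.{i % 256}", 80), f"svc-{i}") for i in range(5000)]
table = dict(services)
target = services[-1][0]  # worst case for the sequential walk

print(iptables_style_lookup(services, target))  # ('svc-4999', 5000)
print(ebpf_style_lookup(table, target))         # ('svc-4999', 1)
```

The absolute numbers mean nothing here; the point is that the first function’s cost grows with the number of Services while the second’s does not, which is why the difference is most visible in large clusters.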

The Observability Stack Consolidation

One of the most notable trends of 2025 has been the consolidation around OpenTelemetry as the standard instrumentation layer. The project has reached a level of maturity where it’s no longer a question of whether to adopt it, but how quickly you can migrate from proprietary agents.

The tracing and metrics APIs have been stable for a while, but this year the logging signal reached stability, completing the three pillars under a single standard. For those of us who’ve spent years wiring together separate logging, metrics, and tracing pipelines with different agents, formats, and backends, this convergence is a genuine relief.
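
The practical payoff of that convergence is correlation: every log line can carry the same trace and span IDs as the trace it belongs to. OpenTelemetry propagates context using the W3C Trace Context format by default, and its SDKs and logging integrations handle this for you. The Python sketch below uses only the standard library to show the mechanism; the `current_span` context variable is a hypothetical stand-in for OpenTelemetry’s active span, and the logger name and format string are illustrative.

```python
import contextvars
import logging
import secrets

# Hypothetical stand-in for OpenTelemetry's active span: (trace_id, span_id) or None.
current_span = contextvars.ContextVar("current_span", default=None)


def new_trace_ids() -> tuple[str, str]:
    # W3C Trace Context: 16-byte trace-id and 8-byte parent-id, lowercase hex.
    return secrets.token_hex(16), secrets.token_hex(8)


def traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    # version "00", trace-id, parent-id, trace-flags ("01" = sampled).
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


class TraceContextFilter(logging.Filter):
    """Stamp trace_id/span_id onto every record so logs can be joined with traces."""

    def filter(self, record: logging.LogRecord) -> bool:
        ctx = current_span.get()
        record.trace_id, record.span_id = ctx if ctx else ("-", "-")
        return True


logger = logging.getLogger("checkout")
handler = logging.StreamHandler()
handler.addFilter(TraceContextFilter())
handler.setFormatter(logging.Formatter("%(levelname)s trace=%(trace_id)s span=%(span_id)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

token = current_span.set(new_trace_ids())
try:
    trace_id, span_id = current_span.get()
    # The same IDs would travel to downstream services in this header.
    print(traceparent(trace_id, span_id))
    logger.info("charging card")
finally:
    current_span.reset(token)
logger.info("no active span")  # logged with trace=- span=-
```

With the IDs on each record, Grafana can be configured to link a Loki log line to its Tempo trace and back, which is exactly the workflow that separate pipelines used to make painful.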

On the backend side, Grafana continued to strengthen its position as the visualization layer of choice, while the LGTM stack (Loki, Grafana, Tempo, Mimir) provides a compelling open-source alternative to commercial observability platforms. I’ve migrated two production environments to this stack this year, and the cost savings compared to commercial alternatives were substantial: roughly a 60% reduction in observability spend.

Infrastructure as Code: The Terraform Question

HashiCorp’s 2023 relicensing of Terraform under the Business Source License (BSL) continues to ripple through the ecosystem. OpenTofu, the community fork, has gained significant traction this year, with several major organizations migrating their workflows, and the OpenTofu 1.8 and 1.9 releases brought features that demonstrated the fork’s ability to innovate independently. Broader concerns about open-source licensing and software supply-chain security have also shaped how teams evaluate IaC tools.

Meanwhile, Pulumi continues to attract developers who prefer writing infrastructure in real programming languages. The appeal is obvious — why learn HCL when you can write TypeScript or Python? But I’ve found that the discipline HCL imposes — its declarative nature and limited expressiveness — is actually a feature in larger organizations. Infrastructure code that “does too much” is infrastructure code that’s hard to review and reason about.
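
The reviewability argument rests on declarative code being data: you describe the end state, and a planner computes the difference from what exists. Here is a greatly simplified Python model of that planning step. Real planners such as `tofu plan` also build dependency graphs, distinguish in-place updates from replacements, and consult provider schemas; the resource addresses below borrow HCL’s `type.name` form purely for familiarity, and their attributes are made up.

```python
from typing import Any

Resources = dict[str, dict[str, Any]]  # resource address -> attributes


def plan(desired: Resources, actual: Resources) -> dict[str, list[str]]:
    """Compare desired and actual state and classify each resource address."""
    return {
        "create": sorted(desired.keys() - actual.keys()),
        "update": sorted(a for a in desired.keys() & actual.keys() if desired[a] != actual[a]),
        "delete": sorted(actual.keys() - desired.keys()),
    }


desired = {
    "aws_s3_bucket.logs": {"versioning": True},
    "aws_s3_bucket.assets": {"versioning": False},
}
actual = {
    "aws_s3_bucket.logs": {"versioning": False},
    "aws_s3_bucket.old": {"versioning": False},
}
print(plan(desired, actual))
# {'create': ['aws_s3_bucket.assets'], 'update': ['aws_s3_bucket.logs'], 'delete': ['aws_s3_bucket.old']}
```

Because the desired state is plain data, a reviewer reads a three-way classification instead of tracing loops and conditionals, and that is the property a general-purpose language makes easier to lose.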

My current recommendation for teams starting fresh: evaluate OpenTofu as your default, keep an eye on Pulumi for complex orchestration scenarios, and consider Crossplane if you’re already deep in the Kubernetes ecosystem. The OpenTofu fork has continued to mature and represents a solid path forward for teams looking to escape HashiCorp’s licensing constraints.

Sub-Hub: Platform Engineering & DevOps Practices

For detailed exploration of platform engineering patterns, from internal developer platforms to AI-assisted operations, see Platform Engineering & DevOps Practices — Building Developer Experience Platforms. This sub-hub connects platform engineering disciplines to infrastructure tooling, observability, and the shift toward AI-assisted operations.

My Take

Platform engineering in 2025 has moved from buzzword to discipline, and that’s the most encouraging development. We have real patterns, real tools, and — crucially — real failure stories to learn from.

The teams that have succeeded are those who treated their platform as a product, with developer experience as the primary metric. They invested in golden paths, documentation, and feedback loops. They resisted the temptation to build everything custom and instead composed existing open-source tools into coherent platforms. These teams are also investing in observability maturity from day one, recognizing that internal platforms need visibility into their own health and performance.

The teams that struggled were those who confused “building a platform” with “adding more YAML.” If your developers need a PhD in Kubernetes to deploy a service, your platform has failed, regardless of how elegant its architecture is. Better developer tooling and AI-assisted coding can lower that barrier, but they are no substitute for the right abstractions.

As we head into 2026, I expect the focus to shift increasingly toward AI-assisted platform operations — using LLMs to help with incident response, infrastructure optimization, and developer onboarding. Cloud cost optimization and FinOps will become more critical as teams scale their platforms and need better visibility into infrastructure spending.

Looking further ahead, distributed platform architectures will require platform teams to rethink how they deliver infrastructure abstractions across heterogeneous environments, especially as AI and autonomous systems shape operational models.

But that’s a topic for another post. For now, enjoy the holiday, and take a moment to appreciate how far our tooling has come. The infrastructure challenges we face today are orders of magnitude more complex than what I dealt with in the early 2000s, but our tools are orders of magnitude better too, and the discipline itself keeps evolving as new hardware and capabilities emerge. That’s progress worth celebrating.

Developer Tooling - This article is part of a series.

Part : Biome — The ESLint and Prettier Killer

Part : Platform Engineering & DevOps Practices — Building Developer Experience Platforms

Part : GitHub Copilot Agent Mode Goes GA — What It Means for Developer Workflows

Part : AI Agent Frameworks — The Wild West of Autonomous Systems

Part : This Article

Part : GitHub Universe 2025 — Copilot Grows Up and the IDE Fades Further

Part : AI Coding Assistants Are Growing Up — Beyond Autocomplete

Part : SWE-bench Benchmark Contamination — When the Test Answers Are in the Training Data

Part : Mistral's Le Chat Gets MCP Connectors — The Protocol That's Quietly Connecting Everything

Part : OpenTelemetry Reaches Full Maturity — Observability Finally Has a Standard

Part : AI-Native IDEs — The Editor Wars Have a New Front