It’s Christmas Day, the office is quiet, and there’s something fitting about using the downtime to reflect on where platform engineering stands as we close out 2025. This has been a transformative year for how organizations think about developer experience, infrastructure abstraction, and the relationship between platform teams and the developers they serve.
The Rise of the Internal Developer Platform
If 2024 was the year everyone talked about internal developer platforms (IDPs), 2025 was the year many organizations actually built them. The concept isn’t new — giving developers self-service access to infrastructure through curated abstractions has been a goal since the early days of DevOps. But the tooling has finally caught up with the ambition.
Backstage, Spotify’s open-source developer portal, continued its march toward ubiquity. The plugin ecosystem expanded significantly this year, and I’ve seen it adopted at organizations ranging from 50-person startups to Fortune 500 enterprises. The value proposition is clear: a single pane of glass for service catalogs, documentation, CI/CD pipelines, and infrastructure provisioning.
But Backstage alone isn’t a platform — it’s a portal. The real work happens in the layers beneath: Crossplane for infrastructure-as-code through Kubernetes custom resources, Argo CD for GitOps deployments, and increasingly, purpose-built platform orchestrators like Kratix that handle the promise-based composition of platform capabilities.
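Crossplane and Argo CD share the same underlying model. You declare desired state, and a controller repeatedly compares it with what actually exists and converges the two. The sketch below shows only the diffing step of that loop, in plain Python. Resources are represented as hypothetical dictionaries keyed by name. Real controllers work against the Kubernetes API and cloud provider APIs rather than in-memory dicts.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Actions needed to converge actual state onto desired state."""
    create: list[str] = field(default_factory=list)
    update: list[str] = field(default_factory=list)
    delete: list[str] = field(default_factory=list)


def plan(desired: dict[str, dict], actual: dict[str, dict]) -> Plan:
    """Diff declared resources against observed ones (both keyed by name)."""
    result = Plan()
    for name, spec in desired.items():
        if name not in actual:
            result.create.append(name)   # declared but missing
        elif actual[name] != spec:
            result.update.append(name)   # exists but has drifted
    for name in actual:
        if name not in desired:
            result.delete.append(name)   # exists but no longer declared
    return result


# Hypothetical example: a bucket was added in Git, a database drifted,
# and an old queue was removed from the declared state.
desired = {
    "orders-db": {"engine": "postgres", "size": "medium"},
    "assets-bucket": {"versioning": True},
}
actual = {
    "orders-db": {"engine": "postgres", "size": "small"},
    "legacy-queue": {"fifo": False},
}
print(plan(desired, actual))
# → Plan(create=['assets-bucket'], update=['orders-db'], delete=['legacy-queue'])
```

Whether deletions are actually executed is usually a policy decision rather than something the diff decides. Argo CD, for example, only prunes resources that disappeared from Git when pruning is enabled for the application.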
What I find encouraging is the shift from “let’s just give everyone Terraform access” to “let’s provide golden paths that embed our organization’s best practices.” Platform engineering done well means developers don’t need to understand the intricacies of VPC peering or IAM role chaining — they declare what they need, and the platform handles the how.
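To make the golden-path idea concrete, here is a minimal sketch of the expansion step. A developer supplies a small, hypothetical ServiceRequest (name, team, image, port), and platform-owned code fills in organizational defaults. The result is a Kubernetes Deployment and Service plus a Backstage catalog entity. The ServiceRequest fields, the default replica count, and the resource values are assumptions for illustration. The Kubernetes and Backstage field names follow their public schemas.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRequest:
    """What a developer declares (hypothetical golden-path input)."""
    name: str
    team: str
    image: str
    port: int = 8080


# Organization-wide defaults owned by the platform team (illustrative values).
DEFAULT_REPLICAS = 2
DEFAULT_RESOURCES = {
    "requests": {"cpu": "100m", "memory": "128Mi"},
    "limits": {"memory": "256Mi"},
}


def expand(req: ServiceRequest) -> list[dict]:
    """Expand one request into a Deployment, a Service, and a catalog entry."""
    selector = {"app.kubernetes.io/name": req.name}
    labels = {**selector, "team": req.team}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": req.name, "labels": dict(labels)},
        "spec": {
            "replicas": DEFAULT_REPLICAS,
            "selector": {"matchLabels": dict(selector)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [{
                        "name": req.name,
                        "image": req.image,
                        "ports": [{"containerPort": req.port}],
                        # Deep copy so manifests never alias the shared defaults.
                        "resources": copy.deepcopy(DEFAULT_RESOURCES),
                    }],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": req.name, "labels": dict(labels)},
        "spec": {
            "selector": dict(selector),
            "ports": [{"port": 80, "targetPort": req.port}],
        },
    }
    # Register the service in the Backstage catalog so it is discoverable.
    catalog_entity = {
        "apiVersion": "backstage.io/v1alpha1",
        "kind": "Component",
        "metadata": {"name": req.name},
        "spec": {"type": "service", "lifecycle": "production", "owner": req.team},
    }
    return [deployment, service, catalog_entity]


request = ServiceRequest("orders", "team-checkout", "registry.example.com/orders:1.4.2")
for manifest in expand(request):
    print(manifest["kind"], manifest["metadata"]["name"])
# Deployment orders
# Service orders
# Component orders
```

The developer never sees replicas, resource requests, or label selectors. Changing an organizational default becomes a one-line change in platform code rather than a sweep across every service repository.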
Kubernetes: Still the Foundation, Less the Focus
Kubernetes itself has become almost invisible in well-run platform teams, and that’s exactly where it should be. The conversations this year haven’t been about Kubernetes — they’ve been about what runs on Kubernetes and how developers interact with it.
The managed Kubernetes offerings from AWS (EKS), Google (GKE), and Azure (AKS) have matured to the point where the operational overhead of the control plane is negligible. The remaining complexity lives in networking (service meshes, ingress controllers), observability (the OpenTelemetry ecosystem), and multi-tenancy patterns.
Gateway API, whose core resources (GatewayClass, Gateway, and HTTPRoute) reached general availability with v1.0 in late 2023, is steadily replacing the aging Ingress resource. If you’re still writing Ingress manifests, now is the time to migrate. Gateway API’s expressiveness and its clear separation between the infrastructure provider and application developer roles make it a substantial improvement.
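The role separation is easiest to see in the resources themselves. The sketch below assumes a cluster operator owns a shared Gateway; the name shared-gateway and the namespace infra are hypothetical. An application team only writes an HTTPRoute that attaches to that Gateway via parentRefs. The manifest is built as a Python dict for consistency with the other examples. The field names follow the gateway.networking.k8s.io/v1 HTTPRoute schema, while the hostname, service name, and port are placeholders.

```python
def http_route(app: str, namespace: str, hostname: str, service_port: int) -> dict:
    """Build an HTTPRoute owned by the application team.

    The Gateway itself (listeners, TLS, load balancer) belongs to the
    platform team; the route only references it.
    """
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": app, "namespace": namespace},
        "spec": {
            # Attach to a Gateway managed by the platform team (hypothetical name).
            "parentRefs": [{"name": "shared-gateway", "namespace": "infra"}],
            "hostnames": [hostname],
            "rules": [{
                "matches": [{"path": {"type": "PathPrefix", "value": "/"}}],
                "backendRefs": [{"name": app, "port": service_port}],
            }],
        },
    }


route = http_route("orders", "checkout", "orders.example.com", 80)
print(route["spec"]["parentRefs"][0])
# → {'name': 'shared-gateway', 'namespace': 'infra'}
```

For this cross-namespace attachment to be accepted, the Gateway's listener must allow routes from the application's namespace through its allowedRoutes setting, which defaults to the Gateway's own namespace. That decision stays with whoever owns the Gateway, which is exactly the split of responsibilities the API was designed around.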
I’ve spent considerable time this year helping teams adopt Cilium for eBPF-based networking, and the results have been impressive. The performance improvements over traditional iptables-based networking are meaningful, and the observability features — being able to see L7 traffic flows without sidecars — have simplified debugging significantly.
The Observability Stack Consolidation
One of the most notable trends of 2025 has been the consolidation around OpenTelemetry as the standard instrumentation layer. The project has reached a level of maturity where it’s no longer a question of whether to adopt it, but how quickly you can migrate from proprietary agents.
The tracing and metrics APIs have been stable for a while, and with the logging signal now mature as well, all three pillars finally sit under a single standard. For those of us who’ve spent years wiring together separate logging, metrics, and tracing pipelines with different agents, formats, and backends, this convergence is a genuine relief.
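Part of what this convergence buys you is correlation: logs and spans can carry the same trace and span IDs. OpenTelemetry SDKs propagate that context across services using the W3C Trace Context traceparent header by default. The sketch below parses such a header with the standard library only and stamps its IDs onto a structured JSON log line. This is not the OpenTelemetry API, whose log bridges do this for you, and the function names are hypothetical. It accepts only the version-00 header layout.

```python
import json
import re
from typing import Optional

# W3C Trace Context, version 00: version-trace_id-parent_id-flags (lowercase hex).
TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header: str) -> Optional[dict]:
    """Return trace_id, span_id, and the sampled flag, or None if invalid."""
    match = TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def log_record(message: str, traceparent: str) -> str:
    """Render a JSON log line that carries the trace context, if present."""
    record = {"message": message}
    ctx = parse_traceparent(traceparent)
    if ctx is not None:
        record.update(trace_id=ctx["trace_id"], span_id=ctx["span_id"])
    return json.dumps(record)


header = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(log_record("payment authorized", header))
# → {"message": "payment authorized", "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "span_id": "00f067aa0ba902b7"}
```

With the IDs on every log line, a backend such as Loki can link a log entry straight to the matching trace in Tempo without any bespoke correlation logic.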
On the backend side, Grafana continued to strengthen its position as the visualization layer of choice, while the LGTM stack (Loki, Grafana, Tempo, Mimir) provides a compelling open-source alternative to commercial observability platforms. I’ve migrated two production environments to this stack this year, and the cost savings compared to commercial alternatives were substantial — roughly 60% reduction in observability spend.
Infrastructure as Code: The Terraform Question
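HashiCorp’s 2023 relicensing of Terraform under the Business Source License (BSL) continues to ripple through the ecosystem. OpenTofu, the community fork, has gained significant traction this year, with several major organizations migrating their workflows. Releases such as OpenTofu 1.8, which added early evaluation of variables and locals (so they can be used in backend and module source configuration), and 1.9, which added for_each on provider blocks, demonstrated the fork’s ability to innovate independently.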
Meanwhile, Pulumi continues to attract developers who prefer writing infrastructure in real programming languages. The appeal is obvious — why learn HCL when you can write TypeScript or Python? But I’ve found that the discipline HCL imposes — its declarative nature and limited expressiveness — is actually a feature in larger organizations. Infrastructure code that “does too much” is infrastructure code that’s hard to review and reason about.
My current recommendation for teams starting fresh: evaluate OpenTofu as your default, keep an eye on Pulumi for complex orchestration scenarios, and consider Crossplane if you’re already deep in the Kubernetes ecosystem.
My Take
Platform engineering in 2025 has moved from buzzword to discipline, and that’s the most encouraging development. We have real patterns, real tools, and — crucially — real failure stories to learn from.
The teams that have succeeded are those who treated their platform as a product, with developer experience as the primary metric. They invested in golden paths, documentation, and feedback loops. They resisted the temptation to build everything custom and instead composed existing open-source tools into coherent platforms.
The teams that struggled were those who confused “building a platform” with “adding more YAML.” If your developers need a PhD in Kubernetes to deploy a service, your platform has failed, regardless of how elegant its architecture is.
As we head into 2026, I expect the focus to shift increasingly toward AI-assisted platform operations — using LLMs to help with incident response, infrastructure optimization, and developer onboarding. But that’s a topic for another post. For now, enjoy the holiday, and take a moment to appreciate how far our tooling has come. The infrastructure challenges we face today are orders of magnitude more complex than what I dealt with in the early 2000s, but our tools are orders of magnitude better too. That’s progress worth celebrating.

