Cloud FinOps — Why Engineers Own the Cost Conversation Now · Osmond van Hemert — Senior Software Engineer

I had an interesting conversation with a CTO last week. She told me her team’s cloud bill had grown 40% year-over-year despite serving roughly the same traffic. Not because they were doing anything wrong — they’d adopted new managed services, scaled for peak capacity, and added observability tooling. All good engineering decisions. But nobody was tracking the cumulative cost impact until the quarterly review hit.

This is a story I’m hearing more and more. The FinOps Foundation’s latest State of FinOps report shows that cloud cost management is now the top concern for engineering leadership, surpassing even security. And the solution isn’t finance dashboards — it’s engineering ownership.

The Shift from Finance to Engineering

Traditional cost management follows a simple pattern: finance sets budgets, IT stays within them, everyone reviews monthly reports. This worked when infrastructure was capital expenditure — you bought servers, amortized them over three years, and that was that.

Cloud broke this model completely. Every API call, every container spin-up, every byte stored is a variable cost decision made by engineers, often implicitly through architectural choices. When a developer chooses DynamoDB over PostgreSQL, or configures auto-scaling with generous headroom, or enables detailed CloudWatch metrics on every Lambda function, they’re making cost decisions. They just don’t know it. This connects to broader platform engineering and infrastructure-as-code practices that make these decisions visible and auditable.

The FinOps movement recognizes this reality and pushes cost awareness into the engineering workflow. But the first generation of FinOps was still finance-led: tagging policies, chargeback models, monthly reports with scary graphs. That’s necessary infrastructure, but it doesn’t change behavior where it matters — at the point of architectural and operational decisions.

Engineering-First FinOps

The teams I see doing this well have shifted to what I’d call engineering-first FinOps. Here’s what that looks like in practice:

Cost as a metric in CI/CD. Tools like Infracost integrate into pull request workflows to show the estimated cost impact of infrastructure changes before they’re merged. This aligns with supply chain security practices where change verification matters. A Terraform change that adds a NAT Gateway to three availability zones gets a comment showing the added cost: roughly $33/month per gateway in hourly charges at us-east-1 list prices, so about $100/month for the set before any data processing fees. The engineer makes an informed decision, the reviewer has context, and surprises are caught early.
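To make the idea of a cost gate concrete, here is a minimal sketch of the decision a PR check could make once it has per-resource estimates. The `ResourceChange` shape, the resource names, the dollar figures, and the threshold are all hypothetical; this is not Infracost's output format or API, only an illustration of the logic a check might apply to whatever estimates your tooling produces.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceChange:
    """One resource touched by a pull request (hypothetical shape)."""
    address: str            # e.g. a Terraform resource address
    monthly_before: float   # estimated USD/month before the change
    monthly_after: float    # estimated USD/month after the change

    @property
    def delta(self) -> float:
        return self.monthly_after - self.monthly_before


def review_cost_change(changes: list[ResourceChange],
                       threshold_usd: float) -> tuple[bool, str]:
    """Return (needs_approval, comment_text) for a set of estimated changes."""
    total = sum(c.delta for c in changes)
    lines = [f"Estimated monthly cost change: {total:+.2f} USD"]
    # List the biggest movers first so the reviewer sees them immediately.
    for c in sorted(changes, key=lambda c: abs(c.delta), reverse=True):
        if c.delta != 0:
            lines.append(f"  {c.address}: {c.delta:+.2f} USD/month")
    needs_approval = total > threshold_usd
    if needs_approval:
        lines.append(
            f"Above the {threshold_usd:.2f} USD threshold: needs explicit sign-off."
        )
    return needs_approval, "\n".join(lines)


# Illustrative numbers only.
changes = [
    ResourceChange("example_cache.sessions", 0.0, 120.0),
    ResourceChange("example_queue.events", 15.0, 10.0),
]
needs_approval, comment = review_cost_change(changes, threshold_usd=100.0)
print(comment)
```

A threshold that asks for sign-off rather than blocking the merge keeps the check informative instead of adversarial, which fits the goal of giving engineers context rather than a budget cop.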

Unit economics in dashboards. Instead of tracking total cloud spend, track cost per request, cost per user, cost per transaction. These unit metrics let you distinguish between cost growth that’s proportional to business growth (healthy) and cost growth that’s disproportionate (a problem). Grafana and Datadog both have solid integrations with cloud billing APIs now, and embedding cost panels alongside performance metrics makes the trade-offs visible.
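As a rough sketch of the distinction between healthy and disproportionate growth, the snippet below compares cost per request across two periods. The function names, the 5% tolerance, and the dollar and request figures are assumptions chosen for illustration; only the "40% more spend on flat traffic" scenario comes from the story at the top of this post.

```python
def cost_per_unit(total_cost_usd: float, units: int) -> float:
    """Cost per request, user, or transaction for one billing period."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost_usd / units


def classify_growth(prev_cost: float, prev_units: int,
                    curr_cost: float, curr_units: int,
                    tolerance: float = 0.05) -> str:
    """Label period-over-period growth by how the unit cost moved."""
    prev = cost_per_unit(prev_cost, prev_units)
    curr = cost_per_unit(curr_cost, curr_units)
    if prev <= 0:
        raise ValueError("previous unit cost must be positive")
    change = (curr - prev) / prev
    # Unit cost rising beyond the tolerance means spend outpaced the business.
    return "disproportionate" if change > tolerance else "proportional"


# Spend up 40% on flat traffic: unit cost rose 40%.
print(classify_growth(100_000, 50_000_000, 140_000, 50_000_000))
# Spend up 40% with traffic up 40%: unit cost unchanged.
print(classify_growth(100_000, 50_000_000, 140_000, 70_000_000))
```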

Architecture decision records with cost implications. When you’re choosing between a managed service and a self-hosted alternative, document the cost comparison alongside the operational trade-offs. A managed Kafka service might cost 3x what self-hosted Kafka costs in compute, but if it saves 0.5 FTE in operational overhead, that’s usually a win. Making these calculations explicit improves decision-making and creates institutional knowledge.
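The managed-versus-self-hosted comparison is simple arithmetic once the inputs are written down. The sketch below uses the 3x multiplier and 0.5 FTE from the example above; the compute bill and the fully loaded FTE cost are hypothetical inputs you would replace with your own. It also shows where the trade flips, which is why "usually a win" depends on the size of the compute bill.

```python
def monthly_totals(self_hosted_compute: float, managed_multiplier: float,
                   fte_fraction_saved: float, fte_annual_cost: float
                   ) -> tuple[float, float]:
    """Return (self_hosted_total, managed_total) in USD per month."""
    ops_labor = fte_fraction_saved * fte_annual_cost / 12
    self_hosted_total = self_hosted_compute + ops_labor
    managed_total = self_hosted_compute * managed_multiplier
    return self_hosted_total, managed_total


def break_even_compute(managed_multiplier: float, fte_fraction_saved: float,
                       fte_annual_cost: float) -> float:
    """Self-hosted compute bill at which both options cost the same."""
    ops_labor = fte_fraction_saved * fte_annual_cost / 12
    return ops_labor / (managed_multiplier - 1)


# Assumed inputs: 3,000 USD/month self-hosted compute, 180,000 USD/year per FTE.
self_hosted, managed = monthly_totals(3_000, 3.0, 0.5, 180_000)
print(f"self-hosted: {self_hosted:.0f} USD/month, managed: {managed:.0f} USD/month")
print(f"break-even compute: {break_even_compute(3.0, 0.5, 180_000):.0f} USD/month")
```

With these inputs the managed service wins, but only while the self-hosted compute bill stays below the break-even figure. Recording that number in the ADR tells future readers when the decision deserves a second look.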

Reserved capacity and commitment planning as engineering work. Savings Plans and Reserved Instances can save 30-60% on baseline compute, but they require understanding your workload patterns. This isn’t finance work — it’s capacity planning, and engineers are better positioned to forecast it. Understanding infrastructure maturity and cloud platform capabilities helps predict cost-efficient workload patterns. The teams that treat commitment purchases as an engineering planning exercise consistently outperform those that delegate it to procurement.
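To show why commitment sizing is a capacity-planning question, here is a simplified model: committed capacity is billed every hour at a discounted rate whether or not it is used, and anything above it is billed on demand. The usage profile, the unit rate, and the 40% discount are assumptions for illustration; real Savings Plans and Reserved Instances have more dimensions (term, payment option, instance family) than this sketch covers.

```python
def cost_with_commitment(hourly_usage: list[int], commit: int,
                         on_demand_rate: float, discount: float) -> float:
    """Total cost when `commit` units are pre-purchased at a discount."""
    committed_rate = on_demand_rate * (1 - discount)
    total = 0.0
    for used in hourly_usage:
        total += commit * committed_rate                 # paid even if idle
        total += max(0, used - commit) * on_demand_rate  # overflow on demand
    return total


def best_commitment(hourly_usage: list[int], on_demand_rate: float,
                    discount: float) -> int:
    """Brute-force the commitment level with the lowest total cost."""
    candidates = range(0, max(hourly_usage) + 1)
    return min(candidates, key=lambda c: cost_with_commitment(
        hourly_usage, c, on_demand_rate, discount))


# Hypothetical day: 10 instances for 16 hours, 30 instances for an 8-hour peak.
usage = [10] * 16 + [30] * 8
commit = best_commitment(usage, on_demand_rate=1.0, discount=0.4)
baseline = cost_with_commitment(usage, 0, 1.0, 0.4)
optimized = cost_with_commitment(usage, commit, 1.0, 0.4)
print(commit, baseline, optimized)
```

In this model a unit of commitment only pays off if it is busy for more than `1 - discount` of the hours, so the answer lands on the always-on baseline rather than the peak. That is a statement about workload shape, which is exactly the knowledge engineers have and procurement does not.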

The Tooling Landscape

The FinOps tooling space has matured significantly. Kubecost provides excellent Kubernetes cost allocation, breaking down spend by namespace, deployment, and even individual pods. CAST AI automates Kubernetes cost optimization through intelligent node selection and autoscaling. Cloud-native tools like AWS Cost Explorer and Azure Cost Management have gotten better at granular allocation.

OpenCost, a CNCF project that started at the sandbox level, is worth watching as an open-source alternative to commercial solutions. It provides real-time cost monitoring for Kubernetes and integrates well with Prometheus-based observability stacks.

But tooling alone isn’t enough. I’ve seen teams deploy Kubecost, look at the dashboards for two weeks, and then ignore them. The tooling needs to be embedded in workflows — PR checks, sprint planning, architecture reviews — to actually change behavior.

The Waste Problem

Let me be blunt: most organizations are wasting 25-35% of their cloud spend. The FinOps Foundation’s data backs this up consistently. The biggest culprits:

Idle resources. Development environments running 24/7, load balancers with no backends, EBS volumes detached from instances. Automated cleanup policies — scale dev environments to zero on nights and weekends, alert on idle resources, automatically terminate instances that haven’t served traffic in 72 hours — can cut this dramatically.
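The cleanup policies above boil down to two small rules, sketched here in plain Python. The `Instance` record, the instance IDs, and the weekday 08:00 to 20:00 working window are hypothetical; the 72-hour cutoff is the one mentioned above. In practice the inputs would come from your cloud provider's inventory and metrics, and the output would feed an alert or a scheduled stop rather than an immediate termination.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

IDLE_AFTER = timedelta(hours=72)


@dataclass(frozen=True)
class Instance:
    """Minimal inventory record (hypothetical shape)."""
    instance_id: str
    last_traffic_at: datetime


def find_idle(instances: list[Instance], now: datetime) -> list[str]:
    """IDs of instances that have served no traffic for 72 hours or more."""
    return [i.instance_id for i in instances
            if now - i.last_traffic_at >= IDLE_AFTER]


def dev_should_run(now_local: datetime) -> bool:
    """Dev environments run on weekdays between 08:00 and 20:00 local time."""
    return now_local.weekday() < 5 and 8 <= now_local.hour < 20


now = datetime(2025, 1, 10, 12, 0, tzinfo=timezone.utc)  # a Friday
fleet = [
    Instance("i-example-active", now - timedelta(hours=3)),
    Instance("i-example-stale", now - timedelta(hours=96)),
]
print(find_idle(fleet, now))
print(dev_should_run(now), dev_should_run(now + timedelta(days=1)))
```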

Over-provisioned instances. Running m5.xlarge when your workload fits in a t3.medium. Right-sizing recommendations from cloud providers are often accurate and consistently ignored. Make right-sizing a quarterly engineering task, not a suggestion.
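A right-sizing pass can start as a lookup: take observed peak usage, add headroom, and pick the cheapest type that still fits. The sketch below is an assumption-heavy illustration. The catalog is a tiny hand-written subset, the hourly prices are approximate us-east-1 Linux on-demand list prices that you should verify against current pricing, and the 20% headroom is arbitrary. It also ignores that t3 instances are burstable, so sustained CPU load needs a closer look before moving to one.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    hourly_usd: float  # approximate list price, an assumption


CATALOG = [
    InstanceType("t3.medium", 2, 4, 0.0416),
    InstanceType("t3.large", 2, 8, 0.0832),
    InstanceType("m5.large", 2, 8, 0.096),
    InstanceType("m5.xlarge", 4, 16, 0.192),
]


def recommend(peak_vcpus: float, peak_mem_gib: float,
              headroom: float = 0.2) -> InstanceType | None:
    """Cheapest catalog entry that covers peak usage plus headroom."""
    need_cpu = peak_vcpus * (1 + headroom)
    need_mem = peak_mem_gib * (1 + headroom)
    fits = [t for t in CATALOG
            if t.vcpus >= need_cpu and t.memory_gib >= need_mem]
    return min(fits, key=lambda t: t.hourly_usd) if fits else None


# A workload peaking at 0.8 vCPU and 2.5 GiB fits the smallest entry.
small = recommend(0.8, 2.5)
# A memory-heavy workload still needs the larger type.
large = recommend(1.5, 10.0)
print(small.name if small else None, large.name if large else None)
```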

Data transfer costs. The hidden tax of cloud computing. Cross-AZ traffic, NAT Gateway data processing, CloudFront to origin transfers — these costs are invisible until they’re not. Architect for data locality, use VPC endpoints, and understand your data flow patterns.
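A back-of-the-envelope estimate is often enough to make these charges visible. The sketch below assumes commonly quoted AWS us-east-1 list rates: $0.01 per GB in each direction for cross-AZ traffic and $0.045 per GB for NAT Gateway data processing. Treat both as assumptions to verify against current pricing. The traffic volumes are hypothetical.

```python
CROSS_AZ_PER_GB_EACH_WAY = 0.01   # assumed rate, billed on both sides
NAT_PROCESSING_PER_GB = 0.045     # assumed rate


def monthly_transfer_cost(cross_az_gb: float, nat_gb: float) -> dict[str, float]:
    """Estimate monthly USD cost for two common data transfer line items."""
    return {
        # Cross-AZ traffic is charged on egress and on ingress.
        "cross_az": cross_az_gb * CROSS_AZ_PER_GB_EACH_WAY * 2,
        "nat_processing": nat_gb * NAT_PROCESSING_PER_GB,
    }


# Hypothetical month: 5 TB between AZs, 10 TB through the NAT Gateway.
before = monthly_transfer_cost(cross_az_gb=5_000, nat_gb=10_000)
# Same month if 8 TB of that NAT traffic moved to a VPC endpoint instead.
after = monthly_transfer_cost(cross_az_gb=5_000, nat_gb=2_000)
print(before, after)
```

The second call models the VPC endpoint advice: S3 and DynamoDB gateway endpoints carry no data processing charge, so traffic routed through them drops out of the NAT line item entirely.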

My Take

The most important shift in FinOps isn’t technological — it’s cultural. Engineers need to care about cost the same way they care about performance and reliability. Not because it’s their job to minimize spend, but because cost is a signal about architectural health. A system that costs twice as much as it should is usually a system with other problems: over-complexity, poor resource management, missing automation.

The best engineers I’ve worked with have always had an intuitive sense of cost efficiency. They choose the right tool for the job, not the most expensive managed service. They build systems that scale down as well as they scale up. They understand that every architecture decision has a long-term cost implication.

If your team doesn’t have visibility into the cost of the systems they build and operate, fix that first. Everything else follows from awareness.

More infrastructure perspectives in my Infrastructure Notes series.
