SG · 01° 17′ N
§ F06 · Field Note PUBLISHED

Deterministic Cloud Costs — Treat Your FinOps Model Like a Build Artifact

Cloud cost surprises happen because cost lives outside your build artifacts. The infrastructure is in code; the bill is in a vendor dashboard; the gap between them is filled with hope. The fix is to pull cost into the same plane as the rest of your engineering work — make it a deterministic property of your infra-as-code, generated at PR time, reviewed before merge, manifested at release.

Where the surprises come from

Three patterns produce most cost incidents.

Resource changes that look small. Bumping an instance type from m6i.large to m6i.2xlarge is two characters in a Terraform file and a 4× cost increase nobody caught in review.

Stateful storage drift. A bucket grows by 50 GB a day for six months. Nobody set a retention policy. The line item arrives at 9 TB.

Vendor pricing changes. Egress rates change. A managed service moves to a new tier. Reserved instance pricing shifts. Your IaC didn’t change; your bill did.

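None of these are visible in code review as it is usually practiced. The first hides in a diff that carries no price; the other two never touch the code at all, so they also need a check that runs outside the PR.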

What “cost as a build artifact” means

For every PR that touches infrastructure, a CI step computes a cost diff. Not the full bill — the difference between this PR’s infra and main’s. The diff lives on the PR like any other check:

cost-diff: +$847/mo  (+12.3%)
breakdown:
  compute:     +$612/mo   m6i.large → m6i.2xlarge x 4
  storage:     +$128/mo   gp3 1000GB → gp3 2000GB x 4
  egress:      +$107/mo   estimated from p99 traffic
within-budget: NO         (budget: +$500/mo)
review-required: cost-owners

A reviewer sees this exactly the way they see a test failure or a type error. If the diff blows budget, the PR can’t merge without explicit signoff from a cost owner.
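As a rough illustration of what the CI step computes, here is a minimal Python sketch. It assumes some pricing tool has already turned each side's infrastructure into a list of priced resources; the `Resource` type, the `cost_diff` function, and the field names are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One priced resource (hypothetical shape; a real pricer emits more detail)."""
    category: str       # e.g. "compute", "storage", "egress"
    monthly_usd: float  # estimated monthly cost for this resource


def monthly_by_category(resources):
    """Sum estimated monthly cost per category."""
    totals = {}
    for r in resources:
        totals[r.category] = totals.get(r.category, 0.0) + r.monthly_usd
    return totals


def cost_diff(main, pr, budget_delta_usd):
    """Compare the PR's priced resources against main's and check the budget."""
    before = monthly_by_category(main)
    after = monthly_by_category(pr)
    breakdown = {
        cat: after.get(cat, 0.0) - before.get(cat, 0.0)
        for cat in sorted(set(before) | set(after))
    }
    delta = sum(breakdown.values())
    base = sum(before.values())
    return {
        "delta_usd": delta,
        # Percentage is undefined when main has no priced resources.
        "delta_pct": (delta / base * 100.0) if base else None,
        "breakdown": breakdown,
        "within_budget": delta <= budget_delta_usd,
    }
```

A CI job could render the returned dictionary as the PR check and fail the job when `within_budget` is false, leaving the override to whoever the budget names as cost owner.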

What you need to make it work

  1. A pricing model that maps your IaC to dollars. Open-source tools like Infracost handle the common case. Custom resources need custom pricers, but those pricers are usually small.
  2. Pricing data that updates. Vendor prices drift. The pricing model needs a refresh job, ideally weekly, that pulls current rates from each cloud’s pricing API.
  3. A budget manifest per environment. budget.yaml at the root: monthly cap, soft alert thresholds, owner. Same review discipline as anything else.
  4. A drift detector. A nightly job compares the cost the manifest predicted vs the cost the vendor actually billed. Alert on >5% deviation. This catches the storage and pricing-change cases that PR-time review can’t.
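The drift detector in item 4 can be sketched in a few lines of Python. This assumes predicted and billed costs are both available as per-line-item dictionaries in dollars; the function name `drift_report` and the input shape are illustrative assumptions, and fetching the billed figures from a vendor's billing export is left out.

```python
def drift_report(predicted_usd, billed_usd, threshold=0.05):
    """Return line items whose billed cost deviates from the prediction
    by more than `threshold` (0.05 corresponds to the 5% rule)."""
    alerts = []
    for item in sorted(set(predicted_usd) | set(billed_usd)):
        predicted = predicted_usd.get(item, 0.0)
        billed = billed_usd.get(item, 0.0)
        if predicted == 0.0:
            # Anything billed that was never predicted counts as drift.
            deviation = float("inf") if billed > 0.0 else 0.0
        else:
            deviation = (billed - predicted) / predicted
        if abs(deviation) > threshold:
            alerts.append((item, predicted, billed, deviation))
    return alerts
```

Comparing per line item rather than on the total is a design choice in this sketch: a storage overrun can otherwise be masked by an underrun somewhere else.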

What this does to engineering culture

Engineers see the dollar impact of their changes at PR time. They start asking different questions. Do we really need 4 replicas of this in staging? Why does the analytics warehouse have a 90-day retention? Could we use a smaller instance type with autoscaling?

None of those questions get asked when cost is a monthly surprise. All of them get asked when cost is a number on a PR that has to be approved.

Where it doesn’t work cleanly

Pure usage-based services (some serverless, some AI APIs) are harder to predict at PR time — the cost depends on traffic you haven’t sent yet. For those, model it as cost per invocation and require a forecast field on the PR (“expected requests per day”). The drift detector picks up the rest.
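One way to make the forecast field concrete is sketched below in Python. The `UsageForecast` type, its field names, and the 30-day month are assumptions made for illustration; the only idea taken from the text is cost per invocation multiplied by a declared request rate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageForecast:
    """Hypothetical forecast declared on a PR for a usage-priced service."""
    service: str
    cost_per_invocation_usd: float
    expected_requests_per_day: Optional[int]  # None means the PR left it blank


def monthly_estimate(forecast, days_per_month=30):
    """Estimate monthly cost from the declared request rate."""
    if forecast.expected_requests_per_day is None:
        # Refuse to price the change until the author states an expectation.
        raise ValueError(
            f"{forecast.service}: PR must declare expected requests per day"
        )
    return (
        forecast.cost_per_invocation_usd
        * forecast.expected_requests_per_day
        * days_per_month
    )
```

Raising on a missing forecast turns the field into a hard requirement: the check cannot pass until someone writes down an expectation, which the drift detector can later compare against the bill.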

Multi-tenant clusters and shared resources get fuzzy too. Allocate proportionally to declared usage. Wrong is better than missing.
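Proportional allocation is simple enough to sketch directly. The Python function below is a hypothetical helper, and it assumes each tenant declares its usage in one shared unit (requested CPU cores, for example).

```python
def allocate_shared_cost(total_usd, declared_usage):
    """Split a shared resource's cost across tenants in proportion
    to the usage each one declared."""
    total_usage = sum(declared_usage.values())
    if total_usage <= 0:
        raise ValueError("declared usage must sum to a positive number")
    return {
        tenant: total_usd * usage / total_usage
        for tenant, usage in declared_usage.items()
    }
```

For instance, a $1,200/mo cluster with declared usage of 6, 3, and 3 units would be split $600, $300, and $300. The split is only as accurate as the declarations, which is the trade-off the paragraph above accepts.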

The deeper point

“FinOps” as a separate function is a sign that cost has been pushed out of engineering. Pull it back in. Make it a number on a PR. Make it a check in CI. Make it a property of the build, not a meeting on the calendar.

Surprises become diffs. Diffs get reviewed. Bills stop being news.