SG · 01° 17′ N
§ F02 · Field Note PUBLISHED

Write-Ahead Feature Flags — Shipping Faster Without Shipping Chaos

Most feature flag systems are reactive. You ship the code, the code checks a runtime switch, you flip the switch in a dashboard, you wait to see if anything breaks. The flag is a panic button bolted onto the release after the fact.

We’ve stopped doing it that way. Our flags are write-ahead: declared in a versioned manifest before the code that uses them lands, with the rollout plan, kill criteria, and metric attachments specified up front. The flag is not a switch; it’s a contract.

Why the dashboard model breaks

The reactive pattern fails in three places at once. First, rollout plans live in someone’s head, not in the repo. Second, kill criteria get debated during an incident, which is the worst possible time. Third, metrics that should attach to a rollout are usually discovered after the fact, by the unlucky engineer paged into the incident.

The result is theatre. Teams adopt feature flags because “it’s good practice,” and then ship code with the flag default-on, no rollback plan, and no canary stage — which is exactly what they did before the flag existed.

What write-ahead looks like

A write-ahead flag is a small YAML manifest that lives next to the code, gets reviewed in the same PR, and is enforced by CI:

flag: pricing.v2
owner: pricing-team
introduced: 2026-02-04
rollout:
  - { stage: internal, cohort: employees, min: 24h }
  - { stage: canary, cohort: "1% of traffic", min: 48h }
  - { stage: ramp, cohort: "25% of traffic", min: 72h }
  - { stage: ga, cohort: "100%", ttl: 30d }
kill_if:
  - error_rate.pricing > 0.5%
  - p99_latency.pricing > 800ms
  - revenue_per_session < baseline - 2%
metrics:
  - pricing.conversion
  - pricing.session_revenue
sunset: 2026-04-30

The code that uses the flag references it by a typed name, not a string. A deleted flag is a compile error. A missing kill criterion is a linter failure. A flag past its sunset date with no extension request gets force-removed in the next release.
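As a rough illustration of the linter side, here is a minimal sketch in Python. It assumes the manifest has already been parsed into a plain dict, and the `extension_requested` field is hypothetical (the manifest above does not define how extensions are recorded). It is not our actual CI check, only the shape of one.

```python
from datetime import date

# Fields the example manifest carries; treating all of them as mandatory is an assumption.
REQUIRED_FIELDS = ("flag", "owner", "rollout", "kill_if", "metrics", "sunset")


def lint_manifest(manifest: dict, today: date) -> list[str]:
    """Return human-readable problems; an empty list means the manifest passes."""
    problems = []
    for field in REQUIRED_FIELDS:
        # A field that is absent or empty (for example kill_if: []) fails the lint.
        if not manifest.get(field):
            problems.append(f"missing or empty field: {field}")
    sunset = manifest.get("sunset")
    # "extension_requested" is a hypothetical field used only for this sketch.
    if isinstance(sunset, date) and sunset < today and not manifest.get("extension_requested"):
        problems.append(f"flag is past its sunset date ({sunset.isoformat()})")
    return problems


manifest = {
    "flag": "pricing.v2",
    "owner": "pricing-team",
    "rollout": ["internal", "canary", "ramp", "ga"],
    "kill_if": [],  # deliberately empty to trigger the linter
    "metrics": ["pricing.conversion", "pricing.session_revenue"],
    "sunset": date(2026, 4, 30),
}

problems = lint_manifest(manifest, today=date(2026, 5, 1))
for problem in problems:
    print(problem)
```

Run against the deliberately broken manifest, this reports the empty `kill_if` list and the expired sunset date. A CI job would fail the build whenever the returned list is non-empty.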

Why this changes the shipping rhythm

Three things get easier.

Rollback becomes a commit. The rollout plan is versioned. To roll back, you revert the manifest. The runtime state follows the source of truth, not the other way around.
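One way to picture "the runtime follows the source of truth" is a small reconciler that recomputes the allowed stage from whatever the manifest declares at HEAD. This is a sketch under assumptions: `reconcile` and `STAGE_ORDER` are hypothetical names, and the stage order is copied from the example manifest rather than read from a real system.

```python
from typing import Optional

# Order taken from the example manifest; a real system would derive it from the manifest.
STAGE_ORDER = ["internal", "canary", "ramp", "ga"]


def reconcile(declared: list[str], runtime_stage: Optional[str]) -> Optional[str]:
    """Return the stage the runtime should be in for the manifest at HEAD.

    The runtime never stays at a stage the manifest no longer declares: it
    falls back to the furthest declared stage at or before its current one.
    """
    if runtime_stage is None or not declared:
        return None  # flag is off, or the manifest was reverted away entirely
    limit = STAGE_ORDER.index(runtime_stage)
    allowed = [s for s in declared if STAGE_ORDER.index(s) <= limit]
    return max(allowed, key=STAGE_ORDER.index) if allowed else None


# Manifest declares all four stages and the runtime is at "ramp": nothing changes.
print(reconcile(["internal", "canary", "ramp", "ga"], "ramp"))  # ramp
# The commit that added "ramp" and "ga" is reverted: the runtime drops to "canary".
print(reconcile(["internal", "canary"], "ramp"))  # canary
# The manifest is reverted entirely: the flag turns off.
print(reconcile([], "ramp"))  # None
```

The point of the design is that nobody edits runtime state by hand. The revert changes the manifest, and the reconciler does the rest.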

Staged rollouts become reviewable. Reviewers can look at the manifest the same way they look at the code. If a flag goes straight to 100%, that’s a review comment, not an incident.
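Part of that review can be automated. The sketch below assumes a hypothetical policy, `REQUIRED_BEFORE_GA`, that says which stages must precede `ga`; the policy and the function name are illustrative, not something the manifest format itself defines.

```python
# Hypothetical policy: stages that must appear before "ga" in every rollout plan.
REQUIRED_BEFORE_GA = ("internal", "canary")


def review_rollout(stages: list[str]) -> list[str]:
    """Return review comments for a rollout plan that skips required stages."""
    comments = []
    if "ga" in stages:
        before_ga = stages[: stages.index("ga")]
        for stage in REQUIRED_BEFORE_GA:
            if stage not in before_ga:
                comments.append(f"stage '{stage}' must come before 'ga'")
    return comments


# A plan that goes straight to 100% draws two comments.
print(review_rollout(["ga"]))
# The plan from the example manifest draws none.
print(review_rollout(["internal", "canary", "ramp", "ga"]))
```

A bot can post these as PR comments, which leaves the human reviewer free to argue about whether 48 hours of canary is enough.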

Metrics auto-attach. The metrics field tells the observability layer which dashboards to wire up. Engineers stop “forgetting to instrument” because the manifest demands it.
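A sketch of what "auto-attach" could mean in practice: derive a dashboard description from the manifest, covering both the declared metrics and the metrics named in the kill criteria. The output shape (`title`, `panels`) is invented for illustration and does not correspond to any particular observability product.

```python
def dashboard_spec(manifest: dict) -> dict:
    """Build a dashboard description from a parsed manifest (hypothetical output shape)."""
    watched = list(manifest["metrics"])
    for rule in manifest["kill_if"]:
        # The metric name is the first token of a rule such as "error_rate.pricing > 0.5%".
        metric = rule.split()[0]
        if metric not in watched:
            watched.append(metric)
    return {
        "title": f"rollout: {manifest['flag']}",
        "panels": [{"metric": name} for name in watched],
    }


spec = dashboard_spec({
    "flag": "pricing.v2",
    "metrics": ["pricing.conversion", "pricing.session_revenue"],
    "kill_if": ["error_rate.pricing > 0.5%", "p99_latency.pricing > 800ms"],
})
print(spec["title"])
for panel in spec["panels"]:
    print(panel["metric"])
```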

What it costs

Mostly, it costs you the freedom to ship a flag without thinking. That’s the point. The 30 seconds it takes to fill in a manifest are 30 seconds you’ll earn back many times over by not getting a 2 a.m. page about a runaway feature.

There’s some upfront work: a typed flag layer, a manifest format, CI that validates the manifest, and an evaluator that checks the manifest’s kill criteria against your metrics store. All of it can be built in a week if you have a reasonable observability backend already.
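The evaluator is the least obvious of those pieces, so here is a minimal sketch. It assumes rules have the form `metric > value` or `metric < value`, that readings arrive in the same unit the rule uses (percent or milliseconds), and that `baseline - 2%` means "2% below the baseline value". That last reading of the syntax is an assumption, as are all the function names.

```python
import re

# Rule grammar assumed for this sketch: "<metric> <op> <threshold>".
RULE = re.compile(r"^(?P<metric>[\w.]+)\s*(?P<op>[<>])\s*(?P<rhs>.+)$")
RELATIVE = re.compile(r"^baseline\s*-\s*(?P<pct>[\d.]+)%$")
ABSOLUTE = re.compile(r"^(?P<value>[\d.]+)(?P<unit>%|ms)$")


def threshold(rhs: str, baseline: float) -> float:
    """Turn the right-hand side of a rule into a number in the metric's own unit."""
    relative = RELATIVE.match(rhs)
    if relative:
        # Assumption: "baseline - 2%" means 2% below the baseline value.
        return baseline * (1 - float(relative["pct"]) / 100)
    absolute = ABSOLUTE.match(rhs)
    if absolute is None:
        raise ValueError(f"unsupported threshold: {rhs!r}")
    return float(absolute["value"])


def tripped(rule: str, readings: dict, baselines: dict) -> bool:
    """Return True if the current reading violates the kill criterion."""
    match = RULE.match(rule.strip())
    if match is None:
        raise ValueError(f"unparseable rule: {rule!r}")
    value = readings[match["metric"]]
    limit = threshold(match["rhs"].strip(), baselines.get(match["metric"], 0.0))
    return value > limit if match["op"] == ">" else value < limit


KILL_IF = [
    "error_rate.pricing > 0.5%",
    "p99_latency.pricing > 800ms",
    "revenue_per_session < baseline - 2%",
]
# Made-up readings for illustration; a real evaluator would query the metrics store.
readings = {"error_rate.pricing": 0.2, "p99_latency.pricing": 950.0, "revenue_per_session": 4.05}
baselines = {"revenue_per_session": 4.00}

fired = [rule for rule in KILL_IF if tripped(rule, readings, baselines)]
print(fired)
```

With these sample readings only the latency rule fires. In a real system a non-empty `fired` list would halt the rollout and page the owner named in the manifest.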

The deeper point

Feature flags should be a property of the code, not a runtime accident. When the rollout plan, the kill criteria, and the metric attachments live in the same review as the code, you stop treating production rollouts as a separate phase of work. The flag is part of the change. The change isn’t done until the flag is done.

That’s how you ship faster without shipping chaos.