Usage-based pricing implementation: what to meter, where to compute totals, and the reconciliation checks that catch billing bugs before invoices go out.

Usage billing breaks when the number on the invoice doesn't match what your product actually delivered. The gap can be tiny at first (a few missing API calls), then grow into refunds, angry tickets, and a finance team that stops trusting dashboards.
The causes are usually predictable. Events go missing because a service crashed before it reported usage, a queue was down, or a client went offline. Events get counted twice because retries happened, workers reprocessed the same message, or an import job ran again. Time adds its own problems: clock drift between servers, time zones, daylight saving time, and late-arriving events can push usage into the wrong billing period.
A quick example: a chat product that charges per AI generation might emit one event when a request starts, then another when it finishes. If you bill from the start event, you can charge for failures. If you bill from the finish event, you can miss usage when the final callback never arrives. If both get billed, you double charge.
Multiple people need to trust the same numbers:
The target isn't only accurate totals. It's explainable invoices and fast dispute handling. If you can't trace a line item back to raw usage, one outage can turn your billing into guesswork, and that's when billing bugs become billing incidents.
Start with one simple question: what, exactly, are you charging for? If you can't explain the unit and the rules in a minute, the system will end up guessing and customers will notice.
Pick one primary billable unit per meter. Common choices are API calls, requests, tokens, minutes of compute, GB stored, GB transferred, or seats. Avoid blended units (like “active user minutes”) unless you truly need them. They are harder to audit and explain.
Define the boundaries of usage. Be specific about when usage starts and ends: does a trial include metered overages, or is it free up to a cap? If you offer a grace period, does usage during grace get billed later, or forgiven? Plan changes are where confusion spikes. Decide whether you prorate, reset allowances immediately, or apply changes at the next billing cycle.
Write down rounding and minimums instead of letting them be implied. For example: round up to the nearest second, minute, or 1,000 tokens; apply a daily minimum charge; or enforce a minimum billable increment (like 1 MB). Left implicit, small rules like these create big “why was I charged?” tickets.
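A minimal sketch of keeping those rules in one testable place (the type and field names here are illustrative, not a prescribed schema):

```go
package pricing

import "math"

// MeterRules captures the rounding and minimum rules for one meter.
type MeterRules struct {
	RoundUpToIncrement float64 // e.g. 1,000 tokens, 60 seconds, or 1 MB in the meter's unit
	DailyMinimum       float64 // minimum billable quantity per day; 0 if none
}

// BillableQuantity applies the written-down rules to a raw daily total.
func (r MeterRules) BillableQuantity(rawDailyTotal float64) float64 {
	q := rawDailyTotal
	if r.RoundUpToIncrement > 0 {
		// Round up to the nearest increment.
		q = math.Ceil(q/r.RoundUpToIncrement) * r.RoundUpToIncrement
	}
	if q < r.DailyMinimum {
		// Enforce the daily minimum.
		q = r.DailyMinimum
	}
	return q
}
```

With RoundUpToIncrement set to 1000, a raw total of 1,234 tokens bills as 2,000, and the answer to a rounding question is one function, not folklore.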
Rules worth pinning down early:
Example: a team is on Pro, then upgrades mid-month. If you reset allowances on upgrade, they might effectively get two free allowances in one month. If you don't reset, they might feel punished for upgrading. Either choice can be valid, but it must be consistent, documented, and testable.
Decide what counts as a billable event and write it down as data. If you can't replay the story of “what happened” from events alone, you'll end up guessing during disputes.
Track more than “usage happened.” You also need the events that change what the customer should pay.
Most billing bugs come from missing context. Capture the boring fields now so support, finance, and engineering can answer questions later.
Support-grade metadata also pays off: request ID or trace ID, region, app version, and the pricing rules version that applied. When a customer says “I was charged twice at 2:03 PM,” those fields are what let you prove what happened, reverse it safely, and prevent a repeat.
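As a sketch, a raw event record might look like this in Go; the field names are assumptions (the JSON example later in the article uses a smaller subset), but the support-grade fields at the bottom are the ones that pay off during disputes:

```go
package events

import "time"

// UsageEvent is one illustrative shape for a raw billable event.
type UsageEvent struct {
	EventID        string    `json:"event_id"`
	CustomerID     string    `json:"customer_id"`
	Meter          string    `json:"meter"`       // e.g. "api_calls", "storage_gb_days"
	Quantity       float64   `json:"quantity"`    // in the meter's primary unit
	OccurredAt     time.Time `json:"occurred_at"` // event time, UTC
	ReceivedAt     time.Time `json:"received_at"` // ingestion time, UTC
	IdempotencyKey string    `json:"idempotency_key"`

	// Support-grade metadata: cheap to capture now, expensive to reconstruct later.
	RequestID          string `json:"request_id"`
	Region             string `json:"region"`
	AppVersion         string `json:"app_version"`
	PricingRuleVersion string `json:"pricing_rule_version"`
}
```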
The first rule is simple: emit billable events from the system that truly knows the work happened. Most of the time, that's your server, not the browser or the mobile app.
Client-side counters are easy to fake and easy to lose. Users can block requests, replay them, or run old code. Even without bad intent, mobile apps crash, clocks drift, and retries happen. If you must read a client signal, treat it as a hint, not the invoice.
A practical approach is to emit usage when your backend crosses an irreversible point, like when you persisted a record, completed a job, or delivered a response you can prove was produced. Trusted emission points include:
Offline mobile is the main exception. If a Flutter app needs to work without a connection, it may track usage locally and upload later. Add guardrails: include a unique event ID, device ID, and a monotonic sequence number, and have the server validate what it can (account status, plan limits, duplicate IDs, impossible timestamps). When the app reconnects, the server should accept events idempotently so retries don't double charge.
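A sketch of the server-side guardrails for an offline batch upload, assuming the device sends a per-device monotonic sequence number and the server can look up which event IDs it has already stored (all names here are hypothetical):

```go
package ingest

import (
	"fmt"
	"time"
)

// OfflineEvent is what a device uploads after reconnecting (illustrative shape).
type OfflineEvent struct {
	EventID    string
	DeviceID   string
	Sequence   int64     // monotonic per device
	OccurredAt time.Time // device clock, so treat with suspicion
	Quantity   float64
}

// ValidateOfflineBatch rejects events the server can prove are wrong; everything
// else is accepted idempotently (duplicates are acknowledged, not counted again).
func ValidateOfflineBatch(events []OfflineEvent, lastSeq int64, seen func(id string) bool, now time.Time) (accepted []OfflineEvent, rejected []error) {
	for _, e := range events {
		switch {
		case seen(e.EventID):
			// Duplicate ID: acknowledge so the client stops retrying, but don't count it.
			continue
		case e.Sequence <= lastSeq:
			rejected = append(rejected, fmt.Errorf("event %s: sequence %d already processed", e.EventID, e.Sequence))
		case e.OccurredAt.After(now.Add(5 * time.Minute)):
			rejected = append(rejected, fmt.Errorf("event %s: timestamp in the future", e.EventID))
		case e.Quantity < 0:
			rejected = append(rejected, fmt.Errorf("event %s: negative quantity", e.EventID))
		default:
			accepted = append(accepted, e)
			lastSeq = e.Sequence
		}
	}
	return accepted, rejected
}
```

Duplicates are acknowledged but never counted, so a client that retries the whole batch cannot double charge anyone.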
Event timing depends on what users expect to see. Real time works for API calls where customers watch usage in a dashboard. Near real time (every few minutes) is often enough and cheaper. Batch can work for high-volume signals (like storage scans), but be clear about delays and use the same source-of-truth rules so late data doesn't silently change past invoices.
You need two things that feel redundant but save you later: immutable raw events (what happened) and derived totals (what you bill). Raw events are your source of truth. Aggregated usage is what you query quickly, explain to customers, and turn into invoices.
You can compute totals in two common places. Doing it in the database (SQL jobs, materialized tables, scheduled queries) is simpler to operate at first and keeps the logic close to the data. A dedicated aggregator service (a small worker that reads events and writes rollups) is easier to version, test, and scale, and it can enforce consistent rules across products.
Raw events protect you from bugs, refunds, and disputes. Aggregates protect you from slow invoices and expensive queries. If you only store aggregates, one wrong rule can permanently corrupt history.
A practical setup:
Make aggregation windows explicit. Pick a billing time zone (often the customer's, or UTC for everyone) and stick to it. “Day” boundaries change with time zones, and customers notice when usage shifts between days.
Late and out-of-order events are normal (mobile offline, retries, queue delays). Don't silently change a past invoice because a late event arrived. Use a close-and-freeze rule: once a billing period is invoiced, write corrections as an adjustment in the next invoice with a clear reason.
Example: if API calls are billed monthly, you can roll up hourly counts for dashboards, daily counts for alerts, and a monthly frozen total for invoicing. If 200 calls arrive two days late, record them, but bill them as a +200 adjustment next month, not by rewriting last month’s invoice.
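One way to encode the close-and-freeze rule, assuming a hypothetical periodClosed lookup: usage for a closed period is never re-counted in place, it becomes an adjustment carried by the open period's invoice.

```go
package billing

import "time"

// Adjustment is a correction applied to a later invoice instead of rewriting a frozen one.
type Adjustment struct {
	CustomerID string
	Meter      string
	Quantity   float64 // e.g. +200 late API calls
	ForPeriod  string  // the period the usage actually belongs to, e.g. "2026-01"
	AppliedIn  string  // the open period whose invoice carries the line, e.g. "2026-02"
	Reason     string
	CreatedAt  time.Time
}

// RouteLateUsage decides where a late event's quantity should land.
func RouteLateUsage(periodClosed func(period string) bool, eventPeriod, openPeriod, customerID, meter string, qty float64, reason string) (string, *Adjustment) {
	if !periodClosed(eventPeriod) {
		// Period still open: count it normally.
		return eventPeriod, nil
	}
	// Period already invoiced: never rewrite it; emit an adjustment for the open period.
	return openPeriod, &Adjustment{
		CustomerID: customerID,
		Meter:      meter,
		Quantity:   qty,
		ForPeriod:  eventPeriod,
		AppliedIn:  openPeriod,
		Reason:     reason,
		CreatedAt:  time.Now().UTC(),
	}
}
```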
A working usage pipeline is mostly data flow with strong guardrails. Get the order right and you can change pricing later without reprocessing everything by hand.
When an event arrives, validate it and normalize it immediately. Check required fields, convert units (bytes to GB, seconds to minutes), and clamp timestamps to a clear rule (event time vs received time). If something is invalid, store it as rejected with a reason instead of quietly dropping it.
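A sketch of that validate-and-normalize step, assuming storage arrives in bytes and is billed in GB, and that future timestamps are clamped to ingestion time (both are example conventions, not requirements):

```go
package ingest

import (
	"errors"
	"time"
)

// RawEvent is what arrives on the wire; NormalizedEvent is what gets stored.
type RawEvent struct {
	EventID    string
	CustomerID string
	Meter      string
	Quantity   float64 // e.g. bytes for storage meters
	OccurredAt time.Time
}

type NormalizedEvent struct {
	RawEvent
	ReceivedAt time.Time
}

// Normalize checks required fields, converts units, and clamps timestamps.
// Invalid events are returned as errors so they can be stored as rejected, with a reason.
func Normalize(e RawEvent, now time.Time) (NormalizedEvent, error) {
	if e.EventID == "" || e.CustomerID == "" || e.Meter == "" {
		return NormalizedEvent{}, errors.New("missing required field")
	}
	if e.Quantity < 0 {
		return NormalizedEvent{}, errors.New("negative quantity")
	}
	if e.Meter == "storage_gb_days" {
		// Assumed convention: storage arrives in bytes and is billed in GB.
		e.Quantity = e.Quantity / (1024 * 1024 * 1024)
	}
	if e.OccurredAt.After(now) {
		// Clamp future timestamps (clock drift) to the ingestion time.
		e.OccurredAt = now
	}
	return NormalizedEvent{RawEvent: e, ReceivedAt: now}, nil
}
```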
After normalization, keep an append-only mindset and never “fix” history in place. Raw events are your source of truth.
This flow works for most products:
Then freeze the invoice version. “Freeze” means keeping an audit trail that answers: which raw events, which dedupe rule, which aggregation code version, and which pricing rules produced these line items. If you later change a price or fix a bug, create a new invoice revision, not a silent edit.
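In data terms, "freeze" can be as small as a record like the following; every field name is an assumption, but together they answer the audit questions above:

```go
package billing

import "time"

// InvoiceRevision records enough to re-derive its line items exactly.
// A price fix or bug fix creates a new revision; nothing edits an old one.
type InvoiceRevision struct {
	InvoiceID           string
	Revision            int    // 1, 2, 3... later revisions supersede earlier ones
	CustomerID          string
	Period              string // e.g. "2026-01"
	RawEventWatermark   time.Time // last ingestion timestamp included in this revision
	DedupeRuleVersion   string
	AggregationVersion  string
	PricingRulesVersion string
	FrozenAt            time.Time
}
```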
Double charging and missing usage usually come from the same root problem: your system can't tell whether an event is new, duplicated, or lost. This is less about clever billing logic and more about strict controls around event identity and validation.
Idempotency keys are the first line of defense. Generate a key that's stable for the real-world action, not the HTTP request. A good key is deterministic and unique per billable unit, for example: tenant_id + billable_action + source_record_id + time_bucket (only use a time bucket when the unit is time-based). Enforce it at the first durable write, typically your ingestion database or event log, with a unique constraint so duplicates can't land.
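A sketch of such a deterministic key; the inputs mirror the example above, and the hashing is only there to keep the key short and uniform:

```go
package ingest

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"time"
)

// IdempotencyKey is stable for the real-world action, not the HTTP request:
// the same tenant, action, source record, and (for time-based units) time bucket
// always produce the same key, no matter how many times the event is sent.
func IdempotencyKey(tenantID, billableAction, sourceRecordID string, timeBucket *time.Time) string {
	raw := fmt.Sprintf("%s|%s|%s", tenantID, billableAction, sourceRecordID)
	if timeBucket != nil {
		// Only include a bucket when the unit itself is time-based (e.g. per minute).
		raw += "|" + timeBucket.UTC().Format("2006-01-02T15:04")
	}
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}
```

The other half of the defense is enforcing the key at the first durable write, for example a unique index on (tenant_id, idempotency_key), so a duplicate insert fails instead of landing twice.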
Retries and timeouts are normal, so design for them. A client may send the same event again after a 504 even if you already received it. Your rule should be: accept repeats, but don't count them twice. Keep receiving separate from counting: ingest once (idempotent), then aggregate from stored events.
Validation prevents “impossible usage” from corrupting totals. Validate at ingest and again at aggregation, because bugs happen in both places.
Missing usage is hardest to notice, so treat ingestion errors as first-class data. Store failed events separately with the same fields as successful ones (including idempotency key), plus an error reason and a retry count.
Reconciliation checks are the boring guardrails that catch “we charged too much” and “we missed usage” before customers notice.
Start by reconciling the same time window in two places: raw events and aggregated usage. Pick a fixed window (for example, yesterday in UTC), then compare counts, sums, and unique IDs. Small differences happen (late events, retries), but they should be explained by known rules, not left as a mystery.
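A sketch of that comparison, assuming you can load per-customer, per-meter totals for the same UTC day from both the raw events and the aggregates (the loading is yours; only the comparison is shown):

```go
package reconcile

// MeterTotal is one (customer, meter) total for a fixed window.
type MeterTotal struct {
	CustomerID string
	Meter      string
	Quantity   float64
	EventCount int
}

type pairKey struct{ customer, meter string }

// Compare returns the (customer, meter) pairs whose raw and aggregated totals disagree.
// Gaps explained by known rules (late events, dedupe) should be filtered out upstream.
func Compare(raw, aggregated []MeterTotal) []string {
	agg := make(map[pairKey]MeterTotal, len(aggregated))
	for _, t := range aggregated {
		agg[pairKey{t.CustomerID, t.Meter}] = t
	}
	var mismatches []string
	for _, r := range raw {
		a, ok := agg[pairKey{r.CustomerID, r.Meter}]
		if !ok || a.Quantity != r.Quantity || a.EventCount != r.EventCount {
			mismatches = append(mismatches, r.CustomerID+"/"+r.Meter)
		}
	}
	return mismatches
}
```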
Next, reconcile what you billed against what you priced. An invoice should be reproducible from a priced usage snapshot: the exact usage totals, the exact price rules, the exact currency, and the exact rounding. If the invoice changes when you rerun the calculation later, you don't have an invoice, you have a guess.
Daily sanity checks catch issues that are not “wrong math” but “weird reality”:
When you find a problem, you'll need a backfill process. Backfills should be intentional and logged. Record what changed, which window, which customers, who triggered it, and the reason. Treat adjustments like accounting entries, not silent edits.
A simple dispute workflow keeps support calm. When a customer questions a charge, you should be able to reproduce their invoice from raw events using the same snapshot and pricing version. That turns a vague complaint into a fixable bug.
Most billing fires aren't caused by complex math. They come from small assumptions that only break at the worst time: end of month, after an upgrade, or during a retry storm. Staying careful is mostly about picking one truth for time, identity, and rules, then refusing to bend it.
These show up again and again, even in mature teams:
Example: a customer upgrades on the 20th and your event processor retries a day’s data after a timeout. Without idempotency keys and rule versioning, you can duplicate the 19th and price the 1st-19th at the new rate.
Here’s a simple example for one customer, Acme Co, billed on three meters: API calls, storage (GB-days), and premium feature runs.
These are the events your app emits over one day (Jan 5). Notice the fields that make the story easy to reconstruct later: event_id, customer_id, occurred_at, meter, quantity, and an idempotency key.
{"event_id":"evt_1001","customer_id":"cust_acme","occurred_at":"2026-01-05T09:12:03Z","meter":"api_calls","quantity":1,"idempotency_key":"req_7f2"}
{"event_id":"evt_1002","customer_id":"cust_acme","occurred_at":"2026-01-05T09:12:03Z","meter":"api_calls","quantity":1,"idempotency_key":"req_7f2"}
{"event_id":"evt_1003","customer_id":"cust_acme","occurred_at":"2026-01-05T10:00:00Z","meter":"storage_gb_days","quantity":42.0,"idempotency_key":"daily_storage_2026-01-05"}
{"event_id":"evt_1004","customer_id":"cust_acme","occurred_at":"2026-01-05T15:40:10Z","meter":"premium_runs","quantity":3,"idempotency_key":"run_batch_991"}
At month end, your aggregation job groups raw events by customer_id, meter, and billing period. The totals for January are sums across the month: API calls sum to 1,240,500; storage GB-days sum to 1,310.0; premium runs sum to 68.
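The grouping itself is deliberately boring; a sketch over in-memory events (in practice this is a SQL job or a small worker reading the event store), keyed by customer, meter, and the billing month of occurred_at:

```go
package aggregate

import "time"

type Event struct {
	CustomerID string
	Meter      string
	Quantity   float64
	OccurredAt time.Time
}

type PeriodKey struct {
	CustomerID string
	Meter      string
	Period     string // e.g. "2026-01"
}

// MonthlyTotals groups raw events by (customer, meter, billing month of occurred_at).
func MonthlyTotals(events []Event) map[PeriodKey]float64 {
	totals := make(map[PeriodKey]float64)
	for _, e := range events {
		k := PeriodKey{
			CustomerID: e.CustomerID,
			Meter:      e.Meter,
			Period:     e.OccurredAt.UTC().Format("2006-01"),
		}
		totals[k] += e.Quantity
	}
	return totals
}
```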
Now a late event arrives on Feb 2, but it belongs to Jan 31 (a mobile client was offline). Because you aggregate by occurred_at (not ingest time), the January totals change. You either (a) generate an adjustment line on the next invoice or (b) reissue January if your policy allows it.
Reconciliation catches a bug here: evt_1001 and evt_1002 share the same idempotency_key (req_7f2). Your check flags “two billable events for one request” and marks one as a duplicate before invoicing.
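That check is a small pre-invoicing pass; a sketch, using an illustrative minimal event shape:

```go
package reconcile

// BilledEvent is the minimal shape needed for the duplicate check (illustrative).
type BilledEvent struct {
	EventID        string
	CustomerID     string
	IdempotencyKey string
}

// FlagDuplicates returns event IDs that share an idempotency key with an earlier
// event for the same customer, so they can be marked as duplicates before invoicing.
func FlagDuplicates(events []BilledEvent) []string {
	seen := make(map[string]bool)
	var dups []string
	for _, e := range events {
		k := e.CustomerID + "|" + e.IdempotencyKey
		if seen[k] {
			dups = append(dups, e.EventID) // e.g. evt_1002 in the example above
			continue
		}
		seen[k] = true
	}
	return dups
}
```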
Support can explain it plainly: “We saw the same API request reported twice due to a retry. We removed the duplicate usage event, so you’re charged once. Your invoice includes an adjustment reflecting the corrected total.”
Before you turn on billing, treat your usage system like a small financial ledger. If you can't replay the same raw data and get the same totals, you'll spend nights chasing “impossible” charges.
Use this checklist as a final gate:
A practical test: pick one customer, replay the last 7 days of raw events into a clean database, then generate usage and an invoice. If the result differs from production, you have a determinism problem, not a math problem.
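A sketch of that test; the two helpers are stand-ins for your own pipeline, and the only assertion is that replaying the same raw events reproduces what production billed:

```go
package billing_test

import (
	"reflect"
	"testing"
)

// LineItem and both helpers below are stand-ins for your own system;
// the shape of the test is the point, not these stubs.
type LineItem struct {
	Meter       string
	Quantity    float64
	AmountCents int64
}

// replayAndInvoice would load 7 days of raw events into a clean database,
// aggregate them, apply the pinned pricing version, and return the line items.
func replayAndInvoice(customerID, pricingVersion string) []LineItem { return nil }

// loadProductionLineItems would read what was actually billed in production.
func loadProductionLineItems(customerID string) []LineItem { return nil }

func TestReplayMatchesProduction(t *testing.T) {
	replayed := replayAndInvoice("cust_acme", "pricing_v3")
	billed := loadProductionLineItems("cust_acme")
	if !reflect.DeepEqual(replayed, billed) {
		t.Fatalf("replay differs from production: %v vs %v", replayed, billed)
	}
}
```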
Treat the first release like a pilot. Pick one billable unit (for example, “API calls” or “GB stored”) and one reconciliation report that compares what you expected to bill vs what you actually billed. Once that stays stable for a full cycle, add the next unit.
Make support and finance successful on day one by giving them a simple internal page that shows both sides: raw events and the computed totals that end up on the invoice. When a customer asks “why was I charged?”, you want a single screen that answers it in minutes.
Before you charge real money, replay reality. Use staging data to simulate a full month of usage, run your aggregation, generate invoices, and compare them to what you'd expect if you counted manually for a small sample of accounts. Pick a few customers with different patterns (low, spiky, steady) and verify their totals are consistent across raw events, daily aggregates, and invoice lines.
If you're building the metering service itself, a vibe-coding platform like Koder.ai (koder.ai) can be a quick way to prototype an internal admin UI and a Go + PostgreSQL backend, then export the source code once the logic is stable.
When billing rules change, reduce risk with a release routine:
Usage billing breaks when the invoice total doesn’t match what the product actually delivered.
Common causes are:
The fix is less about “better math” and more about making events trustworthy, deduped, and explainable end-to-end.
Pick one clear unit per meter and define it in one sentence (for example: “one successful API request” or “one AI generation completed”).
Then write down the rules customers will argue about:
If you can’t explain the unit and rules quickly, you’ll struggle to audit and support it later.
Track both usage and “money-changing” events, not just consumption.
At minimum:
This keeps invoices reproducible when plans change or corrections happen.
Capture the context you’ll need to answer “why was I charged?” without guesswork:
occurred_at timestamp in UTC and an ingestion timestamp
Support-grade extras (request/trace ID, region, app version, pricing-rule version) make disputes much faster to resolve.
Emit billable events from the system that truly knows the work happened—usually your backend, not the browser or mobile app.
Good emission points are “irreversible” moments, like:
Client-side signals are easy to lose and easy to spoof, so treat them as hints unless you can validate them strongly.
Use both:
If you only store aggregates, one buggy rule can permanently corrupt history. If you only store raw events, invoices and dashboards get slow and expensive.
Make duplicates impossible to count by design:
This way a timeout-and-retry can’t turn into a double charge.
Pick a clear policy and automate it.
A practical default:
Aggregate by occurred_at (event time), not ingestion time.
This keeps accounting clean and avoids surprises where past invoices silently change.
Run small, boring checks every day—those catch the expensive bugs early.
Useful reconciliations:
Differences should be explainable by known rules (late events, dedupe), not mystery deltas.
Make invoices explainable with a consistent “paper trail”:
When a ticket arrives, support should be able to answer:
That turns disputes into a quick lookup instead of a manual investigation.