0005-snapshot-metadata-and-distribution

ADR 0005: Snapshot metadata and progress distribution

Date: 2025-09-21
Status: accepted

Context

The service periodically produces point-in-time analytics snapshots (/stats) which are used by external aggregators to compute weekly rollups. When multiple service instances publish snapshots, aggregators need lightweight metadata to deduplicate or attribute samples to instances. Additionally, consumers requested a compact representation of project progress distribution that is efficient to transmit and simple to compute on the producer side.

Decision

Add optional instance metadata to each SnapshotStats payload:
- instance_id (string, optional): an identifier for the producing service instance. Populated from the INSTANCE_ID environment variable when present.
- env (string, optional): the runtime environment name (e.g., prod, staging), populated from the ENV environment variable when present.
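
A minimal sketch of how a producer might populate these two fields, assuming a Go producer. The SnapshotMeta type, the JSON tags, and the function names are hypothetical and are not defined by this ADR; only the INSTANCE_ID and ENV variable names come from the decision above.

```go
package main

import "os"

// SnapshotMeta holds the optional instance metadata described in this ADR.
// The type name and JSON tags are assumptions made for illustration.
type SnapshotMeta struct {
	InstanceID string `json:"instance_id,omitempty"`
	Env        string `json:"env,omitempty"`
}

// metaFrom builds the metadata from a lookup function that has the same
// signature as os.LookupEnv, so it can be exercised without touching the
// real process environment. Fields stay empty when a variable is unset.
func metaFrom(lookup func(string) (string, bool)) SnapshotMeta {
	var m SnapshotMeta
	if v, ok := lookup("INSTANCE_ID"); ok {
		m.InstanceID = v
	}
	if v, ok := lookup("ENV"); ok {
		m.Env = v
	}
	return m
}

// metaFromProcessEnv reads the metadata from the real environment.
func metaFromProcessEnv() SnapshotMeta {
	return metaFrom(os.LookupEnv)
}
```

With omitempty, an unset variable simply leaves the field out of the serialized payload, which matches the "optional" wording above.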
Add a compact progress_distribution histogram to SnapshotStats:
- Representation: array of 10 integers where index 0 counts projects with progress in [0..9]%, index 1 counts [10..19]%, …, index 9 counts [90..100]%. This is intentionally coarse but sufficient for visualizations and rollup computations.
Producers will compute the histogram by inspecting Project.Progress (clamped 0..100) and incrementing the appropriate bucket. Consumers can expand buckets as needed.
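
The bucketing rule can be sketched as follows, assuming Progress is an integer percentage. The Project struct here is a hypothetical stand-in with only the field this ADR mentions, and progressDistribution is an illustrative name rather than an existing function.

```go
package main

// Project is a hypothetical stand-in for the real project type; only the
// Progress field is taken from this ADR, and its int type is an assumption.
type Project struct {
	Progress int // percent complete
}

const bucketCount = 10

// progressDistribution makes a single pass over the projects and counts
// how many fall into each 10-point bucket. Progress is clamped to 0..100,
// and 100 is folded into the last bucket so that index 9 covers [90..100].
func progressDistribution(projects []Project) [bucketCount]int {
	var hist [bucketCount]int
	for _, p := range projects {
		pct := p.Progress
		if pct < 0 {
			pct = 0
		}
		if pct > 100 {
			pct = 100
		}
		idx := pct / 10
		if idx >= bucketCount {
			idx = bucketCount - 1 // progress == 100 belongs to the last bucket
		}
		hist[idx]++
	}
	return hist
}
```

The explicit check for index 10 is the one detail that is easy to miss: without it, a project at exactly 100% would index past the end of the array.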

Rationale:

instance_id is lightweight and gives aggregators a mechanism to deduplicate or track per-instance contribution without introducing distributed consensus or durable instance registries.
A fixed 10-bucket histogram is simple to compute, small to transmit, and covers the common visualization needs without needing full percentiles or heavy-weight sketches.

Consequences

Positive:
- Aggregators can detect duplicate samples and attribute samples to instances.
- Reduced bandwidth for progress distribution compared to sending full sample lists.
- Simple producer implementation (single pass over projects).
Negative / trade-offs:
- Lossy histogram: percentiles estimated from 10-point buckets are less accurate than percentiles computed from the full distribution.
- Deduplication relies on the INSTANCE_ID environment variable being set correctly. If it is absent, aggregators must fall back on other signals (e.g., source IP, service name + timestamp), which may be less reliable.
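
To illustrate the precision trade-off, here is a hedged sketch of how a consumer might estimate a percentile from the 10-bucket array using the nearest-rank method. The function name and signature are hypothetical. The result can only be narrowed to the range of one bucket, not to an exact value.

```go
package main

import "math"

// percentileBucket returns the inclusive progress range [lo, hi] of the
// bucket that contains the q-th quantile (0 < q <= 1), using the
// nearest-rank method. ok is false when the histogram is empty.
func percentileBucket(hist [10]int, q float64) (lo, hi int, ok bool) {
	total := 0
	for _, c := range hist {
		total += c
	}
	if total == 0 {
		return 0, 0, false
	}
	// Nearest rank: the smallest rank whose cumulative share reaches q.
	target := int(math.Ceil(q * float64(total)))
	if target < 1 {
		target = 1
	}
	if target > total {
		target = total
	}
	cum := 0
	for i, c := range hist {
		cum += c
		if cum >= target {
			lo = i * 10
			hi = lo + 9
			if i == len(hist)-1 {
				hi = 100 // the last bucket covers [90..100]
			}
			return lo, hi, true
		}
	}
	return 90, 100, true // not reached, because target <= total
}
```

For example, a median that falls in bucket 1 is reported only as "somewhere in 10..19", which is the loss of precision noted above.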

Next steps

Update SPEC.md and docs/examples.md to document instance_id, env, and progress_distribution (done).
Add OpenAPI schema examples for the new fields (optional).
Write a follow-up ADR or note if a different histogram resolution or a sketch (e.g., HdrHistogram or DDSketch) is later required.