0004-weekly-rollups-and-jetstream
# ADR 0004: Weekly Rollups and Optional JetStream Export
- Status: Proposed
- Date: 2025-09-21
- Author: TBD
- Approver: TBD
## Context

The service is intentionally in-memory and read-only. We recently aligned on:

- Adding a `/stats` endpoint (snapshot-only) with totals, status/priority counts, progress summaries, and optional durations.
- A background sampler that takes snapshots every 15 minutes, with a daily pick strategy of first-after-midnight (UTC) and an in-memory ring buffer to retain recent snapshots.
- A need to expose weekly aggregates while keeping the service free of durable persistence.
We also discussed potential external storage for history. JetStream (NATS) can act as a durable event sink without modifying the in-service persistence story.
## Decision

We will:
- Keep weekly rollups in memory and expose them via new weekly endpoints.
- Compute weekly rollups at the ISO week boundary (Mon 00:05 UTC) from the 7 daily picks.
- Optionally publish weekly rollups to a NATS JetStream stream for durable storage/integration.
- Auto-provision the JetStream stream on startup if it does not exist; do not mutate existing streams.
- On startup, if JetStream is configured and reachable, backfill recent weekly rollups from the weekly stream into the in-memory ring (bounded by retention and config).
This keeps the core design (in-memory only) intact, while enabling history via an external sink.
Additionally (preferred for production), we will support a decoupled model:
- The main service publishes 15-minute snapshot events to NATS (snapshots stream).
- A separate aggregator service consumes snapshots durably, computes daily picks and weekly rollups, and publishes weekly rollups to JetStream.
- The main service can still keep a small in-memory weekly ring and optional read endpoints for convenience, but durable truth lives in JetStream.
## Details

### Sampling & Aggregation

- Snapshot cadence: every 15 minutes (`SNAPSHOT_INTERVAL_MINUTES=15`).
- Daily pick: first-after-midnight (UTC), with a cutoff window (e.g., 00:05).
- Weekly rollup: computed at Monday 00:05 UTC from that week’s daily picks.
- Retention: keep the last N weekly rollups in an in-memory ring buffer (configurable).
### Stats Snapshot Endpoint

A new read-only endpoint exposes the current snapshot over the (optionally) filtered project set.

- `GET /stats` — returns counts and summaries for the current adapter store.
- Query params: accepts the same scoping filters as `/projects` (`project_id`, `customer_id`, `person_id`, `priority`, `q`, `status`, `active`). Semantics match `/projects` (OR within a param, AND across params). Stats respect all provided filters.
- Response shape:
Notes:

- `backlog_projects` appears only if backlog semantics are implemented and enabled.
- `durations` fields are included only when sufficient timestamps exist.
### Progress Semantics

- Source: `Project.Progress` after clamping to [0,100] during adapter refresh.
- Inclusion: all projects in the filtered set are included. The model uses an `int` (not a pointer), so `0` is a valid value and is included. There is no notion of “missing” progress in the current model.
- Aggregates:
  - `avg`: arithmetic mean over all included values; returned as a floating-point number with one decimal (round half up).
  - `p50` and `p90`: nearest-rank percentiles over the sorted ascending list of values. For N values and percentile P, index k = ceil(P/100*N); pick the value at 1-based index k. Returned as integers.
- Edge cases:
  - If the filtered set is empty (N=0), the `progress` object is omitted.
  - If N=1, `p50` and `p90` both equal that single value.
- Stability: computed on the clamped values; sorting is stable across identical values.
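The aggregate rules above can be sketched directly (nearest-rank percentile and round-half-up mean, stdlib only; function names are illustrative):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// nearestRank returns the P-th percentile of vs using the nearest-rank rule:
// for N values, pick the 1-based index k = ceil(P/100 * N) of the ascending sort.
func nearestRank(vs []int, p float64) int {
	sorted := append([]int(nil), vs...)
	sort.Ints(sorted)
	k := int(math.Ceil(p / 100 * float64(len(sorted))))
	if k < 1 {
		k = 1
	}
	return sorted[k-1]
}

// avg1 returns the arithmetic mean, rounded half up to one decimal.
func avg1(vs []int) float64 {
	sum := 0
	for _, v := range vs {
		sum += v
	}
	return math.Floor(float64(sum)/float64(len(vs))*10+0.5) / 10
}

func main() {
	vals := []int{0, 10, 40, 55, 90}
	fmt.Println(avg1(vals))            // 39
	fmt.Println(nearestRank(vals, 50)) // 40
	fmt.Println(nearestRank(vals, 90)) // 90
}
```

Note that with N=1 both percentiles collapse to the single value, matching the edge case above.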
### Weekly Rollup Shape

Each weekly record contains:

- `week_id`: ISO week, e.g., `2025-W38`
- `period_start`, `period_end`: RFC3339 timestamps
- `end_snapshot`: totals from the last daily pick (final day of the period): `total_projects`, `active_projects`, `inactive_projects`, `backlog_projects?`, `active_ratio`, `avg_progress`
- `weekly_avg`: means across daily picks: `active_ratio_avg`, `avg_progress_avg`
- `deltas`: change vs prior week if available: `active_projects_delta`, `backlog_projects_delta?`
- `incomplete`: boolean, true when fewer than 7 daily picks were available
- `generated_at`: RFC3339 server time when the rollup was computed
- `service`: `{ name, version }` copied from OpenAPI info
### API Endpoints (read-only)

- `GET /stats/weekly` — return recent weekly rollups (up to retention).
- `GET /stats/weekly/{week}` — return a single rollup by ISO week id.
- Tag under `Analytics`. These endpoints read from in-memory data; they do not require JetStream.
### JetStream Export (optional)

- Stream: `STATS_WEEKLY`
- Subject: `stats.weekly.go-test-project.<env>` (service-first), configurable
- Storage: file; Retention: limits; Replicas: 1 (dev) / 3 (prod), configurable
- Max age: 2 years; Duplicate window: 30 days; Max msg size: 64KB
- Auto-provision: if stream missing, create with defaults; do not mutate if it exists (log drift)
- Deduplication: publish with `Nats-Msg-Id=<week_id>` and `Content-Type=application/json`
- Backoff: capped exponential retry on transient publish failures
- Degrade gracefully: if NATS unavailable, keep in-memory weekly ring and continue
### Startup Backfill

Hydrate the in-memory weekly ring by reading recent records from the weekly stream at process start:

- Scope: only the weekly stream `STATS_WEEKLY` is read (not snapshots).
- Bounds: backfill up to `min(WEEKLY_HISTORY_WEEKS, NATS_BACKFILL_WEEKS)` most recent weeks.
- Consumer: use a pull consumer (ephemeral) with batched pulls and a short timeout per batch.
- Order: read in reverse chronological order when supported; otherwise read forward then keep only the most recent N.
- Deduplication: keep the last record per `week_id`; ignore duplicates or stale re-publishes.
- Resilience: if backfill fails or times out, log and continue with whatever was loaded (or empty) — no hard startup failure.
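The dedup-and-bound step of backfill can be sketched as below (stream reading itself is elided; `rollup` and `hydrate` are illustrative names):

```go
package main

import (
	"fmt"
	"sort"
)

type rollup struct {
	WeekID string
	Seq    uint64 // stream sequence; later re-publishes win
}

// hydrate keeps the last record per week_id, then only the most recent n
// weeks, matching the forward-read backfill rules above.
func hydrate(records []rollup, n int) []rollup {
	last := map[string]rollup{}
	for _, r := range records { // forward read: later records overwrite earlier
		last[r.WeekID] = r
	}
	out := make([]rollup, 0, len(last))
	for _, r := range last {
		out = append(out, r)
	}
	// Zero-padded ISO week ids ("2025-W07") sort correctly as strings.
	sort.Slice(out, func(i, j int) bool { return out[i].WeekID < out[j].WeekID })
	if len(out) > n {
		out = out[len(out)-n:] // keep only the most recent n week ids
	}
	return out
}

func main() {
	recs := []rollup{
		{"2025-W36", 1}, {"2025-W37", 2}, {"2025-W37", 5}, {"2025-W38", 3},
	}
	fmt.Println(hydrate(recs, 2)) // [{2025-W37 5} {2025-W38 3}]
}
```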
### Preferred (Production): Snapshot Bus + Aggregator

Publish snapshots from the main service and delegate aggregation to a separate consumer:

- Snapshot subject: `stats.snap.go-test-project.<env>` (service-first).
- Snapshot body: the `/stats` payload plus metadata (`server_time`, `captured_at`, `service {name, version}`, optional `instance_id`/`env`).
- Snapshots stream `STATS_SNAPSHOTS`:
  - Subjects: `stats.snap.>`; Storage: file; Replicas: 1 (dev) / 3 (prod);
  - Max age: 30–60 days (enough to reprocess); Duplicate window: 7 days.
- Aggregator (separate service):
  - Uses a durable consumer to read snapshots (deliver all, flow control, retries).
  - Computes daily picks (first-after-midnight UTC) and weekly rollups (ISO boundary).
  - Publishes weekly records to `STATS_WEEKLY` with `Nats-Msg-Id=<week_id>`.
  - Optionally writes daily picks to a `stats.daily.*` stream for audit/backfill.
## Configuration

- Sampler:
  - `SNAPSHOT_INTERVAL_MINUTES=15`
  - `SNAPSHOT_DAILY_STRATEGY=first-after-midnight`
  - `WEEKLY_HISTORY_WEEKS=26`
- JetStream:
  - `NATS_URLS` (comma-separated)
  - `NATS_CREDS` or `NATS_NKEY_SEED` (+ optional `NATS_JWT`)
  - `NATS_STREAM=STATS_WEEKLY`
  - `NATS_SUBJECT_WEEKLY=stats.weekly.go-test-project.${ENV}`
  - `NATS_AUTOPROVISION=true`
  - `NATS_ALLOW_STREAM_MUTATION=false`
  - `NATS_PUBLISH_TIMEOUT=2s`
  - `NATS_MAX_RETRIES=5`
  - `NATS_BACKFILL_ON_START=true` — enable startup backfill
  - `NATS_BACKFILL_WEEKS=26` — max weeks to read at startup
  - `NATS_BACKFILL_TIMEOUT=3s` — per-batch pull timeout
- Snapshots (when enabled):
  - `NATS_SNAP_STREAM=STATS_SNAPSHOTS`
  - `NATS_SUBJECT_SNAPSHOT=stats.snap.go-test-project.${ENV}`
  - `SNAPSHOT_PUBLISH=true`
## Rationale
- Preserves the in-memory contract and keeps the service simple.
- Weekly history is immediately useful and small; in-memory retention suits common use.
- JetStream export provides durable history and integration without embedding a database.
## Alternatives Considered
- Pure external collector: simplest service, but centralizes logic outside and complicates deployment.
- In-service file/database persistence: violates the current spec, requires a broader architecture change.
- Daily-only publishes: fewer events but weaker week-level integrity and deduplication guarantees across retries.
## Consequences
- Slightly increased complexity in the service (sampler, rollup logic, NATS client).
- Adds a dependency on JetStream only when configured; otherwise remains no-op.
- Requires clear ops ownership for stream lifecycle in production (even with auto-provision).
## Open Questions
- Exact metric set for weekly rollups (current proposal: totals + averages + optional deltas).
- Whether to include progress percentiles in weekly aggregates.
- Whether to include a small `notes` field for operator comments in weekly records.