ADR 0004: Weekly Rollups and Optional JetStream Export

Status: Proposed
Date: 2025-09-21
Author: TBD
Approver: TBD

Context

The service is intentionally in-memory and read-only. We recently aligned on:

Adding a /stats endpoint (snapshot-only) with totals, status/priority counts, progress summaries, and optional durations.
A background sampler to take snapshots every 15 minutes, with a daily pick strategy of first-after-midnight (UTC) and an in-memory ring buffer to retain recent snapshots.
A need to expose weekly aggregates while keeping the service free of durable persistence.

We also discussed potential external storage for history. JetStream (NATS) can act as a durable event sink without modifying the in-service persistence story.

Decision

We will:

Keep weekly rollups in memory and expose them via new weekly endpoints.
Compute weekly rollups just after the ISO week boundary (Monday 00:05 UTC) from the 7 daily picks of the week that just ended.
Optionally publish weekly rollups to a NATS JetStream stream for durable storage/integration.
Auto-provision the JetStream stream on startup if it does not exist; do not mutate existing streams.
On startup, if JetStream is configured and reachable, backfill recent weekly rollups from the weekly stream into the in-memory ring (bounded by retention and config).

This keeps the core design (in-memory only) intact, while enabling history via an external sink.

Additionally (preferred for production), we will support a decoupled model:

The main service publishes 15-minute snapshot events to NATS (snapshots stream).
A separate aggregator service consumes snapshots durably, computes daily picks and weekly rollups, and publishes weekly rollups to JetStream.
The main service can still keep a small in-memory weekly ring and optional read endpoints for convenience, but durable truth lives in JetStream.

Details

Sampling & Aggregation

Snapshot cadence: every 15 minutes (SNAPSHOT_INTERVAL_MINUTES=15).
Daily pick: first-after-midnight (UTC), with a cutoff window (e.g., 00:05).
Weekly rollup: computed at Monday 00:05 UTC from the daily picks of the ISO week that just ended.
Retention: keep the last N weekly rollups in an in-memory ring buffer (configurable).
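
The first-after-midnight daily pick can be sketched as below. This is a minimal illustration, not the service's implementation: the Snapshot type, its fields, and the dailyPicks and mustParse helpers are hypothetical names. The cutoff window is not modeled, since its exact semantics are not fixed in this ADR.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Snapshot is a hypothetical, trimmed-down sample; the real payload is the /stats body.
type Snapshot struct {
	CapturedAt     time.Time
	ActiveProjects int
}

// mustParse parses an RFC3339 timestamp and panics on malformed input (demo helper).
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

// dailyPicks keeps, per UTC calendar day, the earliest snapshot captured on that day.
// Keys are dates formatted as YYYY-MM-DD.
func dailyPicks(snaps []Snapshot) map[string]Snapshot {
	sorted := append([]Snapshot(nil), snaps...) // copy so the caller's slice is not reordered
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].CapturedAt.Before(sorted[j].CapturedAt)
	})
	picks := make(map[string]Snapshot)
	for _, s := range sorted {
		day := s.CapturedAt.UTC().Format("2006-01-02")
		if _, seen := picks[day]; !seen {
			picks[day] = s // first snapshot after midnight UTC wins
		}
	}
	return picks
}

func main() {
	picks := dailyPicks([]Snapshot{
		{CapturedAt: mustParse("2025-09-21T00:15:00Z"), ActiveProjects: 22},
		{CapturedAt: mustParse("2025-09-21T00:00:30Z"), ActiveProjects: 21},
		{CapturedAt: mustParse("2025-09-22T00:00:10Z"), ActiveProjects: 23},
	})
	for _, day := range []string{"2025-09-21", "2025-09-22"} {
		fmt.Println(day, picks[day].ActiveProjects)
	}
}
```

Sorting a copy keeps the function free of side effects, so it does not matter in which order the sampler stores snapshots.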

Stats Snapshot Endpoint

A new read-only endpoint exposes the current snapshot over the (optionally) filtered project set.

GET /stats — returns counts and summaries for the current adapter store.
Query params: accepts the same scoping filters as /projects (project_id, customer_id, person_id, priority, q, status, active). Semantics match /projects (OR within a param, AND across params). Stats respect all provided filters.
Response shape:

{
  "server_time": "2025-09-21T12:34:56Z",   // RFC3339
  "totals": {
    "total_projects": 42,
    "active_projects": 21,
    "inactive_projects": 20,
    "backlog_projects": 1        // optional; present if backlog semantics enabled
  },
  "by_status": [
    { "value": "active", "count": 10, "active": true },
    { "value": "planned", "count": 5, "active": false }
  ],
  "by_priority": [
    { "value": "critical", "count": 3 },
    { "value": "high", "count": 7 }
  ],
  "progress": {
    "avg": 57.3,
    "p50": 60,
    "p90": 95
  },
  "durations": {                   // optional; present when data available
    "avg_age_days": 123.4,         // now - registered_date
    "avg_cycle_time_days": 45.6    // end_date/closed_date - start_date (when both present)
  },
  "service": { "name": "go-test-project", "version": "vX.Y.Z" }
}

Notes:

backlog_projects appears only if backlog semantics are implemented and enabled.
durations fields are included only when sufficient timestamps exist.
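
One way to model the optional fields in Go is with pointers and omitempty, so that absent values are dropped from the JSON rather than emitted as zero. The type and function names below are assumptions for illustration only; the ADR fixes just the JSON field names. Making progress optional follows the empty-set rule in Progress Semantics.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Totals mirrors the "totals" object; BacklogProjects is nil when backlog semantics are off.
type Totals struct {
	TotalProjects    int  `json:"total_projects"`
	ActiveProjects   int  `json:"active_projects"`
	InactiveProjects int  `json:"inactive_projects"`
	BacklogProjects  *int `json:"backlog_projects,omitempty"`
}

type StatusCount struct {
	Value  string `json:"value"`
	Count  int    `json:"count"`
	Active bool   `json:"active"`
}

type ValueCount struct {
	Value string `json:"value"`
	Count int    `json:"count"`
}

type Progress struct {
	Avg float64 `json:"avg"`
	P50 int     `json:"p50"`
	P90 int     `json:"p90"`
}

// Durations fields are pointers so each can be omitted independently.
type Durations struct {
	AvgAgeDays       *float64 `json:"avg_age_days,omitempty"`
	AvgCycleTimeDays *float64 `json:"avg_cycle_time_days,omitempty"`
}

type ServiceInfo struct {
	Name    string `json:"name"`
	Version string `json:"version"`
}

type StatsResponse struct {
	ServerTime string        `json:"server_time"`
	Totals     Totals        `json:"totals"`
	ByStatus   []StatusCount `json:"by_status"`
	ByPriority []ValueCount  `json:"by_priority"`
	Progress   *Progress     `json:"progress,omitempty"`  // omitted for an empty filtered set
	Durations  *Durations    `json:"durations,omitempty"` // omitted without usable timestamps
	Service    ServiceInfo   `json:"service"`
}

// encode marshals the response to a JSON string.
func encode(r StatsResponse) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	backlog := 1
	out, err := encode(StatsResponse{
		ServerTime: "2025-09-21T12:34:56Z",
		Totals:     Totals{TotalProjects: 42, ActiveProjects: 21, InactiveProjects: 20, BacklogProjects: &backlog},
		ByStatus:   []StatusCount{{Value: "active", Count: 10, Active: true}},
		ByPriority: []ValueCount{{Value: "critical", Count: 3}},
		Progress:   &Progress{Avg: 57.3, P50: 60, P90: 95},
		Service:    ServiceInfo{Name: "go-test-project", Version: "vX.Y.Z"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```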

Progress Semantics

Source: Project.Progress after clamping to [0,100] during adapter refresh.
Inclusion: all projects in the filtered set are included. The model stores progress as an int (not a pointer), so 0 is a valid value and is included; the current model has no notion of “missing” progress.
Aggregates:
- avg: arithmetic mean over all included values; returned as a floating-point number with one decimal (round half up).
- p50 and p90: nearest-rank percentiles over the sorted ascending list of values. For N values and percentile P, index k = ceil(P/100*N); pick value at 1-based index k. Returned as integers.
Edge cases:
- If the filtered set is empty (N=0), the progress object is omitted.
- If N=1, p50 and p90 both equal that single value.
Stability: aggregates are computed on the clamped values. Because the values are plain integers, the relative order of identical values in the sort does not affect the results.
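
The aggregates above can be sketched as follows. The function and type names are hypothetical, and the input is assumed to be already clamped to [0,100]. The mean is rounded half up to one decimal with integer arithmetic to avoid floating-point surprises, and the percentiles use k = ceil(P/100*N) as a 1-based rank.

```go
package main

import (
	"fmt"
	"sort"
)

// ProgressStats holds the aggregates exposed in the "progress" object.
type ProgressStats struct {
	Avg float64
	P50 int
	P90 int
}

// nearestRank returns the value at 1-based rank k = ceil(p/100*N)
// of an ascending, non-empty slice.
func nearestRank(sorted []int, p int) int {
	k := (p*len(sorted) + 99) / 100 // integer form of ceil(p*N/100)
	if k < 1 {
		k = 1
	}
	return sorted[k-1]
}

// progressStats computes avg, p50 and p90; ok is false for an empty set,
// in which case the caller omits the progress object.
func progressStats(values []int) (stats ProgressStats, ok bool) {
	n := len(values)
	if n == 0 {
		return ProgressStats{}, false
	}
	sorted := append([]int(nil), values...) // leave the caller's slice untouched
	sort.Ints(sorted)
	sum := 0
	for _, v := range sorted {
		sum += v
	}
	// floor(10*sum/n + 0.5), which is round half up for non-negative values.
	tenths := (20*sum + n) / (2 * n)
	return ProgressStats{
		Avg: float64(tenths) / 10,
		P50: nearestRank(sorted, 50),
		P90: nearestRank(sorted, 90),
	}, true
}

func main() {
	stats, ok := progressStats([]int{10, 20, 30, 40, 50, 60, 70, 80, 90, 100})
	fmt.Println(stats, ok)
}
```

For ten values, p50 is the 5th smallest and p90 the 9th smallest, which is what the nearest-rank formula yields.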

Weekly Rollup Shape

Each weekly record contains:

week_id: ISO week, e.g., 2025-W38
period_start, period_end: RFC3339 timestamps
end_snapshot: totals from the last daily pick (final day of the period):
- total_projects, active_projects, inactive_projects, backlog_projects?, active_ratio, avg_progress
weekly_avg: mean across daily picks:
- active_ratio_avg, avg_progress_avg
deltas: change vs prior week if available:
- active_projects_delta, backlog_projects_delta?
incomplete: boolean, true when fewer than 7 daily picks were available
generated_at: RFC3339 server time when rollup computed
service: { name, version } copied from OpenAPI info
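
As an illustration of the record fields, the sketch below derives week_id and the period bounds with the standard library's ISO week support and averages the daily picks. All type and function names are hypothetical. Treating period_end as the exclusive start of the following week is an assumption, since the ADR does not say whether the bound is inclusive.

```go
package main

import (
	"fmt"
	"time"
)

// DailyPick is a hypothetical reduction of a daily snapshot to the averaged fields.
type DailyPick struct {
	ActiveRatio float64
	AvgProgress float64
}

// WeeklyRollup holds a subset of the weekly record fields.
type WeeklyRollup struct {
	WeekID         string
	PeriodStart    string // RFC3339
	PeriodEnd      string // RFC3339, exclusive (assumption)
	ActiveRatioAvg float64
	AvgProgressAvg float64
	Incomplete     bool
}

// mustParse parses an RFC3339 timestamp and panics on malformed input (demo helper).
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

// weekID formats the ISO 8601 week of t, e.g. 2025-W38.
func weekID(t time.Time) string {
	year, week := t.UTC().ISOWeek()
	return fmt.Sprintf("%04d-W%02d", year, week)
}

// weekStart returns Monday 00:00 UTC of the ISO week containing t.
func weekStart(t time.Time) time.Time {
	u := t.UTC()
	daysSinceMonday := (int(u.Weekday()) + 6) % 7 // Monday=0 ... Sunday=6
	return time.Date(u.Year(), u.Month(), u.Day()-daysSinceMonday, 0, 0, 0, 0, time.UTC)
}

// rollup builds the record for the ISO week containing inWeek.
func rollup(inWeek time.Time, picks []DailyPick) WeeklyRollup {
	start := weekStart(inWeek)
	r := WeeklyRollup{
		WeekID:      weekID(inWeek),
		PeriodStart: start.Format(time.RFC3339),
		PeriodEnd:   start.AddDate(0, 0, 7).Format(time.RFC3339),
		Incomplete:  len(picks) < 7,
	}
	if len(picks) == 0 {
		return r // nothing to average
	}
	for _, p := range picks {
		r.ActiveRatioAvg += p.ActiveRatio
		r.AvgProgressAvg += p.AvgProgress
	}
	n := float64(len(picks))
	r.ActiveRatioAvg /= n
	r.AvgProgressAvg /= n
	return r
}

func main() {
	r := rollup(mustParse("2025-09-21T23:59:00Z"), []DailyPick{{0.5, 40}, {0.25, 60}})
	fmt.Printf("%+v\n", r)
}
```

Note that the ISO year can differ from the calendar year near January 1, which is why weekID uses the year returned by ISOWeek rather than t.Year().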

API Endpoints (read-only)

GET /stats/weekly — return recent weekly rollups (up to retention).
GET /stats/weekly/{week} — return a single rollup by ISO week id.
Tag under Analytics. These endpoints read from in-memory data; they do not require JetStream.

JetStream Export (optional)

Stream: STATS_WEEKLY
Subject: stats.weekly.go-test-project.<env> (service-first), configurable
Storage: file; Retention: limits; Replicas: 1 (dev)/3 (prod), configurable
Max age: 2 years; Duplicate window: 30 days; Max msg size: 64KB
Auto-provision: if stream missing, create with defaults; do not mutate if it exists (log drift)
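Deduplication: publish with header Nats-Msg-Id=<week_id>, so JetStream drops re-publishes of the same week within the duplicate window. Messages also carry Content-Type=application/json.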
Backoff: capped exponential retry on transient publish failures
Degrade gracefully: if NATS unavailable, keep in-memory weekly ring and continue
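
The capped exponential retry could look like the sketch below. It deliberately avoids the NATS client: the publish argument is a hypothetical stand-in for the real publish call, and the function names and parameters are assumptions. The sketch treats every error as transient, whereas a real implementation would need to distinguish permanent failures.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a placeholder error used by the demo publisher.
var errTransient = errors.New("transient publish failure")

// backoffDelay returns base * 2^attempt, capped at limit.
func backoffDelay(attempt int, base, limit time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < limit; i++ {
		d *= 2
	}
	if d > limit {
		return limit
	}
	return d
}

// publishWithRetry calls publish once and then up to maxRetries more times,
// sleeping with capped exponential backoff between attempts.
func publishWithRetry(publish func() error, maxRetries int, base, limit time.Duration) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if err = publish(); err == nil {
			return nil
		}
		if attempt < maxRetries {
			time.Sleep(backoffDelay(attempt, base, limit))
		}
	}
	return fmt.Errorf("publish failed after %d retries: %w", maxRetries, err)
}

func main() {
	calls := 0
	err := publishWithRetry(func() error {
		calls++
		if calls < 3 {
			return errTransient // fail twice, then succeed
		}
		return nil
	}, 5, 10*time.Millisecond, 200*time.Millisecond)
	fmt.Println("calls:", calls, "err:", err)
}
```

If all retries fail, the caller can log the error and keep the rollup in the in-memory ring, in line with the graceful degradation described above.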

Startup Backfill

Hydrate the in-memory weekly ring by reading recent records from the weekly stream at process start:

Scope: only the weekly stream STATS_WEEKLY is read (not snapshots).
Bounds: backfill up to min(WEEKLY_HISTORY_WEEKS, NATS_BACKFILL_WEEKS) most recent weeks.
Consumer: use a pull consumer (ephemeral) with batched pulls and a short timeout per batch.
Order: read in reverse chronological order when supported; otherwise read forward then keep only the most recent N.
Deduplication: keep the last record per week_id; ignore duplicates or stale re-publishes.
Resilience: if backfill fails or times out, log and continue with whatever was loaded (or empty) — no hard startup failure.
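
The bounds and deduplication rules can be sketched independently of the NATS client. The sketch assumes the records have already been read from the stream in forward (publish) order; WeeklyRecord, backfillLimit and hydrate are hypothetical names. It relies on week ids of the form YYYY-Www sorting chronologically as plain strings.

```go
package main

import (
	"fmt"
	"sort"
)

// WeeklyRecord is a hypothetical, trimmed-down weekly rollup.
type WeeklyRecord struct {
	WeekID         string
	ActiveProjects int
}

// backfillLimit returns min(WEEKLY_HISTORY_WEEKS, NATS_BACKFILL_WEEKS).
func backfillLimit(historyWeeks, backfillWeeks int) int {
	if backfillWeeks < historyWeeks {
		return backfillWeeks
	}
	return historyWeeks
}

// hydrate keeps the last record seen per week_id and returns at most
// limit of the most recent weeks, oldest first.
func hydrate(records []WeeklyRecord, limit int) []WeeklyRecord {
	if limit <= 0 {
		return nil
	}
	latest := make(map[string]WeeklyRecord)
	for _, r := range records {
		latest[r.WeekID] = r // later stream entries overwrite earlier ones
	}
	ids := make([]string, 0, len(latest))
	for id := range latest {
		ids = append(ids, id)
	}
	sort.Strings(ids) // zero-padded week ids sort chronologically
	if len(ids) > limit {
		ids = ids[len(ids)-limit:]
	}
	out := make([]WeeklyRecord, 0, len(ids))
	for _, id := range ids {
		out = append(out, latest[id])
	}
	return out
}

func main() {
	ring := hydrate([]WeeklyRecord{
		{WeekID: "2025-W37", ActiveProjects: 19},
		{WeekID: "2025-W38", ActiveProjects: 21},
		{WeekID: "2025-W37", ActiveProjects: 20}, // re-publish of W37
		{WeekID: "2025-W36", ActiveProjects: 18},
	}, backfillLimit(26, 2))
	fmt.Println(ring)
}
```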

Preferred (Production): Snapshot Bus + Aggregator

Publish snapshots from the main service and delegate aggregation to a separate consumer:

Snapshot subject: stats.snap.go-test-project.<env> (service-first).
Snapshot body: the /stats payload plus metadata (server_time, captured_at, service {name,version}, optional instance_id/env).
Snapshots stream STATS_SNAPSHOTS:
- Subjects: stats.snap.>; Storage: file; Replicas: 1 (dev)/3 (prod);
- Max age: 30–60 days (enough to reprocess); Duplicate window: 7 days.
Aggregator (separate service):
- Uses a durable consumer to read snapshots (deliver all, flow control, retries).
- Computes daily picks (first-after-midnight UTC) and weekly rollups (ISO boundary).
- Publishes weekly records to STATS_WEEKLY with Nats-Msg-Id=<week_id>.
- Optionally writes daily picks to a stats.daily.* stream for audit/backfill.

Configuration

Sampler:
- SNAPSHOT_INTERVAL_MINUTES=15
- SNAPSHOT_DAILY_STRATEGY=first-after-midnight
- WEEKLY_HISTORY_WEEKS=26
JetStream:
- NATS_URLS (comma-separated)
- NATS_CREDS or NATS_NKEY_SEED (+ optional NATS_JWT)
- NATS_STREAM=STATS_WEEKLY
- NATS_SUBJECT_WEEKLY=stats.weekly.go-test-project.${ENV}
- NATS_AUTOPROVISION=true
- NATS_ALLOW_STREAM_MUTATION=false
- NATS_PUBLISH_TIMEOUT=2s
- NATS_MAX_RETRIES=5
- NATS_BACKFILL_ON_START=true — enable startup backfill
- NATS_BACKFILL_WEEKS=26 — max weeks to read at startup
- NATS_BACKFILL_TIMEOUT=3s — per-batch pull timeout
Snapshots (when enabled):
- NATS_SNAP_STREAM=STATS_SNAPSHOTS
- NATS_SUBJECT_SNAPSHOT=stats.snap.go-test-project.${ENV}
- SNAPSHOT_PUBLISH=true
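
A minimal sketch of loading a subset of these settings with the defaults listed above. The Config struct, the loader helper and loadConfig are hypothetical; treating an empty NATS_URLS as "JetStream not configured" is an assumption. Passing the lookup function in (rather than calling os.Getenv directly) keeps the loader easy to exercise in tests.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// Config holds a subset of the settings from this section.
type Config struct {
	SnapshotInterval   time.Duration
	WeeklyHistoryWeeks int
	NATSURLs           []string // empty means JetStream is not configured (assumption)
	Stream             string
	Autoprovision      bool
	PublishTimeout     time.Duration
	MaxRetries         int
	BackfillWeeks      int
}

// loader reads variables through getenv and remembers the first parse error.
type loader struct {
	getenv func(string) string
	err    error
}

func (l *loader) fail(key string, err error) {
	if l.err == nil {
		l.err = fmt.Errorf("%s: %w", key, err)
	}
}

func (l *loader) str(key, def string) string {
	if v := l.getenv(key); v != "" {
		return v
	}
	return def
}

func (l *loader) integer(key string, def int) int {
	v := l.getenv(key)
	if v == "" {
		return def
	}
	n, err := strconv.Atoi(v)
	if err != nil {
		l.fail(key, err)
		return def
	}
	return n
}

func (l *loader) boolean(key string, def bool) bool {
	v := l.getenv(key)
	if v == "" {
		return def
	}
	b, err := strconv.ParseBool(v)
	if err != nil {
		l.fail(key, err)
		return def
	}
	return b
}

func (l *loader) duration(key string, def time.Duration) time.Duration {
	v := l.getenv(key)
	if v == "" {
		return def
	}
	d, err := time.ParseDuration(v)
	if err != nil {
		l.fail(key, err)
		return def
	}
	return d
}

// loadConfig applies the documented defaults to any unset variable.
func loadConfig(getenv func(string) string) (Config, error) {
	l := &loader{getenv: getenv}
	cfg := Config{
		SnapshotInterval:   time.Duration(l.integer("SNAPSHOT_INTERVAL_MINUTES", 15)) * time.Minute,
		WeeklyHistoryWeeks: l.integer("WEEKLY_HISTORY_WEEKS", 26),
		Stream:             l.str("NATS_STREAM", "STATS_WEEKLY"),
		Autoprovision:      l.boolean("NATS_AUTOPROVISION", true),
		PublishTimeout:     l.duration("NATS_PUBLISH_TIMEOUT", 2*time.Second),
		MaxRetries:         l.integer("NATS_MAX_RETRIES", 5),
		BackfillWeeks:      l.integer("NATS_BACKFILL_WEEKS", 26),
	}
	for _, u := range strings.Split(l.str("NATS_URLS", ""), ",") {
		if u = strings.TrimSpace(u); u != "" {
			cfg.NATSURLs = append(cfg.NATSURLs, u)
		}
	}
	return cfg, l.err
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	fmt.Printf("%+v err=%v\n", cfg, err)
}
```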

Rationale

Preserves the in-memory contract and keeps the service simple.
Weekly history is immediately useful and small, so in-memory retention is sufficient for common use.
JetStream export provides durable history and integration without embedding a database.

Alternatives Considered

Pure external collector: keeps the service simplest, but moves all aggregation logic into a separate component and complicates deployment.
In-service file/database persistence: violates the current spec, requires a broader architecture change.
Daily-only publishes: fewer events but weaker week-level integrity and deduplication guarantees across retries.

Consequences

Slightly increased complexity in the service (sampler, rollup logic, NATS client).
Adds a dependency on JetStream only when configured; otherwise the export path is a no-op.
Requires clear ops ownership for stream lifecycle in production (even with auto-provision).

Open Questions

Exact metric set for weekly rollups (current proposal: totals + averages + optional deltas).
Whether to include progress percentiles in weekly aggregates.
Whether to include a small notes field for operator comments in weekly records.