ADR 0004: Weekly Rollups and Optional JetStream Export

  • Status: Proposed
  • Date: 2025-09-21
  • Author: TBD
  • Approver: TBD

Context

The service is intentionally in-memory and read-only. We recently aligned on:

  • Adding a /stats endpoint (snapshot-only) with totals, status/priority counts, progress summaries, and optional durations.
  • A background sampler to take snapshots every 15 minutes, with a daily pick strategy of first-after-midnight (UTC) and an in-memory ring buffer to retain recent snapshots.
  • A need to expose weekly aggregates while keeping the service free of durable persistence.

We also discussed potential external storage for history. NATS JetStream can act as a durable event sink without adding persistence to the service itself.

Decision

We will:

  1. Keep weekly rollups in memory and expose them via new weekly endpoints.
  2. Compute weekly rollups just after the ISO week boundary (Mon 00:05 UTC) from the 7 daily picks of the week that has just ended.
  3. Optionally publish weekly rollups to a NATS JetStream stream for durable storage/integration.
  4. Auto-provision the JetStream stream on startup if it does not exist; do not mutate existing streams.
  5. On startup, if JetStream is configured and reachable, backfill recent weekly rollups from the weekly stream into the in-memory ring (bounded by retention and config).

This keeps the core design (in-memory only) intact, while enabling history via an external sink.

Additionally (preferred for production), we will support a decoupled model:

  • The main service publishes 15-minute snapshot events to NATS (snapshots stream).
  • A separate aggregator service consumes snapshots durably, computes daily picks and weekly rollups, and publishes weekly rollups to JetStream.
  • The main service can still keep a small in-memory weekly ring and optional read endpoints for convenience, but durable truth lives in JetStream.

Details

Sampling & Aggregation

  • Snapshot cadence: every 15 minutes (SNAPSHOT_INTERVAL_MINUTES=15).
  • Daily pick: first-after-midnight (UTC), with a cutoff window (e.g., 00:05).
  • Weekly rollup: computed at Monday 00:05 UTC from the daily picks of the ISO week that has just ended (Monday through Sunday).
  • Retention: keep the last N weekly rollups in an in-memory ring buffer (configurable).
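
As an illustration of the first-after-midnight strategy, the sketch below groups snapshots by UTC calendar day and keeps the earliest one per day. The Snapshot type, DailyPicks, and the mustUTC helper are hypothetical names, not part of the service. The cutoff window is left out because its exact semantics are not specified here.

```go
package main

import (
	"sort"
	"time"
)

// Snapshot is a hypothetical, trimmed-down sample record; only the capture
// time matters when choosing daily picks.
type Snapshot struct {
	CapturedAt    time.Time
	TotalProjects int
}

// mustUTC parses an RFC3339 timestamp and panics on bad input. It exists only
// to keep examples short.
func mustUTC(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t.UTC()
}

// DailyPicks returns, for each UTC calendar day, the earliest snapshot
// captured on that day, ordered oldest first.
func DailyPicks(snaps []Snapshot) []Snapshot {
	first := make(map[string]Snapshot)
	for _, s := range snaps {
		day := s.CapturedAt.UTC().Format("2006-01-02")
		if cur, ok := first[day]; !ok || s.CapturedAt.Before(cur.CapturedAt) {
			first[day] = s
		}
	}
	picks := make([]Snapshot, 0, len(first))
	for _, s := range first {
		picks = append(picks, s)
	}
	sort.Slice(picks, func(i, j int) bool {
		return picks[i].CapturedAt.Before(picks[j].CapturedAt)
	})
	return picks
}
```

With a 15-minute cadence, a snapshot taken at 00:00 UTC wins over the 00:15 sample of the same day, and a 23:45 sample belongs to the previous day.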

Stats Snapshot Endpoint

A new read-only endpoint exposes the current snapshot over the (optionally) filtered project set.

  • GET /stats — returns counts and summaries for the current adapter store.
  • Query params: accepts the same scoping filters as /projects (project_id, customer_id, person_id, priority, q, status, active). Semantics match /projects (OR within a param, AND across params). Stats respect all provided filters.
  • Response shape:
{
  "server_time": "2025-09-21T12:34:56Z",   // RFC3339
  "totals": {
    "total_projects": 42,
    "active_projects": 21,
    "inactive_projects": 20,
    "backlog_projects": 1        // optional; present if backlog semantics enabled
  },
  "by_status": [
    { "value": "active", "count": 10, "active": true },
    { "value": "planned", "count": 5, "active": false }
  ],
  "by_priority": [
    { "value": "critical", "count": 3 },
    { "value": "high", "count": 7 }
  ],
  "progress": {
    "avg": 57.3,
    "p50": 60,
    "p90": 95
  },
  "durations": {                   // optional; present when data available
    "avg_age_days": 123.4,         // now - registered_date
    "avg_cycle_time_days": 45.6    // end_date/closed_date - start_date (when both present)
  },
  "service": { "name": "go-test-project", "version": "vX.Y.Z" }
}

Notes:

  • backlog_projects appears only if backlog semantics are implemented and enabled.
  • durations fields are included only when sufficient timestamps exist.
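
One possible way to model this payload in Go is sketched below. The type names and the Encode helper are hypothetical. Pointer fields combined with omitempty are one way to make backlog_projects, progress, and durations disappear from the JSON when they do not apply.

```go
package main

import "encoding/json"

// Totals mirrors the "totals" object; BacklogProjects is a pointer so it can
// be omitted when backlog semantics are disabled.
type Totals struct {
	TotalProjects    int  `json:"total_projects"`
	ActiveProjects   int  `json:"active_projects"`
	InactiveProjects int  `json:"inactive_projects"`
	BacklogProjects  *int `json:"backlog_projects,omitempty"`
}

type StatusCount struct {
	Value  string `json:"value"`
	Count  int    `json:"count"`
	Active bool   `json:"active"`
}

type PriorityCount struct {
	Value string `json:"value"`
	Count int    `json:"count"`
}

type Progress struct {
	Avg float64 `json:"avg"`
	P50 int     `json:"p50"`
	P90 int     `json:"p90"`
}

// Durations fields are pointers so each one can be dropped independently when
// the underlying timestamps are missing.
type Durations struct {
	AvgAgeDays       *float64 `json:"avg_age_days,omitempty"`
	AvgCycleTimeDays *float64 `json:"avg_cycle_time_days,omitempty"`
}

type ServiceInfo struct {
	Name    string `json:"name"`
	Version string `json:"version"`
}

// StatsResponse is a hypothetical Go shape for the /stats payload.
type StatsResponse struct {
	ServerTime string          `json:"server_time"` // RFC3339
	Totals     Totals          `json:"totals"`
	ByStatus   []StatusCount   `json:"by_status"`
	ByPriority []PriorityCount `json:"by_priority"`
	Progress   *Progress       `json:"progress,omitempty"`
	Durations  *Durations      `json:"durations,omitempty"`
	Service    ServiceInfo     `json:"service"`
}

// Encode marshals the response to a compact JSON string.
func Encode(r StatsResponse) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}
```

Leaving Progress and Durations nil drops both objects from the output, which matches the behavior described for an empty filtered set.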

Progress Semantics

  • Source: Project.Progress after clamping to [0,100] during adapter refresh.
  • Inclusion: all projects in the filtered set are included. The model uses an int (not a pointer), so 0 is a valid value and is included; the current model has no notion of “missing” progress.
  • Aggregates:
    • avg: arithmetic mean over all included values; returned as a floating-point number with one decimal (round half up).
    • p50 and p90: nearest-rank percentiles over the sorted ascending list of values. For N values and percentile P, index k = ceil(P/100*N); pick value at 1-based index k. Returned as integers.
  • Edge cases:
    • If the filtered set is empty (N=0), the progress object is omitted.
    • If N=1, p50 and p90 both equal that single value.
  • Stability: aggregates are computed on the clamped values; because identical values are interchangeable, the result does not depend on the input order.
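
The rules above can be sketched as follows. ProgressSummary, Summarize, and nearestRank are hypothetical names. The sketch assumes the values are already clamped to [0,100], and it uses integer arithmetic for the one-decimal round-half-up step so the rounding does not depend on floating-point behavior.

```go
package main

import "sort"

// ProgressSummary is a hypothetical holder for the "progress" object.
type ProgressSummary struct {
	Avg float64 // one decimal, round half up
	P50 int
	P90 int
}

// nearestRank returns the nearest-rank percentile p (1..100) of a non-empty,
// ascending-sorted slice: the value at 1-based index k = ceil(p/100 * N).
func nearestRank(sorted []int, p int) int {
	k := (p*len(sorted) + 99) / 100 // integer form of ceil(p*N/100)
	if k < 1 {
		k = 1
	}
	return sorted[k-1]
}

// Summarize computes avg, p50 and p90 over clamped progress values. The
// second result is false for an empty set, in which case the caller omits the
// progress object.
func Summarize(values []int) (ProgressSummary, bool) {
	n := len(values)
	if n == 0 {
		return ProgressSummary{}, false
	}
	sorted := append([]int(nil), values...) // do not reorder the caller's slice
	sort.Ints(sorted)

	sum := 0
	for _, v := range sorted {
		sum += v
	}
	// Round 10*sum/n half up using integers (values are non-negative).
	tenths := (20*sum + n) / (2 * n)

	return ProgressSummary{
		Avg: float64(tenths) / 10,
		P50: nearestRank(sorted, 50),
		P90: nearestRank(sorted, 90),
	}, true
}
```

For the ten values 10, 20, ..., 100 this yields avg 55.0, p50 = 50 (k = 5), and p90 = 90 (k = 9).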

Weekly Rollup Shape

Each weekly record contains:

  • week_id: ISO week, e.g., 2025-W38
  • period_start, period_end: RFC3339 timestamps
  • end_snapshot: totals from the last daily pick (final day of the period):
    • total_projects, active_projects, inactive_projects, backlog_projects?, active_ratio, avg_progress
  • weekly_avg: mean across daily picks:
    • active_ratio_avg, avg_progress_avg
  • deltas: change vs prior week if available:
    • active_projects_delta, backlog_projects_delta?
  • incomplete: boolean, true when fewer than 7 daily picks were available
  • generated_at: RFC3339 server time when rollup computed
  • service: { name, version } copied from OpenAPI info
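
A minimal sketch of deriving such a record from daily picks follows. DailyPick, WeeklyRollup, Rollup, and the helpers are hypothetical, and only a subset of the fields is modeled. The sketch assumes period_end is the exclusive upper bound (the following Monday 00:00 UTC) and that a day with zero projects contributes an active ratio of 0. Neither point is specified above.

```go
package main

import (
	"fmt"
	"time"
)

// DailyPick is a hypothetical per-day sample used as rollup input.
type DailyPick struct {
	Day            time.Time // any instant within the UTC day
	TotalProjects  int
	ActiveProjects int
	AvgProgress    float64
}

// WeeklyRollup models a subset of the weekly record fields.
type WeeklyRollup struct {
	WeekID              string
	PeriodStart         time.Time // Monday 00:00 UTC
	PeriodEnd           time.Time // following Monday 00:00 UTC (assumed exclusive)
	EndTotalProjects    int
	EndActiveProjects   int
	ActiveRatioAvg      float64
	AvgProgressAvg      float64
	ActiveProjectsDelta *int // nil when there is no prior week
	Incomplete          bool
}

// utcDate is a small helper for building UTC midnights in examples.
func utcDate(y, m, d int) time.Time {
	return time.Date(y, time.Month(m), d, 0, 0, 0, 0, time.UTC)
}

// WeekID formats the ISO 8601 week of t, e.g. "2025-W38".
func WeekID(t time.Time) string {
	y, w := t.UTC().ISOWeek()
	return fmt.Sprintf("%04d-W%02d", y, w)
}

// weekStart returns Monday 00:00 UTC of the ISO week containing t.
func weekStart(t time.Time) time.Time {
	t = t.UTC()
	offset := (int(t.Weekday()) + 6) % 7 // Monday -> 0, Sunday -> 6
	return time.Date(t.Year(), t.Month(), t.Day()-offset, 0, 0, 0, 0, time.UTC)
}

// Rollup aggregates one week's picks (oldest first). prev may be nil.
func Rollup(picks []DailyPick, prev *WeeklyRollup) (WeeklyRollup, bool) {
	if len(picks) == 0 {
		return WeeklyRollup{}, false
	}
	last := picks[len(picks)-1]
	start := weekStart(picks[0].Day)
	r := WeeklyRollup{
		WeekID:            WeekID(start),
		PeriodStart:       start,
		PeriodEnd:         start.AddDate(0, 0, 7),
		EndTotalProjects:  last.TotalProjects,
		EndActiveProjects: last.ActiveProjects,
		Incomplete:        len(picks) < 7,
	}
	var ratioSum, progressSum float64
	for _, p := range picks {
		if p.TotalProjects > 0 {
			ratioSum += float64(p.ActiveProjects) / float64(p.TotalProjects)
		}
		progressSum += p.AvgProgress
	}
	n := float64(len(picks))
	r.ActiveRatioAvg = ratioSum / n
	r.AvgProgressAvg = progressSum / n
	if prev != nil {
		d := last.ActiveProjects - prev.EndActiveProjects
		r.ActiveProjectsDelta = &d
	}
	return r, true
}
```

Because the week id comes from time.ISOWeek, dates around New Year follow the ISO week-numbering year rather than the calendar year.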

API Endpoints (read-only)

  • GET /stats/weekly — return recent weekly rollups (up to retention).
  • GET /stats/weekly/{week} — return a single rollup by ISO week id.
  • Tag under Analytics. These endpoints read from in-memory data; they do not require JetStream.

JetStream Export (optional)

  • Stream: STATS_WEEKLY
  • Subject: stats.weekly.go-test-project.<env> (service-first), configurable
  • Storage: file; Retention: limits; Replicas: 1 (dev)/3 (prod), configurable
  • Max age: 2 years; Duplicate window: 30 days; Max msg size: 64KB
  • Auto-provision: if stream missing, create with defaults; do not mutate if it exists (log drift)
  • Deduplication: publish with Nats-Msg-Id=<week_id> and Content-Type=application/json
  • Backoff: capped exponential retry on transient publish failures
  • Degrade gracefully: if NATS unavailable, keep in-memory weekly ring and continue
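
The backoff bullet could be implemented along these lines. This is a sketch only: backoffDelay, publishWithRetry, and ErrTransient are hypothetical names, the actual JetStream publish call is supplied by the caller, and how real client errors get classified as transient is left open.

```go
package main

import (
	"errors"
	"time"
)

// ErrTransient is a hypothetical sentinel that callers wrap around failures
// worth retrying (for example timeouts or a temporarily unavailable server).
var ErrTransient = errors.New("transient publish failure")

// backoffDelay returns base * 2^attempt, capped at max.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= max {
			return max
		}
	}
	if d > max {
		return max
	}
	return d
}

// publishWithRetry calls publish once and then retries up to maxRetries times
// on transient errors, sleeping with capped exponential backoff in between.
// Non-transient errors are returned immediately.
func publishWithRetry(publish func() error, maxRetries int, base, max time.Duration) error {
	var err error
	for attempt := 0; ; attempt++ {
		if err = publish(); err == nil {
			return nil
		}
		if !errors.Is(err, ErrTransient) || attempt >= maxRetries {
			return err
		}
		time.Sleep(backoffDelay(attempt, base, max))
	}
}
```

When the retries are exhausted the caller would log the error and keep the record in the in-memory ring, in line with the graceful-degradation bullet. Because each publish carries Nats-Msg-Id=<week_id>, a retry that reaches the server twice within the duplicate window is dropped by JetStream deduplication.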

Startup Backfill

Hydrate the in-memory weekly ring by reading recent records from the weekly stream at process start:

  • Scope: only the weekly stream STATS_WEEKLY is read (not snapshots).
  • Bounds: backfill up to min(WEEKLY_HISTORY_WEEKS, NATS_BACKFILL_WEEKS) most recent weeks.
  • Consumer: use a pull consumer (ephemeral) with batched pulls and a short timeout per batch.
  • Order: read in reverse chronological order when supported; otherwise read forward then keep only the most recent N.
  • Deduplication: keep the last record per week_id; ignore duplicates or stale re-publishes.
  • Resilience: if backfill fails or times out, log and continue with whatever was loaded (or empty) — no hard startup failure.
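
Once the records have been pulled from the stream, the dedup-and-trim step might look like the sketch below. WeeklyRecord and Hydrate are hypothetical names. It assumes the records arrive in stream order, so a later entry for the same week_id replaces an earlier one, and it relies on the zero-padded week_id format sorting chronologically as a plain string.

```go
package main

import "sort"

// WeeklyRecord is a hypothetical, trimmed-down weekly rollup as read from the
// stream.
type WeeklyRecord struct {
	WeekID         string // e.g. "2025-W38"
	ActiveProjects int
}

// Hydrate deduplicates records by week id (the last one in stream order wins)
// and returns at most min(historyWeeks, backfillWeeks) of the most recent
// weeks, ordered oldest first and ready to load into the ring.
func Hydrate(records []WeeklyRecord, historyWeeks, backfillWeeks int) []WeeklyRecord {
	limit := historyWeeks
	if backfillWeeks < limit {
		limit = backfillWeeks
	}
	if limit <= 0 {
		return nil
	}

	latest := make(map[string]WeeklyRecord, len(records))
	for _, r := range records {
		latest[r.WeekID] = r // later entries overwrite earlier ones
	}

	out := make([]WeeklyRecord, 0, len(latest))
	for _, r := range latest {
		out = append(out, r)
	}
	// "YYYY-Www" with zero-padded fields sorts chronologically as a string.
	sort.Slice(out, func(i, j int) bool { return out[i].WeekID < out[j].WeekID })

	if len(out) > limit {
		out = out[len(out)-limit:]
	}
	return out
}
```

If the backfill times out partway through, the same function can be applied to whatever records were read, so a partial load still yields a consistent ring.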

Preferred (Production): Snapshot Bus + Aggregator

Publish snapshots from the main service and delegate aggregation to a separate consumer:

  • Snapshot subject: stats.snap.go-test-project.<env> (service-first).
  • Snapshot body: the /stats payload plus metadata (server_time, captured_at, service {name,version}, optional instance_id/env).
  • Snapshots stream STATS_SNAPSHOTS:
    • Subjects: stats.snap.>; Storage: file; Replicas: 1 (dev)/3 (prod);
    • Max age: 30–60 days (enough to reprocess); Duplicate window: 7 days.
  • Aggregator (separate service):
    • Uses a durable consumer to read snapshots (deliver all, flow control, retries).
    • Computes daily picks (first-after-midnight UTC) and weekly rollups (ISO boundary).
    • Publishes weekly records to STATS_WEEKLY with Nats-Msg-Id=<week_id>.
    • Optionally writes daily picks to a stats.daily.* stream for audit/backfill.

Configuration

  • Sampler:

    • SNAPSHOT_INTERVAL_MINUTES=15
    • SNAPSHOT_DAILY_STRATEGY=first-after-midnight
    • WEEKLY_HISTORY_WEEKS=26
  • JetStream:

    • NATS_URLS (comma-separated)
    • NATS_CREDS or NATS_NKEY_SEED (+ optional NATS_JWT)
    • NATS_STREAM=STATS_WEEKLY
    • NATS_SUBJECT_WEEKLY=stats.weekly.go-test-project.${ENV}
    • NATS_AUTOPROVISION=true
    • NATS_ALLOW_STREAM_MUTATION=false
    • NATS_PUBLISH_TIMEOUT=2s
    • NATS_MAX_RETRIES=5
    • NATS_BACKFILL_ON_START=true — enable startup backfill
    • NATS_BACKFILL_WEEKS=26 — max weeks to read at startup
    • NATS_BACKFILL_TIMEOUT=3s — per-batch pull timeout
  • Snapshots (when enabled):

    • NATS_SNAP_STREAM=STATS_SNAPSHOTS
    • NATS_SUBJECT_SNAPSHOT=stats.snap.go-test-project.${ENV}
    • SNAPSHOT_PUBLISH=true
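
For illustration, a subset of these settings could be loaded as sketched below. The Config type, its field names, and LoadConfig are hypothetical; the defaults are the values listed above. The lookup function is injected so the loader can be exercised without touching the process environment, and treating an empty NATS_URLS as "JetStream not configured" is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// Config holds a subset of the settings listed above.
type Config struct {
	SnapshotIntervalMinutes int
	WeeklyHistoryWeeks      int
	NATSURLs                []string
	Stream                  string
	Autoprovision           bool
	PublishTimeout          time.Duration
	MaxRetries              int
	BackfillOnStart         bool
	BackfillWeeks           int
	BackfillTimeout         time.Duration
}

// JetStreamEnabled reports whether any NATS URL was configured.
func (c Config) JetStreamEnabled() bool { return len(c.NATSURLs) > 0 }

// loader reads variables through getenv and remembers the first parse error.
type loader struct {
	getenv func(string) string
	err    error
}

func (l *loader) str(key, def string) string {
	if v := l.getenv(key); v != "" {
		return v
	}
	return def
}

func (l *loader) fail(key string, err error) {
	if l.err == nil {
		l.err = fmt.Errorf("%s: %w", key, err)
	}
}

func (l *loader) integer(key string, def int) int {
	n, err := strconv.Atoi(l.str(key, strconv.Itoa(def)))
	if err != nil {
		l.fail(key, err)
		return def
	}
	return n
}

func (l *loader) boolean(key string, def bool) bool {
	b, err := strconv.ParseBool(l.str(key, strconv.FormatBool(def)))
	if err != nil {
		l.fail(key, err)
		return def
	}
	return b
}

func (l *loader) duration(key string, def time.Duration) time.Duration {
	d, err := time.ParseDuration(l.str(key, def.String()))
	if err != nil {
		l.fail(key, err)
		return def
	}
	return d
}

// LoadConfig builds a Config from getenv, applying the documented defaults.
func LoadConfig(getenv func(string) string) (Config, error) {
	l := &loader{getenv: getenv}
	c := Config{
		SnapshotIntervalMinutes: l.integer("SNAPSHOT_INTERVAL_MINUTES", 15),
		WeeklyHistoryWeeks:      l.integer("WEEKLY_HISTORY_WEEKS", 26),
		Stream:                  l.str("NATS_STREAM", "STATS_WEEKLY"),
		Autoprovision:           l.boolean("NATS_AUTOPROVISION", true),
		PublishTimeout:          l.duration("NATS_PUBLISH_TIMEOUT", 2*time.Second),
		MaxRetries:              l.integer("NATS_MAX_RETRIES", 5),
		BackfillOnStart:         l.boolean("NATS_BACKFILL_ON_START", true),
		BackfillWeeks:           l.integer("NATS_BACKFILL_WEEKS", 26),
		BackfillTimeout:         l.duration("NATS_BACKFILL_TIMEOUT", 3*time.Second),
	}
	for _, u := range strings.Split(getenv("NATS_URLS"), ",") {
		if u = strings.TrimSpace(u); u != "" {
			c.NATSURLs = append(c.NATSURLs, u)
		}
	}
	return c, l.err
}

// LoadConfigFromEnv reads the process environment.
func LoadConfigFromEnv() (Config, error) { return LoadConfig(os.Getenv) }
```

Returning the first parse error lets startup fail fast on a malformed value instead of silently falling back to a default.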

Rationale

  • Preserves the in-memory contract and keeps the service simple.
  • Weekly history is immediately useful and small; in-memory retention suits common use.
  • JetStream export provides durable history and integration without embedding a database.

Alternatives Considered

  • Pure external collector: keeps the service simplest, but moves all aggregation logic into a separate component and complicates deployment.
  • In-service file/database persistence: violates the current spec, requires a broader architecture change.
  • Daily-only publishes: fewer events but weaker week-level integrity and deduplication guarantees across retries.

Consequences

  • Slightly increased complexity in the service (sampler, rollup logic, NATS client).
  • Adds a dependency on JetStream only when configured; otherwise the export path is a no-op.
  • Requires clear ops ownership for stream lifecycle in production (even with auto-provision).

Open Questions

  • Exact metric set for weekly rollups (current proposal: totals + averages + optional deltas).
  • Whether to include progress percentiles in weekly aggregates.
  • Whether to include a small notes field for operator comments in weekly records.