Comprehensive Architecture Documentation

Table of Contents

  1. Overview
  2. Core Principles
  3. System Architecture
  4. Package Structure
  5. Data Flow
  6. Key Subsystems
  7. Extension Points
  8. Operational Considerations

Overview

DocBuilder is a Go CLI tool and daemon that aggregates documentation from multiple Git repositories into a unified Hugo static site. It implements a staged pipeline architecture with event sourcing, typed configuration/state management, and comprehensive observability.

Key Characteristics

  • Event-Driven: Build lifecycle modeled as events in an event store
  • Type-Safe: Strongly typed configuration and state (no map[string]any in primary paths)
  • Observable: Unified error system, structured logging, metrics, and tracing
  • Incremental: Change detection and partial rebuilds for performance
  • Multi-Tenant: Supports forge namespacing and per-repository configuration
  • Theme-Aware: Hugo theme integration via modules (Relearn only)

Core Principles

1. Clean Architecture

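The codebase follows clean architecture principles. The diagram shows the layers, with representative packages under each, in the order calls flow through them; the dependency rules below govern which way imports may point: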

presentation → application → domain → infrastructure
     ↓              ↓           ↓            ↓
   cmd/        services/    forge/       git/
   cli/        pipeline/    config/      storage/
   server/                  state/       workspace/

Dependency Rules:

  • Inner layers never depend on outer layers
  • Domain logic has no infrastructure dependencies
  • Infrastructure adapters implement domain interfaces
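A minimal sketch of these rules in Go, using hypothetical names: the domain declares a RepositoryFetcher interface and a use case that depends only on it, and an infrastructure adapter (fakeGitClient, standing in for a real Git client) satisfies it. In the real codebase the layers live in separate packages; they are collapsed into one file here so the example runs.

```go
package main

import (
	"context"
	"fmt"
)

// ---- domain layer (hypothetical names) ----

// RepositoryFetcher is declared by the domain and says nothing about Git.
type RepositoryFetcher interface {
	Fetch(ctx context.Context, url string) (headRef string, err error)
}

// SyncRepository is a domain use case that depends only on the interface.
func SyncRepository(ctx context.Context, f RepositoryFetcher, url string) (string, error) {
	ref, err := f.Fetch(ctx, url)
	if err != nil {
		return "", fmt.Errorf("sync %s: %w", url, err)
	}
	return ref, nil
}

// ---- infrastructure layer (hypothetical adapter) ----

// fakeGitClient stands in for an adapter in internal/git. It satisfies the
// domain-declared interface, so in a multi-package layout the infrastructure
// package depends on the domain package, never the reverse.
type fakeGitClient struct {
	refs map[string]string
}

func (c *fakeGitClient) Fetch(_ context.Context, url string) (string, error) {
	ref, ok := c.refs[url]
	if !ok {
		return "", fmt.Errorf("repository %s not found", url)
	}
	return ref, nil
}

// ---- composition root: the presentation layer wires the pieces together ----

func main() {
	client := &fakeGitClient{refs: map[string]string{"https://example.com/docs.git": "abc123"}}
	ref, err := SyncRepository(context.Background(), client, "https://example.com/docs.git")
	fmt.Println(ref, err)
}
```

Because SyncRepository sees only the interface, domain code can be tested with a fake and compiled without any Git library, which is what the second rule requires.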

2. Event Sourcing

Build lifecycle is captured as events in an immutable event store:

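type Event struct {
    ID        string          // Unique event identifier
    Timestamp time.Time       // When the event occurred
    Type      EventType       // e.g. BuildStarted, RepositoryCloned
    Data      json.RawMessage // Event-specific JSON payload
}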

Event Types:

  • BuildStarted, BuildCompleted, BuildFailed
  • RepositoryCloned, RepositoryUpdated
  • DocumentationDiscovered
  • HugoSiteGenerated

3. Typed State Management

State is decomposed into focused sub-states:

type BuildState struct {
    Git      *GitState      // Repository management
    Docs     *DocsState     // Documentation discovery
    Pipeline *PipelineState // Execution metadata
}

Each sub-state has:

  • Clear ownership boundaries
  • Validation methods
  • JSON serialization
  • Test builders
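A sketch of a sub-state with those four properties. The GitState and RepoState fields, the validation rule, and the newTestGitState builder are illustrative assumptions, not the actual contents of internal/state.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RepoState records what is known about one repository (hypothetical fields).
type RepoState struct {
	URL     string `json:"url"`
	HeadRef string `json:"head_ref"`
}

// GitState owns repository state only; docs and pipeline data live elsewhere.
type GitState struct {
	Repositories map[string]RepoState `json:"repositories"`
}

// Validate enforces this sub-state's own invariants (assumed rule: URL required).
func (s *GitState) Validate() error {
	for name, r := range s.Repositories {
		if r.URL == "" {
			return fmt.Errorf("repository %q: missing URL", name)
		}
	}
	return nil
}

// newTestGitState is a hypothetical test builder: valid defaults plus overrides.
func newTestGitState(overrides ...func(*GitState)) *GitState {
	s := &GitState{Repositories: map[string]RepoState{
		"docs": {URL: "https://example.com/docs.git", HeadRef: "abc123"},
	}}
	for _, apply := range overrides {
		apply(s)
	}
	return s
}

// roundTrip serializes a GitState to JSON and reads it back.
func roundTrip(s *GitState) (*GitState, error) {
	data, err := json.Marshal(s)
	if err != nil {
		return nil, err
	}
	var out GitState
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return &out, nil
}

func main() {
	s := newTestGitState()
	data, err := json.Marshal(s)
	fmt.Println(string(data), err, s.Validate())
}
```

Keeping validation on the sub-state means a GitState can be checked and persisted without touching DocsState or PipelineState.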

4. Unified Error Handling

All errors use internal/foundation/errors.ClassifiedError:

type ClassifiedError struct {
    category ErrorCategory  // Type-safe category enum
    severity ErrorSeverity  // Fatal, Error, Warning, Info
    retry    RetryStrategy  // Never, Immediate, Backoff, RateLimit, User
    message  string
    cause    error
    context  ErrorContext   // map[string]any
}

Error Categories:

  • CategoryConfig, CategoryValidation (non-retryable, user-facing)
  • CategoryNetwork, CategoryGit (retryable with backoff)
  • CategoryFileSystem (transient, retried immediately)
  • CategoryAuth, CategoryNotFound (non-retryable)

Key Features:

  • Type-safe categories and severity levels
  • Built-in retry semantics
  • HTTP/CLI adapters for boundary translation
  • Fluent builder API for error construction
  • Structured context via WithContext(key, value)
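The following sketch shows how a classified error can carry retry semantics and structured context while still working with the standard errors.As and errors.Unwrap. It is a simplified assumption, not the package's actual API: the constructor newClassified, the category-to-retry table (derived from the category list above), and the string-valued categories are illustrative, and severity is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

type ErrorCategory string

type RetryStrategy int

const (
	CategoryConfig     ErrorCategory = "config"
	CategoryNetwork    ErrorCategory = "network"
	CategoryFileSystem ErrorCategory = "filesystem"
	CategoryAuth       ErrorCategory = "auth"
)

const (
	RetryNever RetryStrategy = iota
	RetryImmediate
	RetryBackoff
)

// defaultRetry is an assumed mapping following the category list above.
// Unknown categories get the zero value, RetryNever.
var defaultRetry = map[ErrorCategory]RetryStrategy{
	CategoryConfig:     RetryNever,
	CategoryNetwork:    RetryBackoff,
	CategoryFileSystem: RetryImmediate,
	CategoryAuth:       RetryNever,
}

type ClassifiedError struct {
	category ErrorCategory
	retry    RetryStrategy
	message  string
	cause    error
	context  map[string]any
}

// newClassified is a hypothetical constructor; retry derives from category.
func newClassified(cat ErrorCategory, msg string, cause error) *ClassifiedError {
	return &ClassifiedError{category: cat, retry: defaultRetry[cat], message: msg,
		cause: cause, context: map[string]any{}}
}

// WithContext adds one key/value pair and returns the receiver for chaining.
func (e *ClassifiedError) WithContext(key string, value any) *ClassifiedError {
	e.context[key] = value
	return e
}

func (e *ClassifiedError) Error() string {
	if e.cause == nil {
		return fmt.Sprintf("%s: %s", e.category, e.message)
	}
	return fmt.Sprintf("%s: %s: %v", e.category, e.message, e.cause)
}

// Unwrap lets errors.Is and errors.As see the underlying cause.
func (e *ClassifiedError) Unwrap() error { return e.cause }

// IsRetryable reports whether the first ClassifiedError in err's chain
// permits a retry; unclassified errors are treated as non-retryable.
func IsRetryable(err error) bool {
	var ce *ClassifiedError
	return errors.As(err, &ce) && ce.retry != RetryNever
}

func main() {
	cause := errors.New("connection reset by peer")
	err := fmt.Errorf("clone stage: %w",
		newClassified(CategoryNetwork, "fetch origin", cause).WithContext("repository", "docs"))
	fmt.Println(err)
	fmt.Println("retryable:", IsRetryable(err))
}
```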

System Architecture

High-Level Components

┌───────────────────────────────────────────────────────────┐
│                     Presentation Layer                    │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐  ┌──────────┐  │
│  │   CLI    │  │  Daemon  │  │  Server   │  │  Tests   │  │
│  └─────┬────┘  └─────┬────┘  └─────┬─────┘  └─────┬────┘  │
└────────┼─────────────┼─────────────┼──────────────┼───────┘
         │             │             │              │
         └─────────────┴─────────────┴──────────────┘
                       │
         ┌─────────────▼─────────────┐
         │    Service Layer          │
         │  ┌─────────────────────┐  │
         │  │  BuildService       │  │
         │  │  PreviewService     │  │
         │  │  DiscoveryService   │  │
         │  └──────────┬──────────┘  │
         └─────────────┼─────────────┘
                       │
         ┌─────────────▼─────────────┐
         │   Pipeline Layer          │   Stages:
         │  ┌─────────────────────┐  │   1. PrepareOutput
         │  │  StageExecutor      │  │   2. CloneRepos
         │  │  PipelineRunner     │  │   3. DiscoverDocs
         │  │  ChangeDetector     │  │   4. GenerateConfig
         │  └──────────┬──────────┘  │   5. Layouts
         └─────────────┼─────────────┘   6. CopyContent
                       │                 7. Indexes
         ┌─────────────▼─────────────┐   8. RunHugo
         │   Domain Layer            │
         │  ┌─────────────────────┐  │
         │  │  Config             │  │
         │  │  State              │  │
         │  │  DocFile            │  │
         │  │  Repository         │  │
         │  └──────────┬──────────┘  │
         └─────────────┼─────────────┘
                       │
         ┌─────────────▼─────────────┐
         │  Infrastructure Layer     │
         │  ┌─────────────────────┐  │
         │  │  Git Client         │  │
         │  │  Forge Clients      │  │
         │  │  Storage            │  │
         │  │  Workspace Manager  │  │
         │  │  Event Store        │  │
         │  └─────────────────────┘  │
         └───────────────────────────┘

Pipeline Stages

The build process executes 8 sequential stages:

1. PrepareOutput    → Initialize directories
2. CloneRepos       → Git operations
3. DiscoverDocs     → Find markdown files
4. GenerateConfig   → Create hugo.yaml
5. Layouts          → Copy theme templates
6. CopyContent      → Process markdown with transforms
7. Indexes          → Generate index pages
8. RunHugo          → Render static site (optional)

Each stage:

  • Implements StageExecutor interface
  • Records duration and outcome
  • Emits events to event store
  • Returns typed errors
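A sketch of a runner that executes stages in order, records each stage's duration and outcome, and emits events. The Name/Execute method pair matches the custom-stage example under Extension Points; BuildState, StageResult, runStages, funcStage, and the StageStarted/StageCompleted/StageFailed event names are hypothetical simplifications.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// BuildState is a stand-in for state.BuildState.
type BuildState struct {
	Log []string
}

// StageExecutor mirrors the Name/Execute pair used by the custom-stage example.
type StageExecutor interface {
	Name() string
	Execute(ctx context.Context, st *BuildState) error
}

// StageResult records the duration and outcome of one stage.
type StageResult struct {
	Stage    string
	Duration time.Duration
	Err      error
}

// runStages executes stages in order, emits (hypothetical) events before and
// after each one, and stops at the first failing stage.
func runStages(ctx context.Context, st *BuildState, stages []StageExecutor, emit func(string)) []StageResult {
	var results []StageResult
	for _, s := range stages {
		emit("StageStarted:" + s.Name())
		start := time.Now()
		err := s.Execute(ctx, st)
		results = append(results, StageResult{Stage: s.Name(), Duration: time.Since(start), Err: err})
		if err != nil {
			emit("StageFailed:" + s.Name())
			return results
		}
		emit("StageCompleted:" + s.Name())
	}
	return results
}

// funcStage adapts a plain function into a StageExecutor.
type funcStage struct {
	name string
	fn   func(*BuildState) error
}

func (f funcStage) Name() string { return f.name }

func (f funcStage) Execute(_ context.Context, st *BuildState) error { return f.fn(st) }

// errCloneFailed is a sample failure used in the demo.
var errCloneFailed = errors.New("clone failed: authentication error")

func main() {
	st := &BuildState{}
	stages := []StageExecutor{
		funcStage{"PrepareOutput", func(s *BuildState) error { s.Log = append(s.Log, "prepared"); return nil }},
		funcStage{"CloneRepos", func(*BuildState) error { return errCloneFailed }},
		funcStage{"DiscoverDocs", func(*BuildState) error { return nil }},
	}
	emit := func(e string) { fmt.Println("event:", e) }
	for _, r := range runStages(context.Background(), st, stages, emit) {
		fmt.Printf("%s took %v, err=%v\n", r.Stage, r.Duration, r.Err)
	}
}
```

Returning the per-stage results, rather than only the first error, gives the build report its duration and outcome data even when a stage fails.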

Package Structure

Foundation Packages

internal/foundation/

Core types used across all layers:

  • errors/ - Unified error system (ClassifiedError)
  • validation/ - Validation result types
  • logging/ - Structured logging setup

internal/config/

Configuration management:

config/
├── v2.go              # YAML loading with env expansion
├── validation.go      # Top-level validation orchestration
├── typed/             # Domain-specific config structs
│   ├── hugo_config.go
│   ├── daemon_config.go
│   └── forge_config.go
└── normalize.go       # Configuration normalization

Key Types:

  • Config - Root configuration
  • RepositoryConfig - Per-repo settings
  • HugoConfig - Hugo site configuration
  • BuildConfig - Build behavior settings

internal/state/

State management:

state/
├── build_state.go     # Root build state
├── git_state.go       # Repository state
├── docs_state.go      # Documentation discovery state
├── pipeline_state.go  # Execution metadata
└── store/             # State persistence
    ├── json_*.go      # JSON-based stores
    └── helpers.go     # Common store utilities

Core Domain Packages

internal/forge/

Git hosting platform abstraction:

forge/
├── base_forge.go      # Common HTTP operations
├── github.go          # GitHub implementation
├── gitlab.go          # GitLab implementation
├── forgejo.go         # Forgejo/Gitea implementation
└── capabilities.go    # Feature detection

Key Abstractions:

  • Forge interface - Platform operations
  • BaseForge - Shared HTTP client
  • Capabilities - Feature flags (webhooks, tokens, etc.)

internal/git/

Git operations:

git/
├── git.go             # Client implementation
├── auth.go            # Authentication strategies
├── workspace.go       # Workspace management
└── head.go            # HEAD reference reading

Auth Methods:

  • SSH keys
  • Personal access tokens
  • Basic username/password

internal/docs/

Documentation discovery:

docs/
├── discovery.go       # File discovery logic
├── doc_file.go        # DocFile model
└── filters.go         # Ignore patterns

Discovery Rules:

  • Walk configured paths
  • Keep only .md and .markdown files
  • Ignore README.md, CONTRIBUTING.md, etc.
  • Respect .docignore files
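A sketch of these rules using fs.WalkDir over an fs.FS, with an in-memory fstest.MapFS so it runs without a checkout. The exact ignore list and the .docignore semantics are assumptions: here each non-empty, non-comment line is treated as a path.Match glob relative to the docs root, and a matching directory is pruned.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"strings"
	"testing/fstest"
)

// ignoredNames are repository meta files that are not site content (assumed list).
var ignoredNames = map[string]bool{"README.md": true, "CONTRIBUTING.md": true, "CHANGELOG.md": true}

// isMarkdown keeps only .md and .markdown files, case-insensitively.
func isMarkdown(name string) bool {
	ext := strings.ToLower(path.Ext(name))
	return ext == ".md" || ext == ".markdown"
}

// discover walks root and returns markdown paths, skipping ignoredNames and
// anything matching a glob listed in root/.docignore (assumed semantics).
func discover(fsys fs.FS, root string) ([]string, error) {
	var patterns []string
	if data, err := fs.ReadFile(fsys, path.Join(root, ".docignore")); err == nil {
		for _, line := range strings.Split(string(data), "\n") {
			if line = strings.TrimSpace(line); line != "" && !strings.HasPrefix(line, "#") {
				patterns = append(patterns, line)
			}
		}
	}
	var found []string
	err := fs.WalkDir(fsys, root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if p == root {
			return nil
		}
		rel := strings.TrimPrefix(p, root+"/") // patterns are relative to root
		for _, pat := range patterns {
			if ok, _ := path.Match(pat, rel); ok {
				if d.IsDir() {
					return fs.SkipDir // prune the whole ignored directory
				}
				return nil
			}
		}
		if !d.IsDir() && isMarkdown(d.Name()) && !ignoredNames[d.Name()] {
			found = append(found, p)
		}
		return nil
	})
	return found, err
}

// demoFS is a hypothetical repository layout used for illustration.
func demoFS() fstest.MapFS {
	return fstest.MapFS{
		"docs/index.md":             {Data: []byte("# Home")},
		"docs/guide/setup.markdown": {Data: []byte("# Setup")},
		"docs/README.md":            {Data: []byte("repo meta")},
		"docs/drafts/wip.md":        {Data: []byte("draft")},
		"docs/diagram.png":          {Data: []byte{}},
		"docs/.docignore":           {Data: []byte("drafts\n")},
	}
}

func main() {
	files, err := discover(demoFS(), "docs")
	fmt.Println(files, err)
}
```

Taking an fs.FS rather than a directory path keeps discovery testable against in-memory trees; os.DirFS provides the same interface for a real clone.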

internal/hugo/

Hugo site generation:

hugo/
├── generator.go       # Main generator
├── config.go          # hugo.yaml generation
├── content_copy.go    # Content processing
├── index.go           # Index page generation
├── runner.go          # Hugo binary execution
├── models/            # Data models
│   ├── frontmatter.go
│   ├── editlink.go
│   └── config.go
└── pipeline/          # Fixed transform pipeline (ADR-003)
    ├── transforms.go  # All 11 transforms
    └── generators.go  # Document generators

Content Pipeline:

Parse Front Matter
    ↓
Build Front Matter (metadata injection)
    ↓
Edit Link Injection
    ↓
Merge Front Matter
    ↓
Apply Transforms
    ↓
Serialize Front Matter

Infrastructure Packages

internal/workspace/

Temporary directory management:

type Manager struct {
    basePath string
}

func (m *Manager) Create() (string, error)
func (m *Manager) Cleanup() error

Lifecycle:

  • Creates timestamped temp directories
  • Tracks creation for cleanup
  • Safe concurrent operations
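A minimal sketch of that lifecycle. Create uses os.MkdirTemp to make a uniquely named directory whose prefix carries a UTC timestamp, and records it; Cleanup removes everything recorded. The mu and created fields and the ws- prefix are assumptions, since the struct shown above only has basePath. An empty basePath falls back to the system temp directory, as os.MkdirTemp does.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"sync"
	"time"
)

type Manager struct {
	basePath string
	mu       sync.Mutex // assumed field: guards created for concurrent use
	created  []string   // assumed field: directories to remove in Cleanup
}

// Create makes a new timestamped, uniquely suffixed directory and tracks it.
func (m *Manager) Create() (string, error) {
	prefix := "ws-" + time.Now().UTC().Format("20060102T150405") + "-"
	dir, err := os.MkdirTemp(m.basePath, prefix) // random suffix keeps names unique
	if err != nil {
		return "", fmt.Errorf("create workspace: %w", err)
	}
	m.mu.Lock()
	m.created = append(m.created, dir)
	m.mu.Unlock()
	return dir, nil
}

// Cleanup removes every tracked directory and reports all failures together.
func (m *Manager) Cleanup() error {
	m.mu.Lock()
	dirs := m.created
	m.created = nil
	m.mu.Unlock()
	var errs []error
	for _, d := range dirs {
		if err := os.RemoveAll(d); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

func main() {
	m := &Manager{basePath: os.TempDir()}
	dir, err := m.Create()
	fmt.Println("created:", dir, err)
	fmt.Println("cleanup error:", m.Cleanup())
}
```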

internal/storage/ (Removed)

Note: This package was removed to reduce complexity in the CLI build. The daemon’s skip evaluation uses internal/state instead.

Historical Purpose: Content-addressed storage for CLI incremental builds with hash-based object paths, Put/Get/Delete/List operations, and garbage collection.

internal/eventstore/

Event persistence:

eventstore/
├── store.go           # Event store interface
├── memory.go          # In-memory implementation
└── file.go            # File-based implementation

Operations:

  • Append events (immutable)
  • Query by type, time range, correlation ID
  • Projections for state reconstruction
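An in-memory sketch of these operations, reusing the Event shape from the Event Sourcing section. The CorrelationID field, Filter, MemoryStore, and the countByType projection are assumptions for illustration; the real eventstore interface may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
	"time"
)

type EventType string

// Event matches the shape shown earlier, plus a hypothetical CorrelationID.
type Event struct {
	ID            string
	Timestamp     time.Time
	Type          EventType
	CorrelationID string
	Data          json.RawMessage
}

// Filter selects events; zero-valued fields match everything.
type Filter struct {
	Type          EventType
	From, To      time.Time
	CorrelationID string
}

func (f Filter) matches(e Event) bool {
	return (f.Type == "" || e.Type == f.Type) &&
		(f.From.IsZero() || !e.Timestamp.Before(f.From)) &&
		(f.To.IsZero() || !e.Timestamp.After(f.To)) &&
		(f.CorrelationID == "" || e.CorrelationID == f.CorrelationID)
}

// MemoryStore is append-only: it offers no update or delete operation.
type MemoryStore struct {
	mu     sync.RWMutex
	events []Event
}

func (s *MemoryStore) Append(e Event) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.events = append(s.events, e)
}

// Query returns matching events in append order.
func (s *MemoryStore) Query(f Filter) []Event {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var out []Event
	for _, e := range s.events {
		if f.matches(e) {
			out = append(out, e)
		}
	}
	return out
}

// countByType is a minimal projection: it folds the stream into derived state.
func countByType(events []Event) map[EventType]int {
	counts := make(map[EventType]int)
	for _, e := range events {
		counts[e.Type]++
	}
	return counts
}

func main() {
	s := &MemoryStore{}
	s.Append(Event{ID: "1", Timestamp: time.Now(), Type: "BuildStarted", CorrelationID: "build-42"})
	s.Append(Event{ID: "2", Timestamp: time.Now(), Type: "RepositoryCloned", CorrelationID: "build-42",
		Data: json.RawMessage(`{"repository":"docs"}`)})
	fmt.Println(countByType(s.Query(Filter{CorrelationID: "build-42"})))
}
```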

Application Packages

internal/services/

Service interfaces for lifecycle management:

services/
└── interfaces.go          # ManagedService, StateManager interfaces

Defines contracts for service lifecycle (Start, Stop, Health) and state persistence (Load, Save).

internal/build/

Sequential pipeline for documentation generation:

build/
├── service.go             # BuildService interface  
├── default_service.go     # Default pipeline executor
├── stages.go              # Stage definitions
└── report.go              # Build reporting

Executes pipeline stages (PrepareOutput → CloneRepos → DiscoverDocs → GenerateConfig → Layouts → CopyContent → Indexes → RunHugo).

internal/incremental/ (Removed)

Note: This package was removed to reduce complexity in the CLI build. It overlapped with the daemon’s skip evaluation system, which uses internal/state with rule-based validation.

Historical Purpose: Change detection for CLI incremental builds. Compared repository HEAD refs, hashed discovered documentation files, and cached signatures to skip unchanged repositories.

Presentation Packages

cmd/docbuilder/commands

Command-line interface:

cmd/docbuilder/
├── main.go            # CLI entry point
└── commands/
    ├── build.go       # Build command
    ├── init.go        # Init command  
    ├── discover.go    # Discovery command
    ├── daemon.go      # Daemon command
    ├── preview.go     # Preview command
    ├── generate.go    # Generate command
    ├── visualize.go   # Visualize command
    └── common.go      # Shared helpers

Uses Kong for argument parsing. Error handling via internal/foundation/errors CLI adapter.

internal/server/

HTTP server:

server/
├── server.go          # Server setup
├── handlers/          # Request handlers
│   ├── webhook.go
│   ├── build.go
│   ├── status.go
│   └── metrics.go
├── middleware/        # Middleware
│   ├── logging.go
│   ├── auth.go
│   └── recovery.go
└── responses/         # Response types

Testing Packages

internal/testing/

Test utilities:

testing/
├── config_builder.go      # Fluent config builders
├── file_assertions.go     # File/directory assertions
├── cli_runner.go          # CLI integration testing
└── fixtures.go            # Test data

internal/testforge/

Forge test doubles:

testforge/
├── mock_forge.go          # Mock forge implementation
└── README.md

Data Flow

Build Flow

User invokes CLI/API
    ↓
BuildService.Build(config)
    ↓
PipelineRunner.Run(stages)
    ↓
┌─────────────────────────────────┐
│ Stage 1: PrepareOutput          │
│  - Create/clean output dirs     │
│  - Initialize staging           │
└────────────┬────────────────────┘
             ↓
┌─────────────────────────────────┐
│ Stage 2: CloneRepos             │
│  - Authenticate with forges     │
│  - Clone/update repositories    │
│  - Detect changes               │
└────────────┬────────────────────┘
             ↓
┌─────────────────────────────────┐
│ Stage 3: DiscoverDocs           │
│  - Walk documentation paths     │
│  - Filter markdown files        │
│  - Build DocFile list           │
│  - Compute doc set hash         │
└────────────┬────────────────────┘
             ↓
┌─────────────────────────────────┐
│ Stage 4: GenerateConfig         │
│  - Load theme capabilities      │
│  - Apply theme params           │
│  - Merge user config            │
│  - Write hugo.yaml              │
└────────────┬────────────────────┘
             ↓
┌─────────────────────────────────┐
│ Stage 5: Layouts                │
│  - Copy custom layouts          │
│  - Set up index templates       │
└────────────┬────────────────────┘
             ↓
┌─────────────────────────────────┐
│ Stage 6: CopyContent            │
│  - Parse front matter           │
│  - Inject metadata              │
│  - Add edit links               │
│  - Apply transforms             │
│  - Write to content/            │
└────────────┬────────────────────┘
             ↓
┌─────────────────────────────────┐
│ Stage 7: Indexes                │
│  - Generate _index.md files     │
│  - Repository indexes           │
│  - Section indexes              │
└────────────┬────────────────────┘
             ↓
┌─────────────────────────────────┐
│ Stage 8: RunHugo (optional)     │
│  - Execute hugo build           │
│  - Generate public/             │
└────────────┬────────────────────┘
             ↓
BuildReport generated
    ↓
Events persisted
    ↓
Metrics updated
    ↓
Return to caller

Configuration Loading Flow

1. Load YAML file
    ↓
2. Expand ${ENV_VAR} references
    ↓
3. Parse into Config struct
    ↓
4. Apply defaults
    ↓
5. Normalize (fill implicit values)
    ↓
6. Validate (orchestration)
    ├→ ValidateHugoConfig()
    ├→ ValidateDaemonConfig()
    ├→ ValidateForgeConfig()
    └→ ValidateRepositories()
    ↓
7. Return validated Config
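Steps 2, 4, and 6 can be sketched with the standard library alone (YAML parsing in step 3 needs a third-party package and is left out). os.Expand replaces both ${VAR} and bare $VAR forms; the lookup function is injected so tests need not touch the real environment. The trimmed Config, the "site" default, requireBaseURL, and the choice to report unset variables rather than silently substitute empty strings are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// expandEnv substitutes ${VAR} (and bare $VAR) references via lookup and
// returns the names of unset variables so they can be reported (step 2).
func expandEnv(raw string, lookup func(string) (string, bool)) (string, []string) {
	var missing []string
	out := os.Expand(raw, func(name string) string {
		v, ok := lookup(name)
		if !ok {
			missing = append(missing, name)
		}
		return v
	})
	return out, missing
}

// Config is a hypothetical, heavily trimmed root configuration.
type Config struct {
	Title     string
	BaseURL   string
	OutputDir string
}

// applyDefaults fills fields the user left empty (step 4); "site" is assumed.
func applyDefaults(c *Config) {
	if c.OutputDir == "" {
		c.OutputDir = "site"
	}
}

// validate runs every validator and joins all failures (step 6), so users
// see every problem at once instead of fixing them one by one.
func validate(c *Config, validators ...func(*Config) error) error {
	var errs []error
	for _, v := range validators {
		if err := v(c); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

// requireBaseURL is one hypothetical validator.
func requireBaseURL(c *Config) error {
	if c.BaseURL == "" {
		return errors.New("baseURL is required")
	}
	return nil
}

func main() {
	expanded, missing := expandEnv("token: ${GITHUB_TOKEN}", os.LookupEnv)
	fmt.Printf("%q missing=%v\n", expanded, missing)
	cfg := &Config{Title: "Docs"}
	applyDefaults(cfg)
	fmt.Println(cfg.OutputDir, validate(cfg, requireBaseURL))
}
```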

State Persistence Flow

In-Memory State (BuildState)
    ↓
Sub-State Updates
    ├→ GitState.Update()
    ├→ DocsState.AddDocFile()
    └→ PipelineState.RecordStage()
    ↓
State Store Operations
    ├→ DaemonInfoStore.Save()
    ├→ StatisticsStore.Save()
    └→ JSONStore.Save()
    ↓
Filesystem Persistence
    └→ .docbuilder/state.json

Event Flow

Pipeline Stage Execution
    ↓
Emit Event(s)
    ├→ BuildStarted
    ├→ RepositoryCloned
    ├→ DocumentationDiscovered
    └→ BuildCompleted
    ↓
Event Store Append
    ↓
Event Handlers (async)
    ├→ Metrics Update
    ├→ Webhook Notification
    └→ Log Aggregation

Key Subsystems

1. Relearn Theme Configuration

DocBuilder is now hardcoded to use the Relearn theme exclusively. Theme configuration is applied directly in the Hugo generator without a plugin system or theme registry.

Configuration Location: internal/hugo/config_writer.go

func (g *Generator) applyRelearnThemeDefaults(params map[string]any) {
    // Set Relearn-specific defaults
    // - themeVariant
    // - disableSearch
    // - disableLandingPageButton
    // etc.
}

Hugo Config Generation:

  1. Core defaults (title, baseURL, markup)
  2. Relearn-specific defaults (via applyRelearnThemeDefaults)
  3. User param deep-merge (user values override)
  4. Dynamic fields (build_date)
  5. Hardcoded module import: github.com/McShelby/hugo-theme-relearn
  6. Language configuration for i18n
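Step 3 can be sketched as a recursive merge: nested maps merge key by key, and any other user value (including a slice) replaces the default. This illustrates the stated rule rather than reproducing the generator's code; deepMerge and the sample parameter keys in main are assumptions.

```go
package main

import "fmt"

// deepMerge returns a new map with user values overlaid on defaults.
// Nested map[string]any values merge recursively; every other type,
// including slices, is replaced outright. Neither input is modified.
func deepMerge(defaults, user map[string]any) map[string]any {
	out := make(map[string]any, len(defaults)+len(user))
	for k, v := range defaults {
		out[k] = v
	}
	for k, uv := range user {
		if um, ok := uv.(map[string]any); ok {
			if dm, ok := out[k].(map[string]any); ok {
				out[k] = deepMerge(dm, um)
				continue
			}
		}
		out[k] = uv
	}
	return out
}

func main() {
	// Illustrative keys only; not the generator's actual defaults.
	defaults := map[string]any{
		"themeVariant":  "auto",
		"disableSearch": false,
		"editURL":       map[string]any{"enable": true, "base": ""},
	}
	user := map[string]any{
		"disableSearch": true,
		"editURL":       map[string]any{"base": "https://example.com/edit/"},
	}
	fmt.Println(deepMerge(defaults, user))
}
```

A shallow merge would drop editURL.enable here; merging nested maps key by key lets a user override one field without restating the rest.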

No Theme Abstraction:

  • Previous multi-theme system removed
  • No theme registry or plugin system
  • Relearn configuration is hardcoded
  • Simplifies codebase and maintenance

2. Forge Integration

Forges implement the Forge interface:

type Forge interface {
    Name() string
    Type() string
    GetRepository(owner, repo string) (*Repository, error)
    GetFileContent(owner, repo, path, ref string) ([]byte, error)
    ListRepositories(org string) ([]*Repository, error)
    Capabilities() Capabilities
}

HTTP Consolidation: All forge clients use BaseForge for common operations:

type BaseForge struct {
    client        *http.Client
    baseURL       string
    authHeader    string
    customHeaders map[string]string
}

Auth Methods:

  • GitHub: Bearer token + custom headers
  • GitLab: Bearer token
  • Forgejo: Token prefix + dual event headers
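One way the shared client can absorb per-forge differences is to build every request through a single helper that sets the Authorization header and any custom headers. NewBaseForge, newRequest, get, and the example tokens and URLs are hypothetical; the Bearer and token schemes follow the auth methods listed above, and application/vnd.github+json is the media type GitHub documents for its REST API.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

type BaseForge struct {
	client        *http.Client
	baseURL       string
	authHeader    string
	customHeaders map[string]string
}

// NewBaseForge is a hypothetical constructor: scheme is "Bearer" for GitHub
// and GitLab or "token" for Forgejo, per the auth methods listed above.
func NewBaseForge(baseURL, scheme, token string, custom map[string]string) *BaseForge {
	return &BaseForge{
		client:        http.DefaultClient,
		baseURL:       strings.TrimRight(baseURL, "/"),
		authHeader:    scheme + " " + token,
		customHeaders: custom,
	}
}

// newRequest builds an API request with auth and custom headers applied,
// so individual forge implementations never set headers themselves.
func (b *BaseForge) newRequest(method, apiPath string, body io.Reader) (*http.Request, error) {
	req, err := http.NewRequest(method, b.baseURL+"/"+strings.TrimLeft(apiPath, "/"), body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", b.authHeader)
	for k, v := range b.customHeaders {
		req.Header.Set(k, v)
	}
	return req, nil
}

// get sends a GET through the shared client (not called in main, to avoid network access).
func (b *BaseForge) get(apiPath string) (*http.Response, error) {
	req, err := b.newRequest(http.MethodGet, apiPath, nil)
	if err != nil {
		return nil, err
	}
	return b.client.Do(req)
}

func main() {
	github := NewBaseForge("https://api.github.com", "Bearer", "example-token",
		map[string]string{"Accept": "application/vnd.github+json"})
	forgejo := NewBaseForge("https://forgejo.example.com/api/v1/", "token", "example-token", nil)
	for _, f := range []*BaseForge{github, forgejo} {
		req, err := f.newRequest(http.MethodGet, "/user/repos", nil)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(req.URL, req.Header.Get("Authorization"), req.Header.Get("Accept"))
	}
}
```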

3. Change Detection

Change detection uses multiple strategies:

type ChangeDetector interface {
    DetectChanges(repos []*RepositoryConfig) (*ChangeSet, error)
}

Detection Levels:

  1. Repository HEAD - Git ref comparison
  2. Quick Hash - Fast directory tree hashing
  3. Doc Files Hash - SHA-256 of sorted Hugo paths
  4. Deletion Detection - Optional file removal tracking

Skip Conditions:

  • Unchanged HEAD ref
  • Identical doc set hash
  • No deletions detected (if enabled)
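Level 3 can be sketched as follows: the Hugo paths are sorted so discovery order does not affect the result, then fed to SHA-256 with a NUL separator so that different path lists cannot produce the same byte stream. docSetHash and the unchanged skip rule are assumptions; the rule covers only the first two skip conditions, since deletion tracking is optional.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// docSetHash returns a stable, hex-encoded SHA-256 over the sorted Hugo paths.
func docSetHash(hugoPaths []string) string {
	sorted := append([]string(nil), hugoPaths...) // copy: don't reorder the caller's slice
	sort.Strings(sorted)
	h := sha256.New()
	for _, p := range sorted {
		h.Write([]byte(p))
		h.Write([]byte{0}) // NUL separator: {"ab","c"} and {"a","bc"} hash differently
	}
	return hex.EncodeToString(h.Sum(nil))
}

// unchanged is a hypothetical skip rule: same HEAD ref and same doc set hash.
func unchanged(prevHead, head, prevHash, hash string) bool {
	return prevHead != "" && prevHead == head && prevHash == hash
}

func main() {
	a := docSetHash([]string{"content/repo/b.md", "content/repo/a.md"})
	b := docSetHash([]string{"content/repo/a.md", "content/repo/b.md"})
	fmt.Println(a == b, unchanged("abc123", "abc123", a, b))
}
```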

4. Content Transform Pipeline

Content processing uses a pipeline pattern:

type Transformer interface {
    Transform(ctx context.Context, doc *DocFile) error
}

Built-in Transformers:

  • FrontMatterParser - Extract YAML headers
  • FrontMatterBuilder - Add metadata
  • EditLinkInjector - Generate edit URLs
  • FrontMatterMerger - Combine metadata
  • FrontMatterSerializer - Write YAML

Custom Transformers: Users can add custom transforms in config:

hugo:
  content_transforms:
    - type: replace
      pattern: "{{OLD}}"
      replacement: "{{NEW}}"
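A sketch of how transformers compose into a chain, including one for the replace entry in the YAML above. DocFile is reduced to two fields, replaceTransform treats pattern as a literal string rather than a regular expression, and applyAll stops at the first error; all three are assumptions rather than documented behavior.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// DocFile is a stand-in for docs.DocFile with only the fields used here.
type DocFile struct {
	Path    string
	Content string
}

type Transformer interface {
	Transform(ctx context.Context, doc *DocFile) error
}

// replaceTransform implements a `type: replace` entry (literal match assumed).
type replaceTransform struct {
	pattern, replacement string
}

func (r replaceTransform) Transform(_ context.Context, doc *DocFile) error {
	doc.Content = strings.ReplaceAll(doc.Content, r.pattern, r.replacement)
	return nil
}

// applyAll runs transformers in order; each sees the previous one's output.
func applyAll(ctx context.Context, doc *DocFile, ts ...Transformer) error {
	for i, t := range ts {
		if err := t.Transform(ctx, doc); err != nil {
			return fmt.Errorf("transform %d on %s: %w", i, doc.Path, err)
		}
	}
	return nil
}

func main() {
	doc := &DocFile{Path: "guide.md", Content: "Version {{OLD}}"}
	err := applyAll(context.Background(), doc, replaceTransform{"{{OLD}}", "{{NEW}}"})
	fmt.Println(doc.Content, err)
}
```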

5. Observability Stack

Logging

Structured logging with slog:

logger.Info("Documentation discovered",
    "repository", repoName,
    "files", fileCount,
)

Metrics

Prometheus-compatible metrics:

buildDuration.Observe(duration.Seconds())
buildsTotal.WithLabelValues("success").Inc()
reposProcessed.WithLabelValues(repoName).Inc()

Tracing

Context-based distributed tracing:

ctx, span := tracer.Start(ctx, "clone-repository")
defer span.End()

Error Handling

All errors use unified type:

return foundation.NewDocBuilderError(
    foundation.ErrCodeGitClone,
    "failed to clone repository",
    err,
    foundation.WithContext(map[string]any{
        "repository": repoURL,
        "branch": branch,
    }),
    foundation.WithRetryable(true),
)

Extension Points

1. Custom Stages

Add new pipeline stages:

type CustomStage struct {
    config *config.Config
}

func (s *CustomStage) Execute(ctx context.Context, state *state.BuildState) error {
    // Custom logic
    return nil
}

func (s *CustomStage) Name() string {
    return "custom"
}

Register in pipeline configuration.

2. Custom Transformers

Implement transformer interface:

type CustomTransformer struct {}

func (t *CustomTransformer) Transform(ctx context.Context, doc *docs.DocFile) error {
    // Modify doc.Content or doc.FrontMatter
    return nil
}

Register in content copy stage.

3. Custom Stores

Implement state store interface:

type CustomStore struct {}

func (s *CustomStore) Save(ctx context.Context, data any) error {
    // Custom persistence
    return nil
}

func (s *CustomStore) Load(ctx context.Context) (any, error) {
    // Custom retrieval
    return nil, nil
}

Register in state management.

4. Event Handlers

Subscribe to build events:

type CustomHandler struct {}

func (h *CustomHandler) Handle(ctx context.Context, event *eventstore.Event) error {
    // React to events
    return nil
}

Register with event store.


Operational Considerations

Performance

Incremental Builds:

  • Enable with build.incremental: true
  • Typically 10-100x faster for unchanged repos
  • Requires persistent workspace

Pruning:

  • Enable with pruning.enabled: true
  • Removes non-doc directories
  • Reduces workspace size by 50-90%

Shallow Clones:

  • Enable with git.shallow: true
  • Depth 1 clones
  • Faster for large repositories

Scalability

Multi-Tenancy:

  • Per-tenant configuration
  • Isolated workspaces
  • Resource quotas

Horizontal Scaling:

  • Stateless build workers
  • Shared event store
  • Load balancer distribution

Resource Limits:

  • Memory: ~100MB base + 10MB per repo
  • CPU: 1-2 cores for typical builds
  • Disk: 500MB workspace + output size

Reliability

Error Recovery:

  • Retryable errors auto-retry (3x default)
  • Partial build state preserved
  • Atomic staging promotion
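A sketch of retry with exponential backoff. It reads "3x" as three retries after the first attempt; the base delay, the doubling factor, the retryable predicate, and the function names are assumptions. The sleep function is injected so tests (and the demo below) can record delays instead of waiting; production code would pass time.Sleep.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient is a sample retryable failure used in the demo.
var errTransient = errors.New("transient network error")

// withRetry calls op until it succeeds, returns a non-retryable error, or has
// been retried maxRetries times (assumed reading of "3x"). The delay doubles
// after each retry.
func withRetry(maxRetries int, base time.Duration, retryable func(error) bool,
	sleep func(time.Duration), op func() error) error {
	delay := base
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if attempt > 0 {
			sleep(delay)
			delay *= 2
		}
		if err = op(); err == nil || !retryable(err) {
			return err
		}
	}
	return fmt.Errorf("giving up after %d retries: %w", maxRetries, err)
}

// recordingSleep returns a sleep function that records delays instead of blocking.
func recordingSleep() (func(time.Duration), *[]time.Duration) {
	var slept []time.Duration
	return func(d time.Duration) { slept = append(slept, d) }, &slept
}

func main() {
	calls := 0
	op := func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	isRetryable := func(err error) bool { return errors.Is(err, errTransient) }
	sleep, slept := recordingSleep()
	err := withRetry(3, 100*time.Millisecond, isRetryable, sleep, op)
	fmt.Println("attempts:", calls, "delays:", *slept, "err:", err)
}
```

Errors the predicate rejects (for example configuration or auth failures) return after a single attempt, so only transient failures pay the backoff cost.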

Health Checks:

  • /health endpoint
  • Workspace availability
  • Git connectivity
  • Hugo binary presence

Monitoring:

  • Build success/failure rates
  • Stage durations
  • Repository update lag
  • Disk usage

Security

Authentication:

  • Token-based API auth
  • SSH key management
  • Credential encryption at rest

Authorization:

  • Repository access control
  • API endpoint permissions
  • Webhook signature verification
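Webhook signature verification is usually an HMAC-SHA256 of the raw request body, compared in constant time. This sketch assumes GitHub's X-Hub-Signature-256 header format (sha256= followed by the hex digest); other forges use different header names and encodings, and sign and verifySignature are hypothetical names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sign computes a GitHub-style signature header value for body (assumed format).
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return "sha256=" + hex.EncodeToString(mac.Sum(nil))
}

// verifySignature recomputes the HMAC over the raw body and compares it in
// constant time with hmac.Equal to avoid leaking timing information.
func verifySignature(secret, body []byte, header string) bool {
	const prefix = "sha256="
	if !strings.HasPrefix(header, prefix) {
		return false
	}
	got, err := hex.DecodeString(strings.TrimPrefix(header, prefix))
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	secret := []byte("webhook-secret")
	body := []byte(`{"ref":"refs/heads/main"}`)
	header := sign(secret, body)
	fmt.Println(verifySignature(secret, body, header))
	fmt.Println(verifySignature(secret, []byte(`{"ref":"tampered"}`), header))
}
```

The HMAC must be computed over the exact bytes received, before any JSON decoding, because re-serialized JSON rarely matches the original byte for byte.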

Secrets Management:

  • Environment variable expansion
  • .env file support
  • External secret stores (planned)

Maintenance

Configuration Updates:

  • Restart daemon to apply new configuration
  • Validate before restart

State Management:

  • State stored in .docbuilder/
  • JSON format for portability
  • Manual cleanup supported

Dependency Management:

  • Go modules for dependencies
  • Hugo binary external
  • Theme modules auto-fetched

Architecture Decision Records

See docs/adr/ for detailed architectural decisions:


Migration Status

The architecture has undergone significant evolution. See ARCHITECTURE_MIGRATION_PLAN.md for:

  • 19 completed phases (A-M, O-P, R-S-T-U)
  • 2 deferred phases (N, Q)
  • ~1,290 lines eliminated
  • Zero breaking changes

Current State: Architecture migration complete. Codebase follows:

  • Event-driven patterns
  • Typed configuration/state
  • Unified observability
  • Single execution pipeline
  • Clean domain boundaries

References