Comprehensive Architecture Documentation
Table of Contents
- Overview
- Core Principles
- System Architecture
- Package Structure
- Data Flow
- Key Subsystems
- Extension Points
- Operational Considerations
Overview
DocBuilder is a Go CLI tool and daemon that aggregates documentation from multiple Git repositories into a unified Hugo static site. It implements a staged pipeline architecture with event sourcing, typed configuration/state management, and comprehensive observability.
Key Characteristics
- Event-Driven: Build lifecycle modeled as events in an event store
- Type-Safe: Strongly typed configuration and state (no map[string]any in primary paths)
- Observable: Unified error system, structured logging, metrics, and tracing
- Incremental: Change detection and partial rebuilds for performance
- Multi-Tenant: Supports forge namespacing and per-repository configuration
- Theme-Aware: Hugo theme integration via modules (Relearn only)
Core Principles
1. Clean Architecture
The codebase follows clean architecture principles with clear dependency direction:
Dependency Rules:
- Inner layers never depend on outer layers
- Domain logic has no infrastructure dependencies
- Infrastructure adapters implement domain interfaces
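The dependency rules above can be sketched in a few lines of Go. This is a minimal illustration, not code from the DocBuilder repository: DocSource, CountDocs, and memorySource are hypothetical names chosen for the example. The domain function depends only on an interface it declares, and the adapter satisfies that interface from the outside.

```go
package main

import "fmt"

// DocSource is a hypothetical domain interface: the inner layer declares
// what it needs without knowing how it is provided.
type DocSource interface {
	ListDocs(repo string) ([]string, error)
}

// CountDocs is domain logic. It depends only on the DocSource interface,
// never on a concrete Git or HTTP implementation.
func CountDocs(src DocSource, repos []string) (int, error) {
	total := 0
	for _, r := range repos {
		docs, err := src.ListDocs(r)
		if err != nil {
			return 0, fmt.Errorf("list docs for %s: %w", r, err)
		}
		total += len(docs)
	}
	return total, nil
}

// memorySource is an illustrative infrastructure adapter that satisfies
// the domain interface. A real adapter might wrap a Git client instead.
type memorySource struct {
	docs map[string][]string
}

func (m memorySource) ListDocs(repo string) ([]string, error) {
	docs, ok := m.docs[repo]
	if !ok {
		return nil, fmt.Errorf("unknown repository %q", repo)
	}
	return docs, nil
}

func main() {
	src := memorySource{docs: map[string][]string{
		"api":  {"index.md", "auth.md"},
		"docs": {"intro.md"},
	}}
	n, err := CountDocs(src, []string{"api", "docs"})
	fmt.Println(n, err)
}
```

Because CountDocs never names memorySource, the adapter can be swapped for a test double without touching the domain code.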
2. Event Sourcing
Build lifecycle is captured as events in an immutable event store:
Event Types:
- BuildStarted, BuildCompleted, BuildFailed
- RepositoryCloned, RepositoryUpdated
- DocumentationDiscovered
- HugoSiteGenerated
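As a rough illustration of the idea, the sketch below keeps an append-only log in memory and derives a build's status by replaying it. The three event type names come from the list above; the Event fields, the Store type, and the Status method are assumptions made for this example and may not match internal/eventstore.

```go
package main

import (
	"fmt"
	"time"
)

// EventType values mirror three of the event types listed above.
type EventType string

const (
	BuildStarted   EventType = "BuildStarted"
	BuildCompleted EventType = "BuildCompleted"
	BuildFailed    EventType = "BuildFailed"
)

// Event is a hypothetical immutable record of something that happened.
type Event struct {
	Type      EventType
	BuildID   string
	Timestamp time.Time
}

// Store is an append-only in-memory log; events are never modified or removed.
type Store struct {
	events []Event
}

// Append adds an event to the end of the log.
func (s *Store) Append(e Event) {
	s.events = append(s.events, e)
}

// Status replays the log to derive the latest state of a build.
func (s *Store) Status(buildID string) string {
	status := "unknown"
	for _, e := range s.events {
		if e.BuildID != buildID {
			continue
		}
		switch e.Type {
		case BuildStarted:
			status = "running"
		case BuildCompleted:
			status = "succeeded"
		case BuildFailed:
			status = "failed"
		}
	}
	return status
}

func main() {
	s := &Store{}
	s.Append(Event{Type: BuildStarted, BuildID: "b1", Timestamp: time.Now()})
	s.Append(Event{Type: BuildCompleted, BuildID: "b1", Timestamp: time.Now()})
	fmt.Println(s.Status("b1"))
}
```

State is never stored directly here; it is always a function of the recorded events, which is what makes the log usable for audit and reconstruction.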
3. Typed State Management
State is decomposed into focused sub-states:
Each sub-state has:
- Clear ownership boundaries
- Validation methods
- JSON serialization
- Test builders
4. Unified Error Handling
All errors use internal/foundation/errors.ClassifiedError:
Error Categories:
- CategoryConfig, CategoryValidation (non-retryable, user-facing)
- CategoryNetwork, CategoryGit (retryable with backoff)
- CategoryFileSystem (transient, retry immediate)
- CategoryAuth, CategoryNotFound (non-retryable)
Key Features:
- Type-safe categories and severity levels
- Built-in retry semantics
- HTTP/CLI adapters for boundary translation
- Fluent builder API for error construction
- Structured context via WithContext(key, value)
System Architecture
High-Level Components
Pipeline Stages
The build process executes 8 sequential stages:
Each stage:
- Implements StageExecutor interface
- Records duration and outcome
- Emits events to event store
- Returns typed errors
Package Structure
Foundation Packages
internal/foundation/
Core types used across all layers:
- errors/ - Unified error system (DocBuilderError)
- validation/ - Validation result types
- logging/ - Structured logging setup
internal/config/
Configuration management:
Key Types:
- Config - Root configuration
- RepositoryConfig - Per-repo settings
- HugoConfig - Hugo site configuration
- BuildConfig - Build behavior settings
internal/state/
State management:
Core Domain Packages
internal/forge/
Git hosting platform abstraction:
Key Abstractions:
- Forge interface - Platform operations
- BaseForge - Shared HTTP client
- Capabilities - Feature flags (webhooks, tokens, etc.)
internal/git/
Git operations:
Auth Methods:
- SSH keys
- Personal access tokens
- Basic username/password
internal/docs/
Documentation discovery:
Discovery Rules:
- Walk configured paths
- Filter .md and .markdown files
- Ignore README.md, CONTRIBUTING.md, etc.
- Respect .docignore files
internal/hugo/
Hugo site generation:
Content Pipeline:
Infrastructure Packages
internal/workspace/
Temporary directory management:
Lifecycle:
- Creates timestamped temp directories
- Tracks creation for cleanup
- Safe concurrent operations
internal/storage/ (Removed)
Note: This package was removed to simplify CLI build complexity. The daemon’s skip evaluation uses internal/state instead.
Historical Purpose: Content-addressed storage for CLI incremental builds with hash-based object paths, Put/Get/Delete/List operations, and garbage collection.
internal/eventstore/
Event persistence:
Operations:
- Append events (immutable)
- Query by type, time range, correlation ID
- Projections for state reconstruction
Application Packages
internal/services/
Service interfaces for lifecycle management:
Defines contracts for service lifecycle (Start, Stop, Health) and state persistence (Load, Save).
internal/build/
Sequential pipeline for documentation generation:
Executes pipeline stages (PrepareOutput → CloneRepos → DiscoverDocs → GenerateConfig → Layouts → CopyContent → Indexes → RunHugo).
internal/incremental/ (Removed)
Note: This package was removed to simplify CLI build complexity. It overlapped with the daemon’s skip evaluation system (which uses internal/state with rule-based validation).
Historical Purpose: Change detection for CLI incremental builds. Compared repository HEAD refs, hashed discovered documentation files, and cached signatures to skip unchanged repositories.
Presentation Packages
cmd/docbuilder/commands
Command-line interface:
Uses Kong for argument parsing.
Error handling via internal/foundation/errors CLI adapter.
internal/server/
HTTP server:
Testing Packages
internal/testing/
Test utilities:
internal/testforge/
Forge test doubles:
Data Flow
Build Flow
Configuration Loading Flow
State Persistence Flow
Event Flow
Key Subsystems
1. Relearn Theme Configuration
DocBuilder is now hardcoded to use the Relearn theme exclusively. Theme configuration is applied directly in the Hugo generator without a plugin system or theme registry.
Configuration Location: internal/hugo/config_writer.go
Hugo Config Generation:
- Core defaults (title, baseURL, markup)
- Relearn-specific defaults (via applyRelearnThemeDefaults)
- User param deep-merge (user values override)
- Dynamic fields (build_date)
- Hardcoded module import: github.com/McShelby/hugo-theme-relearn
- Language configuration for i18n
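The deep-merge step is the least obvious part of this sequence, so here is a minimal sketch of how user params can override defaults key by key. DeepMerge is a hypothetical helper, not the function used in internal/hugo/config_writer.go, and the parameter names in main are illustrative rather than actual Relearn defaults.

```go
package main

import "fmt"

// DeepMerge returns a new map containing defaults overlaid with user values.
// Nested maps are merged recursively; any other user value replaces the
// default. Nested default maps the user does not touch are shared, not copied.
func DeepMerge(defaults, user map[string]any) map[string]any {
	out := make(map[string]any, len(defaults)+len(user))
	for k, v := range defaults {
		out[k] = v
	}
	for k, uv := range user {
		dm, dOK := out[k].(map[string]any)
		um, uOK := uv.(map[string]any)
		if dOK && uOK {
			out[k] = DeepMerge(dm, um)
			continue
		}
		out[k] = uv
	}
	return out
}

func main() {
	// Illustrative parameter names; not the actual Relearn defaults.
	defaults := map[string]any{
		"themeVariant": "auto",
		"search":       map[string]any{"enabled": true, "limit": 10},
	}
	user := map[string]any{
		"search": map[string]any{"limit": 25},
	}
	fmt.Println(DeepMerge(defaults, user))
}
```

With a shallow merge the user's search map would replace the default one and drop the enabled key; the recursive version keeps it and overrides only limit.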
No Theme Abstraction:
- Previous multi-theme system removed
- No theme registry or plugin system
- Relearn configuration is hardcoded
- Simplifies codebase and maintenance
2. Forge Integration
Forges implement the Forge interface:
HTTP Consolidation:
All forge clients use BaseForge for common operations:
Auth Methods:
- GitHub: Bearer token + custom headers
- GitLab: Bearer token
- Forgejo: Token prefix + dual event headers
3. Change Detection
The incremental system uses multiple strategies:
Detection Levels:
- Repository HEAD - Git ref comparison
- Quick Hash - Fast directory tree hashing
- Doc Files Hash - SHA-256 of sorted Hugo paths
- Deletion Detection - Optional file removal tracking
Skip Conditions:
- Unchanged HEAD ref
- Identical doc set hash
- No deletions detected (if enabled)
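The doc files hash and the first two skip conditions can be sketched as below. DocSetHash, RepoSignature, and CanSkip are hypothetical names, and the NUL separator between paths is a choice made for this example; the actual hashing scheme may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// DocSetHash returns a SHA-256 digest over the sorted list of paths, so the
// result does not depend on discovery order.
func DocSetHash(paths []string) string {
	sorted := append([]string(nil), paths...) // copy; leave the caller's slice untouched
	sort.Strings(sorted)
	h := sha256.New()
	for _, p := range sorted {
		h.Write([]byte(p))
		h.Write([]byte{0}) // separator so {"a","bc"} differs from {"ab","c"}
	}
	return hex.EncodeToString(h.Sum(nil))
}

// RepoSignature is a hypothetical cached fingerprint of a repository.
type RepoSignature struct {
	Head    string
	DocHash string
}

// CanSkip reports whether a repository is unchanged since the cached signature:
// same HEAD ref and same doc set hash.
func CanSkip(cached, current RepoSignature) bool {
	return cached.Head == current.Head && cached.DocHash == current.DocHash
}

func main() {
	a := DocSetHash([]string{"api/auth.md", "api/index.md"})
	b := DocSetHash([]string{"api/index.md", "api/auth.md"})
	fmt.Println(a == b)

	cached := RepoSignature{Head: "abc123", DocHash: a}
	current := RepoSignature{Head: "abc123", DocHash: b}
	fmt.Println(CanSkip(cached, current))
}
```

Sorting before hashing is what makes the digest stable across filesystem walk orders.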
4. Content Transform Pipeline
Content processing uses a pipeline pattern:
Built-in Transformers:
- FrontMatterParser - Extract YAML headers
- FrontMatterBuilder - Add metadata
- EditLinkInjector - Generate edit URLs
- FrontMatterMerger - Combine metadata
- FrontMatterSerializer - Write YAML
Custom Transformers: Users can add custom transforms in config:
5. Observability Stack
Logging
Structured logging with slog:
Metrics
Prometheus-compatible metrics:
Tracing
Context-based distributed tracing:
Error Handling
All errors use the unified error type from internal/foundation/errors.
Extension Points
1. Custom Stages
Add new pipeline stages:
Register in pipeline configuration.
2. Custom Transformers
Implement transformer interface:
Register in content copy stage.
3. Custom Stores
Implement state store interface:
Register in state management.
4. Event Handlers
Subscribe to build events:
Register with event store.
Operational Considerations
Performance
Incremental Builds:
- Enable with build.incremental: true
- Typically 10-100x faster for unchanged repos
- Requires persistent workspace
Pruning:
- Enable with pruning.enabled: true
- Removes non-doc directories
- Reduces workspace size by 50-90%
Shallow Clones:
- Enable with git.shallow: true
- Depth 1 clones
- Faster for large repositories
Scalability
Multi-Tenancy:
- Per-tenant configuration
- Isolated workspaces
- Resource quotas
Horizontal Scaling:
- Stateless build workers
- Shared event store
- Load balancer distribution
Resource Limits:
- Memory: ~100MB base + 10MB per repo
- CPU: 1-2 cores for typical builds
- Disk: 500MB workspace + output size
Reliability
Error Recovery:
- Retryable errors auto-retry (3x default)
- Partial build state preserved
- Atomic staging promotion
Health Checks:
- /health endpoint
- Workspace availability
- Git connectivity
- Hugo binary presence
Monitoring:
- Build success/failure rates
- Stage durations
- Repository update lag
- Disk usage
Security
Authentication:
- Token-based API auth
- SSH key management
- Credential encryption at rest
Authorization:
- Repository access control
- API endpoint permissions
- Webhook signature verification
Secrets Management:
- Environment variable expansion
- .env file support
- External secret stores (planned)
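Environment variable expansion can be built on the standard library's os.Expand. The sketch below adds reporting of unset variables, which is an assumption about desirable behavior rather than a description of what DocBuilder does; ExpandStrict and the DOCS_TOKEN and DOCS_USER variable names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
)

// ExpandStrict replaces ${VAR} and $VAR references using lookup and reports
// every variable that was referenced but not set.
func ExpandStrict(s string, lookup func(string) (string, bool)) (string, []string) {
	var missing []string
	out := os.Expand(s, func(name string) string {
		v, ok := lookup(name)
		if !ok {
			missing = append(missing, name)
		}
		return v
	})
	return out, missing
}

func main() {
	os.Setenv("DOCS_TOKEN", "s3cret")
	out, missing := ExpandStrict("token=${DOCS_TOKEN} user=${DOCS_USER}", os.LookupEnv)
	fmt.Println(out, missing)
}
```

Surfacing the missing names makes it possible to fail configuration validation early instead of silently sending an empty credential.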
Maintenance
Configuration Updates:
- Restart daemon to apply new configuration
- Validate before restart
State Management:
- State stored in .docbuilder/
- JSON format for portability
- Manual cleanup supported
Dependency Management:
- Go modules for dependencies
- Hugo binary external
- Theme modules auto-fetched
Architecture Decision Records
See docs/adr/ for detailed architectural decisions:
Migration Status
The architecture has undergone significant evolution. See ARCHITECTURE_MIGRATION_PLAN.md for:
- 19 completed phases (A-M, O-P, R-S-T-U)
- 2 deferred phases (Q, J)
- ~1,290 lines eliminated
- Zero breaking changes
Current State: Architecture migration complete. Codebase follows:
- Event-driven patterns
- Typed configuration/state
- Unified observability
- Single execution pipeline
- Clean domain boundaries