Data Flow Diagrams
This document shows how data flows through DocBuilder during configuration loading, build execution, and state persistence.
Last Updated: January 4, 2026 - Reflects current implementation.
Configuration Loading
sequenceDiagram
participant User
participant CLI
participant Config
participant ENV
participant Validator
participant TypedConfig
User->>CLI: docbuilder build -c config.yaml
CLI->>Config: Load(configPath)
Config->>ENV: Read .env files
ENV-->>Config: Environment variables
Config->>Config: Parse YAML
Config->>Config: Expand ${VAR} references
Config->>Config: Apply defaults
Config->>Validator: ValidateConfig()
Validator->>TypedConfig: HugoConfig.Validate()
TypedConfig-->>Validator: ValidationResult
Validator->>TypedConfig: DaemonConfig.Validate()
TypedConfig-->>Validator: ValidationResult
Validator-->>Config: Validation complete
Config-->>CLI: Validated Config
CLI->>CLI: Start build
Configuration Loading Steps
- File Discovery: Locate config file (explicit path or defaults)
- Environment Loading: Read .env and .env.local files
- YAML Parsing: Parse configuration file structure
- Variable Expansion: Replace ${VAR} with environment values
- Default Application: Fill in missing optional fields
- Validation: Check required fields, types, and constraints
- Type Coercion: Convert to strongly-typed config structs
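The expansion, default, and validation steps above can be sketched with the standard library alone. This is a minimal illustration, not DocBuilder's loader: the `Config` struct, its fields, the `./site` default, and the helper names are all hypothetical, and YAML parsing is left out because it needs a third-party package.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config is a hypothetical, trimmed-down stand-in for the real config struct.
type Config struct {
	Title     string
	OutputDir string
	Token     string
}

// expandVars replaces ${VAR} references using the supplied lookup function.
// os.Expand also expands the brace-less $VAR form.
func expandVars(s string, lookup func(string) string) string {
	return os.Expand(s, lookup)
}

// applyDefaults fills in optional fields that were left empty.
// The "./site" default is an assumption made for this example.
func applyDefaults(c *Config) {
	if c.OutputDir == "" {
		c.OutputDir = "./site"
	}
}

// validate checks required fields after expansion and defaults.
func validate(c *Config) error {
	if c.Title == "" {
		return errors.New("title is required")
	}
	return nil
}

func main() {
	// Values as they might look right after parsing, before expansion.
	cfg := Config{Title: "Docs for ${TEAM}", Token: "${DOCS_TOKEN}"}

	cfg.Title = expandVars(cfg.Title, os.Getenv)
	cfg.Token = expandVars(cfg.Token, os.Getenv)
	applyDefaults(&cfg)

	if err := validate(&cfg); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Passing the lookup function as a parameter keeps expansion testable without touching the process environment.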
Key Files
- internal/config/config.go - Main config loader
- internal/config/typed/hugo_config.go - Hugo configuration validation
- internal/config/typed/daemon_config.go - Daemon configuration validation
Build Execution
sequenceDiagram
participant CLI
participant BuildService
participant Pipeline
participant Git
participant Docs
participant Hugo
participant EventStore
CLI->>BuildService: Build(config)
BuildService->>Pipeline: Run(stages)
Pipeline->>EventStore: Emit BuildStarted
Pipeline->>Git: CloneRepos()
Git->>Git: Authenticate
Git->>Git: Clone/Update
Git-->>Pipeline: Repository ready
Pipeline->>EventStore: Emit RepositoryCloned
Pipeline->>Docs: DiscoverDocs()
Docs->>Docs: Walk paths
Docs->>Docs: Filter markdown
Docs-->>Pipeline: DocFile list
Pipeline->>EventStore: Emit DocumentationDiscovered
Pipeline->>Hugo: GenerateConfig()
Hugo->>Hugo: Apply Relearn params
Hugo->>Hugo: Write hugo.yaml
Hugo-->>Pipeline: Config ready
Pipeline->>Hugo: CopyContent()
Hugo->>Hugo: Transform files
Hugo-->>Pipeline: Content ready
Pipeline->>Hugo: RunHugo()
Hugo->>Hugo: Execute hugo build
Hugo-->>Pipeline: Site generated
Pipeline->>EventStore: Emit BuildCompleted
Pipeline-->>BuildService: BuildReport
BuildService-->>CLI: Success
Build Flow Phases
Phase 1: Initialization
- BuildService receives config and output path
- Creates BuildState with initial values
- Prepares output directory
Phase 2: Repository Operations
- For each repository:
  - Authenticate with credentials
  - Clone or update repository
  - Record HEAD commit reference
  - Emit RepositoryCloned/Updated event
Phase 3: Documentation Discovery
- Walk configured documentation paths
- Filter for markdown files (.md, .markdown)
- Exclude standard files (README, CONTRIBUTING, etc.)
- Build list of DocFile objects
- Compute documentation set hash
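The walk-and-filter part of discovery might look like the sketch below. It is an illustration under assumptions: `isDocFile` and `discoverDocs` are hypothetical names, only README and CONTRIBUTING are excluded (the source's "etc." is not filled in), and matching the excluded names case-insensitively is a choice made here rather than confirmed behavior.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path"
	"sort"
	"strings"
)

// excluded holds upper-cased base names (without extension) to skip.
// Only the two names stated in the text are listed.
var excluded = map[string]bool{"README": true, "CONTRIBUTING": true}

// isDocFile reports whether name is a markdown file that is not excluded.
func isDocFile(name string) bool {
	ext := path.Ext(name)
	lower := strings.ToLower(ext)
	if lower != ".md" && lower != ".markdown" {
		return false
	}
	base := strings.TrimSuffix(path.Base(name), ext)
	return !excluded[strings.ToUpper(base)]
}

// discoverDocs walks root inside fsys and returns matching files sorted by path.
func discoverDocs(fsys fs.FS, root string) ([]string, error) {
	var files []string
	err := fs.WalkDir(fsys, root, func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.IsDir() && isDocFile(p) {
			files = append(files, p)
		}
		return nil
	})
	sort.Strings(files)
	return files, err
}

func main() {
	files, err := discoverDocs(os.DirFS("."), ".")
	if err != nil {
		fmt.Println("discovery failed:", err)
		return
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```

Working against `fs.FS` rather than OS paths means paths always use forward slashes, which is why the `path` package is used instead of `path/filepath`.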
Phase 4: Hugo Configuration
- Load theme defaults (Relearn)
- Merge user-provided parameters
- Add dynamic fields (build_date, version)
- Configure Hugo modules
- Write hugo.yaml
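The merge of theme defaults with user parameters could be sketched as a recursive map merge. `mergeParams` and the parameter keys in `main` are hypothetical, chosen only for illustration. Writing hugo.yaml is omitted because YAML encoding needs a third-party package.

```go
package main

import (
	"fmt"
	"time"
)

// mergeParams returns a new top-level map with overrides laid over defaults.
// When both sides hold a nested map for the same key, the maps are merged
// recursively; otherwise the override value replaces the default.
// Nested default maps that are not overridden are shared, not copied.
func mergeParams(defaults, overrides map[string]any) map[string]any {
	out := make(map[string]any, len(defaults)+len(overrides))
	for k, v := range defaults {
		out[k] = v
	}
	for k, v := range overrides {
		defMap, defIsMap := out[k].(map[string]any)
		ovrMap, ovrIsMap := v.(map[string]any)
		if defIsMap && ovrIsMap {
			out[k] = mergeParams(defMap, ovrMap)
			continue
		}
		out[k] = v
	}
	return out
}

func main() {
	// Hypothetical keys; real theme defaults will differ.
	themeDefaults := map[string]any{
		"variant": "auto",
		"search":  map[string]any{"enabled": true},
	}
	userParams := map[string]any{
		"variant": "dark",
		"search":  map[string]any{"placeholder": "Search docs"},
	}

	params := mergeParams(themeDefaults, userParams)
	// Dynamic field added at build time, as in the step list above.
	params["build_date"] = time.Now().UTC().Format(time.RFC3339)
	fmt.Println(params)
}
```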
Phase 5: Content Processing
- For each DocFile:
  - Run 12-step transform pipeline
  - Write to Hugo content/ directory
- Generate static assets
- Update DocsState
Phase 6: Index Generation
- Create main site index (_index.md)
- Generate repository indexes
- Generate section indexes
Phase 7: Rendering
- Execute hugo command (if render_mode permits)
- Capture output and errors
- Count rendered pages
Phase 8: Finalization
- Atomic staging promotion
- Generate build report
- Emit BuildCompleted event
- Return result to CLI
Key Files
- internal/build/default_service.go - Build orchestration
- internal/hugo/generator.go - Hugo site generation
- internal/hugo/stages.go - Stage execution
State Persistence
sequenceDiagram
participant Pipeline
participant BuildState
participant GitState
participant StateStore
participant FileSystem
Pipeline->>BuildState: Create()
Pipeline->>GitState: Update(repo, head)
GitState->>BuildState: Merge update
Pipeline->>BuildState: RecordStage(name, duration)
Pipeline->>StateStore: Save(state)
StateStore->>StateStore: Serialize to JSON
StateStore->>FileSystem: Write .docbuilder/state.json
FileSystem-->>StateStore: Success
StateStore-->>Pipeline: State persisted
Note over Pipeline,FileSystem: Later: Incremental build
Pipeline->>StateStore: Load()
StateStore->>FileSystem: Read .docbuilder/state.json
FileSystem-->>StateStore: JSON data
StateStore->>StateStore: Deserialize
StateStore-->>Pipeline: Previous BuildState
Pipeline->>Pipeline: Compare HEAD refs
Pipeline->>Pipeline: Decide skip/clone
State Persistence Flow
During Build:
- BuildState created with initial values
- GitState updated after each repository clone
- DocsState updated after discovery
- PipelineState tracks execution metadata
- State serialized to JSON
- Written to .docbuilder/state.json
Incremental Build:
- Load previous state from .docbuilder/state.json
- Compare HEAD references for each repository
- Compute documentation set hash
- Skip unchanged repositories
- Update only changed state
State Components
GitState:
- WorkspaceDir - Temporary workspace path
- Repositories - List of configured repositories
- Commits - Map of repository → HEAD reference
- CommitDates - Map of repository → commit timestamp
DocsState:
- Files - Discovered DocFile list
- IsSingleRepo - Whether single-repository mode is active
- FilesByRepository - Map of repository → files
PipelineState:
- ConfigHash - Configuration fingerprint
- ExecutedStages - List of completed stages
Key Files
- internal/state/git_state.go - Git state management
- internal/state/docs_state.go - Documentation state
- internal/state/pipeline_state.go - Pipeline metadata
Content Transform Flow
Transform Pipeline Data Flow
Each transform:
- Receives Document object
- Reads current state (Path, Content, FrontMatter)
- Modifies one or more fields
- Returns modified Document
- May generate additional documents
Transforms are idempotent: running a transform twice produces the same result as running it once.
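A minimal sketch of the transform contract, assuming a `Document` with the three fields named above. The `Transform` signature and the two example transforms (`ensureTitle`, `trimTrailingSpace`) are hypothetical and are not steps of the real pipeline. Both are written so that a second run leaves the document unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// Document mirrors the fields named in the text; the types are assumptions.
type Document struct {
	Path        string
	Content     string
	FrontMatter map[string]any
}

// Transform returns the modified document plus any documents it generated.
type Transform func(doc Document) (Document, []Document)

// ensureTitle derives a title from the file name only when none is set,
// so a second run finds the title and changes nothing.
func ensureTitle(doc Document) (Document, []Document) {
	if _, ok := doc.FrontMatter["title"]; ok {
		return doc, nil
	}
	fm := make(map[string]any, len(doc.FrontMatter)+1)
	for k, v := range doc.FrontMatter {
		fm[k] = v
	}
	name := doc.Path[strings.LastIndex(doc.Path, "/")+1:]
	fm["title"] = strings.TrimSuffix(name, ".md")
	doc.FrontMatter = fm
	return doc, nil
}

// trimTrailingSpace normalizes the content to end in exactly one newline.
func trimTrailingSpace(doc Document) (Document, []Document) {
	doc.Content = strings.TrimRight(doc.Content, " \t\n") + "\n"
	return doc, nil
}

// runPipeline applies transforms in order and returns the final document
// followed by every generated document.
func runPipeline(doc Document, transforms []Transform) []Document {
	var extra []Document
	for _, t := range transforms {
		var generated []Document
		doc, generated = t(doc)
		extra = append(extra, generated...)
	}
	return append([]Document{doc}, extra...)
}

func main() {
	doc := Document{Path: "guide/setup.md", Content: "# Setup\n\n\n", FrontMatter: map[string]any{}}
	out := runPipeline(doc, []Transform{ensureTitle, trimTrailingSpace})
	fmt.Printf("%q %v\n", out[0].Content, out[0].FrontMatter)
}
```

Copying the front matter map before writing to it keeps the caller's document untouched, which makes the idempotence property easy to check in a test.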
Key Files
- internal/hugo/pipeline/processor.go - Pipeline orchestration
- internal/hugo/pipeline/document.go - Document model
- internal/hugo/pipeline/transform_*.go - Individual transforms
Repository Metadata Flow
Metadata Sources
- Git Repository: Clone operation provides URL, branch
- Git HEAD: Read commit provides SHA and timestamp
- Forge API: (Optional) Additional metadata from GitHub/GitLab API
- Configuration: Repository name, paths, authentication
Key Files
- internal/git/git.go - Git operations
- internal/hugo/generator.go - Metadata collection
- internal/hugo/pipeline/transform_metadata.go - Metadata injection
Event Emission Flow
Event Types
Build Lifecycle:
- BuildStarted - Build begins
- BuildCompleted - Build finishes successfully
- BuildFailed - Build encounters error
Repository Events:
- RepositoryCloned - Fresh clone completed
- RepositoryUpdated - Git pull completed
- RepositorySkipped - No changes detected
Documentation Events:
- DocumentationDiscovered - Files found
- DocumentTransformed - File processed
- IndexGenerated - Index page created
Configuration Events:
- ConfigGenerated - hugo.yaml written
- ConfigValidated - Configuration passed validation
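One way to picture event emission is an append-only store that stages write to as they finish. The sketch below uses the event names listed above, but `EventType`, `Event`, and `MemoryStore` are hypothetical. The real event store's types and persistence are not described in this document.

```go
package main

import (
	"fmt"
	"time"
)

// EventType names one of the events listed above (subset shown).
type EventType string

const (
	BuildStarted            EventType = "BuildStarted"
	RepositoryCloned        EventType = "RepositoryCloned"
	DocumentationDiscovered EventType = "DocumentationDiscovered"
	BuildCompleted          EventType = "BuildCompleted"
	BuildFailed             EventType = "BuildFailed"
)

// Event is a hypothetical record shape: what happened, when, and a short detail.
type Event struct {
	Type   EventType
	At     time.Time
	Detail string
}

// MemoryStore is an in-memory, append-only stand-in for an event store.
type MemoryStore struct {
	events []Event
}

// Emit appends an event stamped with the current time.
func (s *MemoryStore) Emit(t EventType, detail string) {
	s.events = append(s.events, Event{Type: t, At: time.Now(), Detail: detail})
}

// Types returns the emitted event types in emission order.
func (s *MemoryStore) Types() []EventType {
	types := make([]EventType, 0, len(s.events))
	for _, e := range s.events {
		types = append(types, e.Type)
	}
	return types
}

func main() {
	store := &MemoryStore{}
	store.Emit(BuildStarted, "config.yaml")
	store.Emit(RepositoryCloned, "docs-a")
	store.Emit(DocumentationDiscovered, "12 files")
	store.Emit(BuildCompleted, "")
	fmt.Println(store.Types())
}
```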
Key Files
- internal/eventstore/ - Event storage and emission
- internal/hugo/stages.go - Event emission points
Change Detection Flow
Change Detection Levels
Level 1: HEAD Comparison (fastest)
- Fetch remote HEAD reference
- Compare to stored HEAD
- Skip if identical
- Cost: Single network request per repository
Level 2: Quick Hash (fast)
- Hash directory tree structure
- Compare to previous hash
- Detect added/removed directories
- Cost: File system scan
Level 3: Doc Files Hash (medium)
- Discover all documentation files
- Sort by path
- Compute SHA-256 hash
- Compare to previous
- Cost: File content reading
Level 4: Deletion Detection (thorough)
- Compare current files to previous
- Detect removed files
- Require rebuild if deletions found
- Cost: Full file list comparison
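Levels 3 and 4 can be sketched with the standard library. This is an illustration under assumptions: `DocFile`, `docSetHash`, and `deletedFiles` are hypothetical names, and hashing each path together with its content (separated by a zero byte) is one reasonable reading of "sort by path, compute SHA-256", not a confirmed description of the real hash input.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// DocFile is a hypothetical pairing of a path with the file's bytes.
type DocFile struct {
	Path    string
	Content []byte
}

// docSetHash sorts a copy of files by path and hashes path and content,
// so the result does not depend on discovery order.
func docSetHash(files []DocFile) string {
	sorted := append([]DocFile(nil), files...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Path < sorted[j].Path })

	h := sha256.New()
	for _, f := range sorted {
		// Zero bytes separate fields so adjacent values cannot run together.
		h.Write([]byte(f.Path))
		h.Write([]byte{0})
		h.Write(f.Content)
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

// deletedFiles returns paths present in previous but absent from current.
func deletedFiles(previous, current []string) []string {
	seen := make(map[string]bool, len(current))
	for _, p := range current {
		seen[p] = true
	}
	var deleted []string
	for _, p := range previous {
		if !seen[p] {
			deleted = append(deleted, p)
		}
	}
	return deleted
}

func main() {
	files := []DocFile{
		{Path: "guide/setup.md", Content: []byte("# Setup\n")},
		{Path: "api/index.md", Content: []byte("# API\n")},
	}
	fmt.Println("docs hash:", docSetHash(files))

	previous := []string{"api/index.md", "guide/old.md", "guide/setup.md"}
	current := []string{"api/index.md", "guide/setup.md"}
	if removed := deletedFiles(previous, current); len(removed) > 0 {
		fmt.Println("rebuild required, deleted:", removed)
	}
}
```

Ordering the checks from cheapest to most expensive, as the levels above do, lets a build stop at the first level that proves nothing changed.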
Key Files
- internal/hugo/doc_changes.go - Change detection logic
- internal/git/git.go - HEAD reference checking