ADR-001: Golden Testing Strategy for Output Verification
Date: 2025-12-12
Status
Proposed
Context
DocBuilder’s core value is transforming documentation from Git repositories into Hugo-rendered sites. Currently, we lack comprehensive integration tests that verify the entire pipeline: repository retrieval → content discovery → Hugo generation → theme-specific transformations → final rendered output.
Current State
- Unit tests: Cover individual components (git, docs, hugo packages)
- CLI integration tests: Test command execution but not output correctness
- Manual verification: Developers run builds and visually inspect Hugo sites
- Regression risk: Refactoring can break transformations without detection
- Feature validation: No systematic way to verify new content transformations
Problems
- No end-to-end validation: Can’t verify the complete pipeline produces correct output
- Refactoring fear: Large-scale changes (like the pipeline refactor) lack a safety net
- Bug reproduction: No standard way to create minimal reproducible test cases
- Feature documentation: Supported transformations not demonstrated in tests
- Configuration validation: Can’t systematically verify Relearn theme configuration
Decision
Implement a golden testing framework that:
- Uses test repositories stored as directory structures in test/testdata/repos/
- Creates temporary Git repos from these structures during test execution
- Runs full build pipeline with test-specific configurations
- Verifies output against stored golden files (Hugo config, content structure, rendered samples)
- Supports an update workflow via the -update-golden flag for intentional changes
Architecture
Golden File Format
Hugo Config (hugo-config.golden.yaml):
Content Structure (content-structure.golden.json):
Rendered Samples (rendered-samples.golden.json - optional):
Test Implementation Pattern
Rationale
Why Golden Testing?
- Deterministic output: Hugo builds are reproducible with fixed input
- Comprehensive verification: Catches subtle regressions in transformations
- Living documentation: Test repos demonstrate supported features
- Regression confidence: Safe to refactor with verified snapshots
- Bug reproduction: Minimal test cases for issue reports
Why Not Full HTML Snapshots?
- Size: Complete Hugo sites are large (MBs per test)
- Brittleness: Relearn theme updates frequently change the generated HTML
- Diffability: Binary archives are hard to review in Git
- Focus: We care about our transformations, not theme rendering
Why Separate Levels?
- Hugo config: Always verify (our direct output)
- Content structure: Lightweight, covers 90% of bugs
- Rendered HTML: Optional, only for transformation features (math, callouts, etc.)
Why JSON for Structure?
- Human-readable diffs in Git
- Easy to update with -update-golden
- Programmatic verification (no string matching)
- Selective verification (check only relevant fields)
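The difference between programmatic and string-based verification can be illustrated with a short sketch. The jsonEqual and fieldEqual helpers and the sample field names are assumptions for illustration only; they show the general technique of comparing decoded values rather than raw text.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// jsonEqual reports whether two JSON documents decode to the same value,
// ignoring whitespace and object key order.
func jsonEqual(a, b []byte) (bool, error) {
	var va, vb any
	if err := json.Unmarshal(a, &va); err != nil {
		return false, fmt.Errorf("decode first document: %w", err)
	}
	if err := json.Unmarshal(b, &vb); err != nil {
		return false, fmt.Errorf("decode second document: %w", err)
	}
	return reflect.DeepEqual(va, vb), nil
}

// fieldEqual compares a single top-level field of two JSON objects, which
// allows a test to ignore fields it does not care about. A field that is
// absent from both documents counts as equal.
func fieldEqual(a, b []byte, field string) (bool, error) {
	var ma, mb map[string]any
	if err := json.Unmarshal(a, &ma); err != nil {
		return false, fmt.Errorf("decode first document: %w", err)
	}
	if err := json.Unmarshal(b, &mb); err != nil {
		return false, fmt.Errorf("decode second document: %w", err)
	}
	return reflect.DeepEqual(ma[field], mb[field]), nil
}

func main() {
	// Hypothetical golden and actual documents: same file list, different
	// formatting, and a volatile timestamp field.
	golden := []byte(`{"files":["a.md","b.md"],"generatedAt":"2025-01-01"}`)
	actual := []byte("{\n  \"generatedAt\": \"2025-06-30\",\n  \"files\": [\"a.md\", \"b.md\"]\n}")

	whole, err := jsonEqual(golden, actual)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	files, err := fieldEqual(golden, actual, "files")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("whole documents equal:", whole)
	fmt.Println("files field equal:", files)
}
```

A plain string comparison would fail on the formatting differences alone, while the decoded comparison fails only when the data differs, and the per-field check lets volatile fields be excluded.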
Consequences
Positive
- Refactoring confidence: Pipeline changes and configuration updates can be made safely
- Bug prevention: Regressions caught before merge
- Feature documentation: Tests show what’s supported
- Developer productivity: Fast feedback on changes
- CI integration: Automated verification on every PR
Negative
- Test maintenance: Golden files need updates when output intentionally changes
- Initial investment: Writing helpers and first test cases takes time
- Storage: Git repo grows with golden files (mitigated by JSON, not archives)
- Test duration: Full builds are slower than unit tests (use -short to skip)
Neutral
- Test location: New test/integration/ package separate from unit tests
- Naming convention: TestGolden_* prefix for golden tests
- Flag dependencies: Requires -update-golden flag support
Status
Implemented - Golden testing framework is in place and actively used.
Notes
- Single Relearn theme simplifies testing (no theme variation needed)
- Focus on transform pipeline and configuration generation
- Integration tests cover full build workflow
Alternatives Considered
Alternative 1: Snapshot Entire Hugo Site
Rejected: Too large, brittle, not Git-friendly
Alternative 2: Compare Only File Counts
Rejected: Too coarse, misses content errors
Alternative 3: String Matching on hugo.yaml
Rejected: Brittle to formatting, whitespace
Alternative 4: No Integration Tests
Rejected: Refactoring risk too high, manual testing unreliable
References
- Testing Go Applications Using Golden Files
- Go Wiki: Table Driven Tests
- DocBuilder: internal/hugo/*_golden_test.go (existing pattern)
- Hugo: Testing Documentation