ADR-003: Fixed Transform Pipeline
Date: 2025-12-16
Status
Implemented - December 16, 2025 (Now Default)
Implementation Summary
The fixed transform pipeline is fully implemented and is now the default and only content processing system in DocBuilder.
Deliverables:
- ✅ New pipeline package (internal/hugo/pipeline/)
- ✅ Document type replacing Page/PageShim
- ✅ Generator and Transform function types
- ✅ 3 generators (main/repo/section indexes)
- ✅ 11 transforms (parse FM, normalize indexes, build FM, extract title, strip heading, rewrite links/images, keywords, metadata, edit link, serialize)
- ✅ Comprehensive unit tests (71 passing in internal/hugo)
- ✅ Old system removed - Transform registry, patch system, and all legacy code deleted
Migration Complete (December 16, 2025):
- Removed internal/hugo/transforms/ directory (24 files, registry-based system)
- Removed internal/hugo/fmcore/ directory (3 files, patch merge system)
- Removed visualize command and Page/PageShim abstractions
- Simplified integration code (content_copy.go: 216→13 lines)
- Net code reduction: -6,233 lines (-88%)
Test Status:
- Pipeline unit tests: PASS (6 test functions, 12 sub-tests)
- Hugo package tests: PASS (71 tests)
- Full short test suite: PASS (all packages)
- golangci-lint: 0 issues
Context
Current Architecture
DocBuilder’s content transformation system uses a registry-based, dependency-ordered pipeline with a front matter patching system:
Components:
- TransformRegistry: Global registry where transforms register themselves via init()
- Transform interface: Requires Name(), Priority(), DependsOn(), and Apply(page *Page) Patch
- Dependency resolution: Topological sort based on DependsOn() declarations
- Patch system: Three merge modes (MergeDeep, MergeReplace, MergeSetIfMissing) with priority-based ordering
- Protected keys: Reserved front matter fields that block MergeDeep patches
Current transforms:
- front_matter_builder_v2 (priority 50): Initializes base front matter
- extract_index_title (priority 55): Extracts H1 from README/index files
- strip_heading: Removes first H1 from content
- relative_link_rewriter: Fixes relative markdown links
- image_link_rewriter: Fixes image paths
- Various metadata injectors (repo info, edit links, etc.)
Example transform:
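A rough sketch of what a transform looked like under the old interface listed above (Name(), Priority(), DependsOn(), Apply(page *Page) Patch). The Page and Patch fields, the registry map, the register helper, and the declared dependency are assumptions made for illustration, not DocBuilder's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// MergeMode mirrors the three merge modes named above.
type MergeMode int

const (
	MergeDeep MergeMode = iota
	MergeReplace
	MergeSetIfMissing
)

// Page and Patch are simplified stand-ins; their fields are assumptions.
type Page struct {
	Path    string
	Content string
}

type Patch struct {
	Mode   MergeMode
	Values map[string]any
}

// Transform is the old interface as listed under Components.
type Transform interface {
	Name() string
	Priority() int
	DependsOn() []string
	Apply(page *Page) Patch
}

// registry and register are hypothetical stand-ins for TransformRegistry.
var registry = map[string]Transform{}

func register(t Transform) { registry[t.Name()] = t }

type extractIndexTitle struct{}

func (extractIndexTitle) Name() string  { return "extract_index_title" }
func (extractIndexTitle) Priority() int { return 55 }

// The dependency named here is an assumption for illustration.
func (extractIndexTitle) DependsOn() []string { return []string{"front_matter_builder_v2"} }

// Apply returns a patch instead of touching the page; the merge happens elsewhere.
func (extractIndexTitle) Apply(page *Page) Patch {
	for _, line := range strings.Split(page.Content, "\n") {
		if strings.HasPrefix(line, "# ") {
			title := strings.TrimSpace(strings.TrimPrefix(line, "# "))
			return Patch{Mode: MergeDeep, Values: map[string]any{"title": title}}
		}
	}
	return Patch{Mode: MergeDeep}
}

// Self-registration via init(), as the registry-based system required.
func init() { register(extractIndexTitle{}) }

func main() {
	page := &Page{Path: "README.md", Content: "# My Project\n\nIntro text."}
	patch := registry["extract_index_title"].Apply(page)
	fmt.Println(patch.Values["title"]) // prints "My Project"
}
```

Note that nothing in this file shows when the transform runs or whether its patch is actually applied; both depend on code elsewhere.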
Problems with Current Architecture
- Hidden complexity: Dependencies and execution order are not obvious from reading the code
- Non-local reasoning: Understanding transform behavior requires checking:
  - Registration order in init()
  - Declared dependencies in DependsOn()
  - Priority values across multiple transforms
  - Protected key system in patching logic
  - Merge mode semantics (MergeDeep vs MergeReplace)
- Debugging difficulty:
  - Recent bug: extract_index_title extracted the correct title but was silently blocked by protected keys
  - Required temporary debug logging to discover the issue
  - Solution was non-obvious: change MergeDeep to MergeReplace
- Indirection overhead:
  - Registry pattern adds abstraction without benefit
  - Topological sort runs on every build
  - Patch merging adds cognitive overhead
- False flexibility:
  - Users cannot configure transforms dynamically
  - Registry/dependency system suggests extensibility we don’t support
  - Added complexity without delivering value
- Maintenance burden:
  - Adding transforms requires understanding registration, priorities, dependencies, and patch semantics
  - Easy to introduce subtle bugs (wrong merge mode, missing dependency, priority conflicts)
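The debugging problem described above can be illustrated with a small sketch of how a protected key might silently discard a MergeDeep patch. Only the merge mode names and the symptom come from this ADR; applyPatch, protectedKeys, and the flat (non-recursive) merge are hypothetical simplifications.

```go
package main

import "fmt"

type MergeMode int

const (
	MergeDeep MergeMode = iota
	MergeReplace
	MergeSetIfMissing
)

type Patch struct {
	Mode   MergeMode
	Values map[string]any
}

// protectedKeys is a hypothetical set of reserved front matter fields.
var protectedKeys = map[string]bool{"title": true}

// applyPatch merges a patch into front matter. Deep merging is reduced to
// a flat key copy here to keep the sketch short.
func applyPatch(fm map[string]any, p Patch) {
	for k, v := range p.Values {
		switch p.Mode {
		case MergeDeep:
			if protectedKeys[k] {
				continue // dropped without an error or a log line
			}
			fm[k] = v
		case MergeReplace:
			fm[k] = v
		case MergeSetIfMissing:
			if _, ok := fm[k]; !ok {
				fm[k] = v
			}
		}
	}
}

func main() {
	fm := map[string]any{"title": "README"}

	// The transform computed the right title, but the patch has no effect.
	applyPatch(fm, Patch{Mode: MergeDeep, Values: map[string]any{"title": "My Project"}})
	fmt.Println(fm["title"]) // prints "README"

	// Switching the merge mode is the non-obvious fix.
	applyPatch(fm, Patch{Mode: MergeReplace, Values: map[string]any{"title": "My Project"}})
	fmt.Println(fm["title"]) // prints "My Project"
}
```

The transform itself is correct in both calls; the difference lives entirely in the merge logic, which is why the failure was hard to spot from the transform's code.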
Key Insight
DocBuilder is greenfield and we control the pipeline. We don’t need dynamic transform registration or user-configurable pipelines. We need a solid, predictable pipeline for our specific use case.
Decision
Replace the registry-based, patch-driven pipeline with a fixed, explicit transform pipeline.
Core Principles
- Fixed execution order: Transforms are called in explicit sequence defined in code
- Direct mutation: Transforms modify Document directly (no patching)
- No dynamic registration: No init() registry, no dependency declarations
- Simple interfaces: Transform = function that modifies a document
- Transparent behavior: Reading the pipeline code shows exact execution order
New Architecture
Core Interfaces:
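A minimal sketch of the core types, using the names this ADR mentions (Document, FileTransform, FileGenerator, GenerationContext, RepositoryMetadata). All field names and the exact function signatures are assumptions; the only constraints taken from the text are that transforms mutate the document directly, return errors, and may produce new documents.

```go
package main

import "fmt"

// Document is one content file flowing through the pipeline.
// Field names are assumptions.
type Document struct {
	Path        string         // output path relative to the content root
	Content     string         // markdown body without front matter
	FrontMatter map[string]any // parsed front matter, mutated in place
	Generated   bool           // true when created by a generator or transform
}

// RepositoryMetadata carries per-repository context (hypothetical fields).
type RepositoryMetadata struct {
	Name   string
	URL    string
	Branch string
}

// GenerationContext is what generators see: the discovered documents
// plus repository metadata.
type GenerationContext struct {
	Discovered []*Document
	Repos      map[string]RepositoryMetadata
}

// FileGenerator creates new documents from the discovered set.
type FileGenerator func(ctx *GenerationContext) ([]*Document, error)

// FileTransform mutates doc in place and may return additional documents.
type FileTransform func(doc *Document) ([]*Document, error)

func main() {
	// A transform is just a function value; no registration is involved.
	var markSeen FileTransform = func(doc *Document) ([]*Document, error) {
		doc.FrontMatter["seen"] = true
		return nil, nil
	}

	doc := &Document{Path: "docs/a.md", FrontMatter: map[string]any{}}
	if _, err := markSeen(doc); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(doc.FrontMatter["seen"]) // prints "true"
}
```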
Pipeline Execution:
Example Generator (Creates New Files):
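A sketch of a section index generator, assuming a FileGenerator receives the discovered documents and returns the files it wants to add. The function name generateSectionIndexes, the _index.md naming, and the title-from-directory rule are assumptions for illustration.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

type Document struct {
	Path        string
	Content     string
	FrontMatter map[string]any
	Generated   bool
}

type GenerationContext struct{ Discovered []*Document }

// generateSectionIndexes returns an _index.md for every directory that
// contains documents but has no index of its own.
func generateSectionIndexes(ctx *GenerationContext) ([]*Document, error) {
	dirs := map[string]bool{}
	hasIndex := map[string]bool{}
	for _, d := range ctx.Discovered {
		dir := path.Dir(d.Path)
		dirs[dir] = true
		if path.Base(d.Path) == "_index.md" {
			hasIndex[dir] = true
		}
	}

	var missing []string
	for dir := range dirs {
		if !hasIndex[dir] {
			missing = append(missing, dir)
		}
	}
	sort.Strings(missing) // deterministic output order

	var out []*Document
	for _, dir := range missing {
		out = append(out, &Document{
			Path:        path.Join(dir, "_index.md"),
			FrontMatter: map[string]any{"title": path.Base(dir)},
			Generated:   true,
		})
	}
	return out, nil
}

func main() {
	ctx := &GenerationContext{Discovered: []*Document{
		{Path: "repo/api/auth.md"},
		{Path: "repo/guide/_index.md"},
		{Path: "repo/guide/intro.md"},
	}}
	docs, err := generateSectionIndexes(ctx)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, d := range docs {
		fmt.Println(d.Path, d.FrontMatter["title"]) // prints "repo/api/_index.md api"
	}
}
```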
Example Transform (Modifies Existing Files):
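A sketch of a transform that modifies an existing document by direct mutation, modeled on the title extraction described in this ADR (H1 from README/index files). The function name, the file name checks, and the Document fields are assumptions.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

type Document struct {
	Path        string
	Content     string
	FrontMatter map[string]any
}

type FileTransform func(doc *Document) ([]*Document, error)

// extractIndexTitle sets the front matter title from the first H1 of
// README/index files. It writes to the document directly; there is no
// patch and no merge mode that could block the write.
func extractIndexTitle(doc *Document) ([]*Document, error) {
	switch strings.ToLower(path.Base(doc.Path)) {
	case "readme.md", "index.md", "_index.md":
	default:
		return nil, nil // not an index file, leave untouched
	}
	for _, line := range strings.Split(doc.Content, "\n") {
		if strings.HasPrefix(line, "# ") {
			if doc.FrontMatter == nil {
				doc.FrontMatter = map[string]any{}
			}
			doc.FrontMatter["title"] = strings.TrimSpace(strings.TrimPrefix(line, "# "))
			break
		}
	}
	return nil, nil
}

// Compile-time check that the function matches the transform type.
var _ FileTransform = extractIndexTitle

func main() {
	doc := &Document{
		Path:        "repo/README.md",
		Content:     "# My Project\n\nIntro text.",
		FrontMatter: map[string]any{"title": "README"},
	}
	if _, err := extractIndexTitle(doc); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(doc.FrontMatter["title"]) // prints "My Project"
}
```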
Example Transform (Generates New Files Based on Keywords):
Migration Path
Phase 1: Create New Pipeline (Parallel)
- Define Document, FileTransform, FileGenerator, GenerationContext types
- Create processContent() with generation + transform phases
- Convert existing index generation logic to generators
- Convert existing transforms to new interface (one by one)
- Add comprehensive tests for new pipeline
Phase 2: Switch Over
- Update copyContentFiles() to use new pipeline
- Run integration tests to verify behavior
- Fix any discrepancies
Phase 3: Cleanup
- Remove old Transform interface
- Remove TransformRegistry
- Remove topological sort logic
- Remove patch system (Patch, MergeMode, protected keys)
- Remove old transform files
Phase 4: Documentation
- Update copilot instructions
- Document transform pipeline in architecture docs
- Add examples for adding new transforms
Consequences
Positive
✅ Predictable: Execution order is explicit in code
✅ Debuggable: Set breakpoint in pipeline, step through transforms sequentially
✅ Testable: Test individual transforms/generators or full pipeline easily
✅ Maintainable: No magic, no hidden dependencies, no indirection
✅ Fast: No registry lookups, no topological sorting, no patch merging
✅ Simple onboarding: New developers see exact transform order immediately
✅ Reliable: Fixed pipeline means consistent, reproducible behavior
✅ Separation of concerns: Generation (creating files) separate from transformation (modifying files)
✅ Dynamic generation: Transforms can create new files based on content analysis (keywords, patterns, etc.)
✅ Composable: New documents flow through remaining transforms automatically
Negative
⚠️ Less flexible: Cannot dynamically add/remove transforms (but we don’t need this)
⚠️ Migration effort: Need to convert all existing transforms
Neutral
- Pipeline is now explicitly ordered instead of dependency-ordered
- Transforms mutate directly instead of returning patches
- Code location becomes important (pipeline defined in generator.go)
Alternatives Considered
1. Keep Current System, Fix Bugs
Description: Continue using registry + patches, improve documentation
Rejected because:
- Doesn’t address root cause (unnecessary complexity)
- Bug was symptom of overly complex system
- Future maintainers will face same issues
2. Plugin Architecture
Description: Make transforms truly pluggable with user configuration
Rejected because:
- Massive scope increase
- Users don’t need this flexibility
- Introduces security/stability risks
- Not aligned with project goals
3. Middleware Pattern
Description: Chain of responsibility with explicit next() calls
Rejected because:
- More complex than simple function list
- Doesn’t add value for our use case
- Makes testing harder (mocking next())
Implementation Plan
✅ Completed December 16, 2025
Phase 1: Core Pipeline (Completed)
- Created internal/hugo/pipeline/ package
- Implemented Document type with front matter and content fields
- Built Processor with two-phase execution (generators → transforms)
- Added queue-based processing for dynamic document generation
Phase 2: Transforms Migration (Completed)
- Converted all 10 essential transforms to FileTransform functions
- Implemented 3 generators for index file creation
- Removed dependency on registry, patches, and Page abstraction
- All transforms use direct mutation pattern
Phase 3: Integration (Completed)
- Created copyContentFilesPipeline() integration function
- Added environment variable feature flag (DOCBUILDER_NEW_PIPELINE=1)
- Maintained backward compatibility with old system
- Updated copilot instructions
Phase 4: Testing & Validation (Completed)
- Unit tests for all generators and transforms
- Edge case coverage (empty FM, no FM, malformed FM)
- Integration via feature flag tested
- All tests passing, linter clean
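Remaining Work at the end of Phase 4 (tracked separately from this ADR; the old system has since been removed and the new pipeline made the default, see Migration Complete above):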
- Remove old registry/patch system
- Update golden test expectations (theme system issue)
- Make new pipeline the default
- Documentation updates
Actual effort: 1 day (vs estimated 3-5 days)
Implementation Details
File Structure
Key Design Decisions
- Direct Mutation: Documents are modified in-place, no patch merging
- Type Safety: Compile-time verification of transform signatures
- Queue-Based: Generators can add new documents during processing
- Stateless Transforms: Pure functions with no global state
- Feature Flag: Environment variable enables new pipeline without code changes
Open Questions
All questions resolved during implementation:
- Error handling: ✅ Transforms return errors, pipeline fails fast
- Transform state: ✅ Pass context via RepositoryMetadata parameter
- Partial failures: ✅ Fail fast on first error (single-pass pipeline)
- Testing strategy: ✅ Both unit tests per transform and integration tests
- Front matter parsing: ✅ Handle edge cases (empty FM, no FM, malformed FM)
- Generator ordering: ✅ All generators run before any transforms
References
- Issue: “README H1 duplicate headers” (revealed patch system complexity)
- ADR-002: In-Memory Content Pipeline (established single-pass architecture)
- Copilot Instructions: Transform pipeline section (needs update)
- Style Guide: Function naming conventions (already compatible)
Decision Makers
- @inful (Lead Developer)
Notes
This refactor aligns with DocBuilder’s greenfield status and aggressive refactoring posture. We’re optimizing for clarity and maintainability over theoretical flexibility we don’t need.