ADR-005: Documentation Linting for Pre-Commit Validation
Date: 2025-12-29
Status
Proposed
Context
DocBuilder processes documentation from multiple Git repositories, transforming markdown files into Hugo-compatible sites. Currently, developers discover issues only after committing and running builds:
Current Pain Points
- Late feedback: Filename issues (spaces, mixed-case) discovered during Hugo build
- Silent failures: Invalid frontmatter causes pages to render incorrectly or not at all
- Broken links: Cross-references break when files are renamed without updating links
- Path inconsistencies: Mixed naming conventions (README.md, api-guide.md, My Document.md) in same repository
- Asset orphans: Images referenced but not committed, or committed but never referenced
- Hugo quirks: Reserved filenames (_index.md vs index.md) behave differently but look similar
Impact
- Developers commit documentation that fails to build
- CI/CD pipelines fail unexpectedly
- Manual inspection required to diagnose issues
- Inconsistent documentation quality across repositories
- Time wasted on avoidable build failures
Hugo and DocBuilder Best Practices
Hugo has specific expectations:
- Filenames become URL slugs: My Document.md → /my%20document/ (problematic)
- Case sensitivity varies by OS: README.md vs readme.md causes issues
- Special files: _index.md (section landing), index.md (leaf bundle)
- Asset paths must match exactly (case-sensitive even on macOS/Windows during deployment)
DocBuilder’s discovery system (internal/docs/) walks repositories and expects:
- Lowercase filenames for predictable Hugo paths
- No spaces or special characters (except -, _, .)
- Valid UTF-8 frontmatter
- Relative paths for cross-document links
Decision
Implement a documentation linting system with multiple integration points:
- CLI command: docbuilder lint [path] for manual validation
- Git hooks: Traditional pre-commit hooks or lefthook for automatic checking
- CI/CD integration: GitHub Actions / GitLab CI step for PR validation
Architecture
Linting Rules
Rules are fixed and opinionated based on Hugo/DocBuilder best practices. No configuration override.
Filename Rules (Errors - Block Build)
Allowed pattern: [a-z0-9-_.] with exceptions for whitelisted double extensions:
- .drawio.png - Draw.io embedded PNG diagrams
- .drawio.svg - Draw.io embedded SVG diagrams
| Rule | Severity | Rationale |
|---|---|---|
| Uppercase letters in filename | Error | Causes URL inconsistency, case-sensitivity issues |
| Spaces in filename | Error | Breaks Hugo URL generation, creates %20 in paths |
| Special characters (not [a-z0-9-_.]) | Error | Unsupported by Hugo slugify, potential shell escaping issues |
| Leading/trailing hyphens or underscores | Error | Creates malformed URLs (/-docs/ or /_temp/) |
| Double extensions (except whitelisted) | Error | Processed as markdown, causes build errors. Allowed: .drawio.png, .drawio.svg (embedded diagrams) |
| Reserved names without prefix (tags.md, categories.md) | Error | Conflicts with Hugo taxonomy URLs |
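The filename rules in the table above could be checked along the following lines. This is a minimal sketch, not the actual internal/lint implementation: the function name CheckFilename, the message strings, and the exemption for _index.md are assumptions made for illustration, and the reserved-name list contains only the two names mentioned in the table.

```go
package main

import (
	"regexp"
	"strings"
)

// Whitelisted double extensions, taken from the rules above.
var allowedDoubleExts = []string{".drawio.png", ".drawio.svg"}

// Reserved names from the table; a real list may well be longer.
var reservedNames = map[string]bool{"tags.md": true, "categories.md": true}

// Anything outside letters, digits, space, '.', '_' and '-' counts as a
// special character here. Case and spaces are reported by their own rules.
var specialChars = regexp.MustCompile(`[^A-Za-z0-9 ._-]`)

// hasDisallowedDoubleExt reports whether a lowercased name carries more than
// one extension, ignoring the whitelisted diagram suffixes.
func hasDisallowedDoubleExt(lower string) bool {
	for _, ext := range allowedDoubleExts {
		if strings.HasSuffix(lower, ext) {
			return strings.Contains(strings.TrimSuffix(lower, ext), ".")
		}
	}
	return strings.Count(lower, ".") > 1
}

// CheckFilename returns one message per violated filename rule
// (hypothetical helper, named for this example only).
func CheckFilename(name string) []string {
	var issues []string
	lower := strings.ToLower(name)
	if name != lower {
		issues = append(issues, "uppercase letters in filename")
	}
	if strings.Contains(name, " ") {
		issues = append(issues, "spaces in filename")
	}
	if specialChars.MatchString(name) {
		issues = append(issues, "special characters in filename")
	}
	// Assumption: Hugo's _index.md is exempt from the leading-underscore rule.
	stem, _, _ := strings.Cut(name, ".")
	if lower != "_index.md" && stem != strings.Trim(stem, "-_") {
		issues = append(issues, "leading or trailing hyphen/underscore")
	}
	if hasDisallowedDoubleExt(lower) {
		issues = append(issues, "double extension")
	}
	if reservedNames[lower] {
		issues = append(issues, "reserved name")
	}
	return issues
}
```

Under these assumptions, CheckFilename("My Document.md") reports the uppercase and spaces rules, while api-guide.md and diagram.drawio.svg come back clean.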
Error Message Example:
Content Rules (Errors - Block Build)
| Rule | Severity | Rationale |
|---|---|---|
| Malformed frontmatter YAML | Error | Hugo fails to parse, page skipped silently |
| Missing closing --- in frontmatter | Error | Entire file treated as frontmatter |
| Invalid frontmatter keys (duplicates) | Error | Undefined Hugo behavior |
| Broken internal links ([text](/docbuilder/adr/missing)) | Error | 404s in production, poor UX |
| Image references to non-existent files | Error | Missing images break layout |
Error Message Example:
Structure Rules (Warnings - Allow but Notify)
| Rule | Severity | Rationale |
|---|---|---|
| Missing _index.md in directory with docs | Warning | Directory won’t have landing page, appears empty in nav |
| Deeply nested structure (>4 levels) | Warning | Poor navigation UX, consider flattening |
| Orphaned assets (unreferenced images) | Warning | Bloats repository, may be leftover from deletions |
| Mixed file naming styles in same directory | Warning | Inconsistent developer experience |
Warning Message Example:
Asset Rules (Warnings - Allow but Notify)
| Rule | Severity | Rationale |
|---|---|---|
| Image filename with spaces/uppercase | Warning | Works but creates inconsistent URLs |
| Absolute URLs to internal assets | Warning | Breaks in local development, not portable |
| Large binary files (>5MB) | Warning | Git performance, consider external hosting |
| Embedded diagram formats (.drawio.png, .drawio.svg) | Info | Valid double extension for editable diagrams, explicitly allowed |
Implementation Phases
Phase 1: Core Linting Engine (Week 1)
- Implement internal/lint package with rule engine
- Filename validation (errors only)
- Human-readable formatter
- Unit tests for each rule
Phase 2: CLI and Manual Workflow (Week 1-2)
- Add docbuilder lint command
- Intelligent default path detection (docs/ or documentation/)
- Support single file, directory, and recursive modes
- Exit codes: 0 (clean), 1 (warnings), 2 (errors), 3 (lint execution error)
- Colorized terminal output (red errors, yellow warnings)
Phase 3: Auto-Fix Capability (Week 2)
- Implement safe file renaming with link resolution:
  - Rename files: My Doc.md → my-doc.md
  - Scan all markdown files for links to renamed files
  - Update internal references preserving link style (relative/absolute)
  - Handle image links, inline links, reference-style links
  - Preserve anchor fragments (#section) in links
  - Preserve Git history (using git mv)
- Require --fix flag and confirmation prompt showing:
  - Files to be renamed
  - Markdown files that will be updated
  - Total number of links to be modified
- Dry-run mode: --fix --dry-run shows all changes without applying
- Generate detailed fix report with before/after comparison
Phase 4: Integration Hooks (Week 3)
- Traditional pre-commit hook script (scripts/install-hooks.sh)
- Lefthook configuration (lefthook.yml)
- GitHub Actions workflow example
- GitLab CI template
- Documentation in docs/how-to/setup-linting.md
Phase 5: Content and Structure Rules (Future)
- Frontmatter validation
- Link checking
- Orphaned asset detection
- Structure recommendations
Lint Command Interface
Default Behavior:
When run without arguments, docbuilder lint uses intelligent path detection:
- If docs/ directory exists in current directory → lint docs/
- If documentation/ directory exists → lint documentation/
- Otherwise → lint current directory (.)
Exit Codes
- 0: No issues found (clean)
- 1: Warnings present but no errors
- 2: Errors found (build would fail)
- 3: Lint execution error (filesystem access, etc.)
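The mapping from a lint run to the exit codes listed above can be sketched as follows. The Summary type and the ExitCode function are hypothetical names for this example, and the precedence (an execution error wins over lint errors, which win over warnings) is an assumption the ADR does not spell out.

```go
package main

// Exit codes as listed in this section.
const (
	ExitClean    = 0 // no issues found
	ExitWarnings = 1 // warnings present but no errors
	ExitErrors   = 2 // errors found (build would fail)
	ExitFailure  = 3 // the linter itself could not run
)

// Summary is a hypothetical result type holding issue counts for one run.
type Summary struct {
	Errors   int
	Warnings int
}

// ExitCode picks the process exit code for a finished lint run.
// Assumption: an execution error takes precedence over any lint findings.
func ExitCode(s Summary, runErr error) int {
	switch {
	case runErr != nil:
		return ExitFailure
	case s.Errors > 0:
		return ExitErrors
	case s.Warnings > 0:
		return ExitWarnings
	default:
		return ExitClean
	}
}
```

Keeping warnings on a distinct non-zero code lets a hook or CI step decide for itself whether code 1 should block.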
Output Format
Safety Guarantees for Auto-Fix
The --fix flag will only perform transformations that are provably safe:
- Filename normalization: Lowercase + hyphenate, preserving whitelisted double extensions (reversible)
- Link updates: Update relative links in same repo (validated before commit)
- Git integration: Use git mv to preserve history
- Atomic operations: All-or-nothing (rollback on any failure)
- Backup prompt: Confirms user wants to proceed
- Dry-run first: Shows changes before applying
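The filename normalization named in the first guarantee (lowercase plus hyphenate, keeping whitelisted double extensions) might look roughly like this. It is a sketch under assumptions: NormalizeFilename and its helpers are illustrative names, _index.md is assumed to be left untouched, and dots inside the stem are not rewritten.

```go
package main

import (
	"regexp"
	"strings"
)

// Double extensions that must survive normalization unchanged.
var keptDoubleExts = []string{".drawio.png", ".drawio.svg"}

var (
	invalidRun = regexp.MustCompile(`[^a-z0-9._-]+`) // runs of disallowed characters
	hyphenRun  = regexp.MustCompile(`-{2,}`)         // repeated hyphens
)

// splitExt separates a lowercased name into stem and extension, treating the
// whitelisted double extensions as a single unit.
func splitExt(lower string) (stem, ext string) {
	for _, w := range keptDoubleExts {
		if strings.HasSuffix(lower, w) {
			return strings.TrimSuffix(lower, w), w
		}
	}
	if i := strings.LastIndex(lower, "."); i > 0 {
		return lower[:i], lower[i:]
	}
	return lower, ""
}

// NormalizeFilename lowercases a name, replaces each run of disallowed
// characters with one hyphen, and trims leading/trailing '-' and '_'.
func NormalizeFilename(name string) string {
	lower := strings.ToLower(name)
	if lower == "_index.md" {
		return lower // assumption: Hugo's section file keeps its underscore
	}
	stem, ext := splitExt(lower)
	stem = invalidRun.ReplaceAllString(stem, "-")
	stem = hyphenRun.ReplaceAllString(stem, "-")
	stem = strings.Trim(stem, "-_")
	return stem + ext
}
```

For example, NormalizeFilename("My Doc.md") yields my-doc.md. Two different inputs can normalize to the same name, so a fixer built on this would still need a collision check before renaming.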
Default Path Detection
To minimize friction, docbuilder lint intelligently detects documentation directories:
Detection Order:
- Check for docs/ directory in current path
- Check for documentation/ directory in current path
- Fallback to current directory (.)
Override Behavior:
- Explicit path argument always takes precedence: docbuilder lint ./custom-docs
- Use docbuilder lint . to explicitly lint current directory
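The detection order and override behavior described above fit in a few lines. The sketch below is illustrative only: DetectLintPath and dirExists are assumed names, and the existence check is passed in as a function so the logic can be exercised without touching the filesystem.

```go
package main

import "os"

// dirExists reports whether path exists and is a directory.
func dirExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.IsDir()
}

// DetectLintPath returns the path to lint. An explicit argument always wins;
// otherwise docs/ is tried, then documentation/, then the current directory.
// The exists callback is injected for testability (a design choice of this
// sketch, not something the ADR prescribes).
func DetectLintPath(explicit string, exists func(string) bool) string {
	if explicit != "" {
		return explicit
	}
	for _, candidate := range []string{"docs", "documentation"} {
		if exists(candidate) {
			return candidate
		}
	}
	return "."
}
```

A command handler would call DetectLintPath(arg, dirExists), where arg is empty when no path was given on the command line.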
Rationale:
- Most projects use docs/ or documentation/ as standard convention
- Reduces cognitive load for developers (just run docbuilder lint)
- Follows principle of least surprise
- Works naturally in CI/CD where working directory is project root
Will NOT auto-fix:
- Frontmatter structure (too complex, context-dependent)
- External links (can’t validate without network)
- Content rewrites (subjective)
- Cross-repository links (affects multiple repos)
Git Hooks Integration
Option 1: Lefthook (Recommended)
Lefthook is a fast, modern Git hooks manager. Add to lefthook.yml in repository root:
Installation:
Benefits:
- Fast parallel execution
- Easier to configure and maintain
- Portable configuration (checked into repo)
- Supports multiple hooks and commands
- Auto-staging of fixed files
Option 2: Traditional Pre-Commit Hook
Install via: docbuilder lint install-hook
Generated hook at .git/hooks/pre-commit:
CI/CD Integration
GitHub Actions (.github/workflows/lint-docs.yml):
GitLab CI (.gitlab-ci.yml):
Auto-Fix Implementation: Link Resolution
When the --fix flag is used, renaming files requires updating all internal markdown links that reference those files. This is critical to prevent broken documentation after auto-fixing filename issues.
Link Resolution Strategy
Supported Link Types
The fixer must handle all common markdown link patterns:
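A rough sketch of link discovery for the three link kinds named elsewhere in this ADR (inline, image, reference-style definitions) is shown here. The Link type, the regular expressions, and FindLinks are assumptions for illustration; the ADR calls for a markdown parser to detect code blocks, whereas this sketch only toggles on fence lines and does not handle inline code spans or targets containing spaces.

```go
package main

import (
	"regexp"
	"strings"
)

// Link records one local link found in a markdown file (hypothetical type).
type Link struct {
	Line     int    // 1-based line number
	Kind     string // "inline", "image" or "reference"
	Target   string // path part of the link
	Fragment string // text after '#', if any
}

var (
	// [text](target "title") and ![alt](target)
	inlineLink = regexp.MustCompile(`(!?)\[([^\]]*)\]\(([^)\s]+)(?:\s+"[^"]*")?\)`)
	// [label]: target
	refDef = regexp.MustCompile(`^\s{0,3}\[([^\]]+)\]:\s*(\S+)`)
	// http:, https:, mailto: and similar schemes
	urlScheme = regexp.MustCompile(`^[a-zA-Z][a-zA-Z0-9+.-]*:`)
)

// appendLocal adds raw to links unless it is external or a pure anchor.
func appendLocal(links []Link, line int, kind, raw string) []Link {
	if urlScheme.MatchString(raw) || strings.HasPrefix(raw, "//") || strings.HasPrefix(raw, "#") {
		return links
	}
	target, fragment, _ := strings.Cut(raw, "#")
	return append(links, Link{Line: line, Kind: kind, Target: target, Fragment: fragment})
}

// FindLinks scans markdown content line by line and returns local links,
// skipping everything between fence lines.
func FindLinks(content string) []Link {
	var links []Link
	inFence := false
	for i, line := range strings.Split(content, "\n") {
		trimmed := strings.TrimSpace(line)
		if strings.HasPrefix(trimmed, "```") || strings.HasPrefix(trimmed, "~~~") {
			inFence = !inFence
			continue
		}
		if inFence {
			continue
		}
		if m := refDef.FindStringSubmatch(line); m != nil {
			links = appendLocal(links, i+1, "reference", m[2])
			continue
		}
		for _, m := range inlineLink.FindAllStringSubmatch(line, -1) {
			kind := "inline"
			if m[1] == "!" {
				kind = "image"
			}
			links = appendLocal(links, i+1, kind, m[3])
		}
	}
	return links
}
```

Splitting the fragment off at discovery time keeps the later rename step from having to re-parse the target.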
Phase 2: Path Resolution
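Path resolution can be illustrated with a small helper that turns a link target into a path relative to the documentation root, so it can be compared with the file being renamed. ResolveTarget is a hypothetical name, and treating a leading slash as "relative to the docs root" is an assumption about how site-absolute links are interpreted.

```go
package main

import (
	"path"
	"strings"
)

// ResolveTarget maps a link target to a slash-separated path relative to the
// docs root. sourceFile is the linking file, itself relative to that root
// (for example "guides/setup.md").
func ResolveTarget(sourceFile, target string) string {
	if strings.HasPrefix(target, "/") {
		// Assumption: site-absolute links are resolved from the docs root.
		return path.Clean(strings.TrimPrefix(target, "/"))
	}
	// Relative links are resolved against the directory of the linking file.
	return path.Join(path.Dir(sourceFile), target)
}
```

The package path is used instead of path/filepath because markdown links always use forward slashes. This sketch does not guard against targets that climb above the docs root.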
Phase 3: Generate Replacement
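One way to generate the replacement while preserving link style and anchor fragments is to swap only the final path segment and leave everything else as the author wrote it. RewriteTarget is an illustrative name; the sketch assumes the rename changes the filename but not its directory, and it does not handle URL-encoded names such as My%20Doc.md.

```go
package main

import "strings"

// RewriteTarget replaces the last path segment of a link target with newName.
// The directory prefix ("./", "../api/", "/docs/") and any #fragment are kept
// exactly as written, so relative links stay relative and absolute stay absolute.
func RewriteTarget(rawTarget, newName string) string {
	pathPart, fragment, hasFragment := strings.Cut(rawTarget, "#")
	prefix := ""
	if i := strings.LastIndex(pathPart, "/"); i >= 0 {
		prefix = pathPart[:i+1]
	}
	out := prefix + newName
	if hasFragment {
		out += "#" + fragment
	}
	return out
}
```

For instance, RewriteTarget("./My Doc.md#setup", "my-doc.md") gives ./my-doc.md#setup.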
Phase 4: Apply Updates
Edge Cases and Safety
Case 1: External URLs
Resolution: Skip. Only update links to local files. Detect by checking for protocol scheme (http://, https://).
Case 2: Broken Links
Resolution: If link target doesn’t exist and matches old filename pattern, report separately as “potential broken link” but don’t update.
Case 3: Multiple Files Same Name
Resolution: Use full path matching. Only update links that resolve to the specific file being renamed.
Case 4: Circular References
Resolution: No special handling needed. Each file rename updates its own references independently.
Case 5: Links in Code Blocks
Resolution: Don’t update links inside code blocks. Use markdown parser to identify fenced code blocks and skip them.
Case 6: Case-Insensitive Filesystems
Resolution: Perform case-insensitive path comparison when checking if link targets the file being renamed.
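The case-insensitive comparison from Case 6 could be as small as the following. SamePathFold is a hypothetical helper; strings.EqualFold applies Unicode simple case folding, which may not match every filesystem's own rules exactly.

```go
package main

import (
	"path"
	"strings"
)

// SamePathFold reports whether two slash-separated paths refer to the same
// file when letter case is ignored. Both paths are cleaned first so that
// "docs/./readme.md" and "docs/README.md" compare equal.
func SamePathFold(a, b string) bool {
	return strings.EqualFold(path.Clean(a), path.Clean(b))
}
```

Combined with the full-path matching from Case 3, this keeps a link to docs/README.md from being missed when the file on disk is docs/readme.md.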
User Confirmation Flow
When --fix flag is used without --yes, show interactive confirmation:
Dry-Run Output
docbuilder lint --fix --dry-run shows what would change without applying:
Implementation Phases
Phase 3a: Basic Renaming
- File rename with git mv support
- Confirmation prompts
- Dry-run mode
Phase 3b: Link Discovery
- Scan markdown files for links
- Parse inline, reference, and image links
- Resolve relative paths
Phase 3c: Link Updates
- Generate replacement text
- Apply updates atomically
- Rollback on failure
Phase 3d: Edge Cases
- Skip external URLs
- Handle code blocks
- Case-insensitive matching
Phase 3e: Reporting
- Detailed fix report
- Dry-run preview
- Interactive confirmation
Testing Strategy
Follow ADR-001 golden testing approach:
Test Coverage:
- Each rule has unit test with valid/invalid cases
- Integration tests run linter on test directories
- Golden files verify exact error messages
- Auto-fix tests verify safe transformations
- Test file renaming with git mv
- Test link discovery and resolution
- Test link updates preserve style (relative/absolute)
- Test anchor fragments are preserved
- Test external URLs are not modified
- Test links in code blocks are ignored
- Test rollback on failure
- Pre-commit hook tested via Git test repository
- Default path detection tested (docs/, documentation/, fallback)
- Link resolution tests:
- Unit tests for path resolution (relative → absolute)
- Unit tests for link regex patterns (inline, reference, image)
- Integration tests with before/after directory structures
- Edge case tests (external URLs, code blocks, broken links)
- Case-insensitive filesystem tests
Keeping Linting Rules Synchronized with DocBuilder
The linting system must stay synchronized with DocBuilder’s actual behavior to remain useful. As DocBuilder evolves—adding new features or changing how it processes documentation—the linter must reflect these changes.
Synchronization Strategy
1. Shared Test Infrastructure
Linting rules should be validated against actual DocBuilder behavior, not assumptions:
2. Integration Tests with Full Pipeline
Periodically run integration tests that:
- Create test repositories with various violations
- Run docbuilder build on them
- Verify linter warnings/errors match actual build issues
- Catch cases where linter is too strict or too lenient
Example test structure:
3. Version Alignment
Linting rules should evolve with DocBuilder versions:
| DocBuilder Version | Linter Rule Changes |
|---|---|
| 1.0 - 1.5 | Basic filename and frontmatter rules |
| 1.6+ | Enhanced frontmatter schema validation |
| 2.0+ | Asset transformation and link validation |
| Future | Custom Hugo module support detection |
Version compatibility approach:
- Linter reports its “target DocBuilder version”
- Warns if linting against much older/newer DocBuilder behavior
- Can optionally validate against multiple versions
4. Feature Detection
When DocBuilder adds new features, update linter rules accordingly:
| DocBuilder Feature | Linting Rule Update |
|---|---|
| New frontmatter field support | Add validation for new fields |
| Asset transformation (WebP) | Allow new file extensions |
| Custom shortcodes | Validate shortcode syntax |
| Multi-language support | Validate language-specific paths |
| Repository metadata injection | Validate editURL patterns |
5. Documentation Cross-References
Maintain bidirectional links between linter rules and DocBuilder documentation:
6. Automated Sync Checks
Add CI checks to prevent drift:
7. Maintenance Workflow
When DocBuilder changes:
- Feature Addition:
  - Update linter to recognize new valid patterns
  - Add test cases for new feature
  - Update docs/reference/lint-rules.md
- Deprecation:
  - Linter warns about deprecated patterns
  - Provide migration suggestions
  - Eventually promote warnings to errors
- Bug Fixes:
  - If DocBuilder now accepts something it previously rejected, update linter
  - Add regression test
  - Update golden test files
8. Rule Evolution Process
9. Feedback Loop
Monitor false positives/negatives:
- Track GitHub issues tagged linter-false-positive or linter-missed-issue
- Periodic review of linter vs actual build failures
- User feedback in success metrics (see Success Metrics section)
10. Living Documentation
Maintain a changelog specifically for linting rules:
Practical Example: Adding Asset Transformation Support
Scenario: DocBuilder adds support for automatic WebP image conversion.
Synchronization steps:
- Detect the change: PR adds WebP transformation to internal/docs/assets.go
- Update linter rules:
  - Add tests:
- Update documentation:
  - docs/reference/lint-rules.md: Add WebP to allowed asset formats
  - Update asset rules table with WebP examples
- Release together: Linter v1.8.0 released alongside DocBuilder v1.8.0
Ownership and Responsibility
- Core team: Maintains synchronization, reviews PRs for drift
- Feature developers: Update linter rules when adding DocBuilder features
- PR checklist: “Have you updated linting rules if applicable?”
- Quarterly review: Check for accumulated drift, plan alignment work
Consequences
Positive
- Early feedback: Developers catch issues before commit
- Consistent quality: Opinionated rules enforce best practices
- Better documentation: Improved structure and linking
- Faster CI: Fewer build failures from preventable issues
- Self-documenting: Error messages teach Hugo conventions
- Safe automation: --fix flag reduces manual renaming work
- Zero configuration: Intelligent defaults work out of the box (auto-detects docs/)
Negative
- Initial friction: Existing repositories may have many violations
- Migration effort: Teams must fix legacy documentation
- Learning curve: Developers learn new rules
- Hook conflicts: May conflict with other pre-commit tools
Mitigation
- Gradual rollout: Start with warnings, move to errors over time
- Migration guide: Document bulk-fixing existing repositories
- Rule documentation: Comprehensive explanation of each rule
- Opt-in initially: Teams adopt voluntarily before enforcement
Migration Path
Week 1: Soft Launch
- Release docbuilder lint command (warnings only)
- Documentation in docs/how-to/
- Encourage voluntary adoption
Week 2: Team Testing
- Select 2-3 pilot repositories
- Run docbuilder lint --fix to clean up
- Gather feedback on rules and messages
Week 3: Git Hooks
- Publish traditional hook installer
- Add lefthook.yml to repository template
- Provide team-wide installation guide (both options)
- Keep as warnings (non-blocking)
Month 2: CI Integration
- Add CI workflow to template repositories
- Start blocking PRs with errors (not warnings)
- Monitor false positives, adjust rules if needed
Month 3: Full Enforcement
- All repositories have lint checks
- Warnings promoted to errors where appropriate
- Legacy repositories cleaned up or exempted
Future Enhancements
Content Linting (Phase 5)
- Spell checking (en-US by default, configurable)
- Markdown style consistency (headings, lists, code blocks)
- Accessibility checks (alt text, heading hierarchy)
- SEO recommendations (meta descriptions, keywords)
Advanced Asset Handling
- Accessibility score for images (alt text quality)
IDE Integration
- VS Code extension for real-time linting
- Language server protocol (LSP) for any editor
- Inline quick-fixes and refactorings
Smart Fixes
- Automatic frontmatter generation from content
- Link suggestion for orphaned sections
- Batch rename with preview
Examples
Example 1: Clean Repository (Using Defaults)
Example 1b: No docs/ Directory Found
Example 2: Filename Issues
Example 3: Auto-Fix Dry Run
Example 4: Lefthook Integration
Usage:
References
- Hugo URL Management
- Hugo Content Organization
- Git Pre-Commit Hooks
- Lefthook Documentation
- Markdown Best Practices
- ADR-001: Golden Testing Strategy (for test approach)
- ADR-000: Uniform Error Handling (for error reporting)
Implementation Checklist
Phase 1: Core Linting Engine ✅
- Create internal/lint package (5 files: types, rules, linter, formatter, tests)
- Implement filename rules with whitelisted extensions (.drawio.png, .drawio.svg)
- Human-readable text formatter with colorization and NO_COLOR support
- JSON formatter for CI/CD integration
- Unit tests for each rule (11 comprehensive test cases)
- Standard file filtering (README, CONTRIBUTING, CHANGELOG, etc.)
- Intelligent default path detection (docs/, documentation/, fallback to .)
Phase 2: CLI Implementation ✅
- Add docbuilder lint CLI command with Kong integration
- Exit code handling (0=clean, 1=warnings, 2=errors, 3=execution error)
- Output format flags: --format=text|json
- Verbosity control: --quiet, --verbose
- Color detection with NO_COLOR environment variable
- Duplicate error prevention (consolidated uppercase/special char reporting)
Phase 3: Auto-Fix Capability (Link Resolution)
- Comprehensive link resolution strategy documented in ADR
- Phase 3a: Basic file renaming with git mv support
- File rename implementation
- Git mv integration for history preservation
- Dry-run mode (--fix --dry-run)
- Force flag for overwriting existing files
- Comprehensive test coverage (8 tests)
- Interactive confirmation prompts (deferred to Phase 3e)
- Detailed preview of changes (deferred to Phase 3e)
- Phase 3b: Link discovery and path resolution
- Regex patterns for inline, reference, image links
- Relative path resolution to absolute workspace paths
- Link reference tracking (source file, line number, type)
- External URL detection and exclusion
- Code block detection and exclusion
- Anchor fragment preservation
- Comprehensive test coverage (13 tests, 379 lines)
- Phase 3c: Link updates with atomic operations
- Generate replacement text preserving style
- Atomic file updates with rollback on failure
- Preserve anchor fragments (#section) in updated links
- Test coverage for anchor fragment preservation
- Test coverage for rollback mechanism
- Phase 3d: Edge case handling
- Skip external URLs (protocol detection) - already implemented in Phase 3b
- Ignore links in code blocks (markdown parser) - already implemented in Phase 3b
- Case-insensitive filesystem support
- Broken link detection and reporting
- Phase 3e: Reporting and interactive confirmation
- Detailed fix report with statistics
- Interactive confirmation showing files + links affected
- Dry-run preview with before/after comparison
- Backup creation (.docbuilder-backup-{timestamp}/)
Phase 4: Git Hooks Integration
- Traditional pre-commit hook script (scripts/install-hooks.sh)
- Hook installer command: docbuilder lint install-hook
- Lefthook configuration example (lefthook.yml)
- Test with staged files workflow
Phase 5: CI/CD Integration
- GitHub Actions workflow example (.github/workflows/lint-docs.yml)
- GitLab CI template (.gitlab-ci.yml)
- JSON output schema documentation
- PR comment integration examples
Testing
- Integration tests with golden files for core lint functionality
- Valid scenarios: correct filenames, whitelisted extensions
- Invalid scenarios: mixed-case, spaces, special chars, double extensions
- Golden file generation with -update-golden flag
- Normalized path comparison for system-independence
- 6 comprehensive test cases covering all current rules
- Integration golden tests for auto-fix functionality (Phase 3)
- Before/after directory structures with realistic test data
- TestGoldenAutoFix_FileRenameWithLinkUpdates: Complete fix workflow
- TestGoldenAutoFix_DryRun: Dry-run mode output verification
- TestGoldenAutoFix_BrokenLinkDetection: Broken link reporting
- Sorted results for consistent comparison across runs
- Normalized paths (filenames only) for portability
- Integration tests for lint-DocBuilder sync
- TestLintDocBuilderSync: Full build pipeline → lint validation
- TestLintDocBuilderSync_FileNaming: Filename convention compliance
- Test repository with cross-reference links (./relative-link.md syntax)
- Link transformation bug fixes (strip ./ prefix in transform_links.go)
- Linter path resolution enhancements (Hugo site-absolute paths in fixer.go)
- CI workflow to detect rule drift
- GitHub Actions workflow: .github/workflows/detect-rule-drift.yml
- Weekly schedule (Sunday midnight) + manual dispatch
- Single theme testing (Relearn only)
- Artifact uploads (90-day retention)
- PR comment integration on drift detection
Documentation
- docs/how-to/setup-linting.md - Setup and usage guide (completed)
- docs/reference/lint-rules.md - Complete rule reference (completed)
- docs/reference/lint-rules-changelog.md - Rule version history (completed)
- docs/how-to/migrate-to-linting.md - Migration guide for existing repositories (completed)
- docs/how-to/ci-cd-linting.md - CI/CD integration examples (completed)
Future Enhancements
- VS Code extension for real-time linting
- Content linting rules (spell checking, style consistency)
- Advanced asset handling (accessibility checks)
Success Metrics
After 3 months of deployment:
- 90%+ of commits pass linting without errors
- 50% reduction in documentation-related CI failures
- Positive developer feedback (survey)
- <5% false positive rate on errors
- Active usage of --fix flag (telemetry)
Decision Owner: [To be assigned]
Stakeholders: Development team, documentation maintainers, DevOps
Review Date: 3 months after implementation