Skip to main content

E4:S17:T05 – Document Failure Modes and Rollback Guidance

Task ID: E4:S17:T05
Status: ✅ COMPLETE
Priority: HIGH
Epic: E4 – Kanban Framework
Story: E4:S17 – Kanban Package Installation Evaluation
Version Anchor: ✅ COMPLETE (v0.4.17.5+1)


Scope

Document failure modes and rollback guidance for the Kanban framework package installation to ensure users can recover from installation failures and rollback problematic installations. This documentation covers common failure scenarios, error messages, troubleshooting steps, and recovery procedures.

Scope includes:

  • Identify common failure modes during installation
  • Document error messages and their meanings
  • Create troubleshooting guides for each failure mode
  • Document rollback procedures
  • Provide recovery paths and best practices

Inputs

  • Installer behavior and error handling:

    • packages/frameworks/kanban/scripts/install_kanban_framework.py
    • packages/frameworks/kanban/scripts/migrate_structure.py
    • packages/frameworks/kanban/scripts/validate_installation.py
  • Backup and recovery mechanisms

  • Error messages and validation outputs

  • Installation validation from T02-T04 (baseline for comparison)


Deliverables

  1. Failure mode list documenting:

    • Common failure scenarios
    • Error messages and symptoms
    • Root causes
    • Impact assessment
  2. Rollback guidance documenting:

    • Rollback procedures for each failure mode
    • Backup restoration steps
    • Recovery verification procedures
    • Best practices for preventing failures

Approach

  1. Review installer error handling for failure modes
  2. Identify Common Failure Modes
    • Validation failures
    • Migration failures
    • Backup failures
    • Configuration failures
  3. Document Error Messages
    • Map error messages to failure modes
    • Document error meanings and implications
  4. Create Rollback Procedures
    • Document backup restoration
    • Document manual recovery steps
    • Document verification procedures
  5. Create Troubleshooting Guides
    • Step-by-step troubleshooting for each failure mode
    • Recovery paths and best practices

Acceptance Criteria

  • Common failure modes identified and documented ✅
  • Error messages documented with meanings ✅
  • Rollback procedures created for each failure mode ✅
  • Troubleshooting guides created ✅
  • Recovery paths documented ✅
  • Best practices documented ✅

Failure Modes

Failure Mode 1: Validation Errors

Symptoms:

  • Installation blocked by validation errors
  • Error messages about Epic mashup or conflicts
  • Validation script reports errors

Error Messages:

❌ ERRORS (must be fixed before installation):
❌ CRITICAL: Epic 9 contains 'Book Related Work' but canonical Epic 9 is
'User Management and Authentication'. This is the root cause of Epic mashup.

Root Causes:

  • Epic mashup (project-specific content in canonical range)
  • Epic numbering conflicts
  • Canonical conflicts detected
  • Version file path issues

Impact: HIGH - Installation blocked until errors resolved

Recovery Steps:

  1. Review validation errors
  2. Fix Epic numbering issues (rename project epics to Epic 24+)
  3. Resolve canonical conflicts
  4. Re-run validation: python3 scripts/validate_installation.py --kanban-path docs/project-management/kanban
  5. Re-run installation once validation passes

Prevention:

  • Always use installer (don't manually copy epics)
  • Use canonical templates (not ai-dev-kit's actual Kanban)
  • Follow Epic numbering (Epic 1-23 canonical, Epic 24+ project-specific)

Failure Mode 2: Migration Failures

Symptoms:

  • Migration script fails with error
  • Migration report shows errors
  • Structure not migrated correctly

Error Messages:

❌ Migration failed: [error details]

Root Causes:

  • Analysis report not found or invalid
  • Backup creation failed
  • File permission issues
  • Disk space issues
  • Invalid migration mode

Impact: HIGH - Migration incomplete, structure may be corrupted

Recovery Steps:

  1. Check backup: Verify backup was created before migration
  2. Restore from backup:
    # Locate backup directory (created before migration)
    ls -la docs/project-management/_backup-*

    # Restore from backup
    rm -rf docs/project-management/kanban
    cp -r docs/project-management/_backup-YYYYMMDD-HHMMSS docs/project-management/kanban
  3. Review error logs: Check migration report for specific errors
  4. Fix root cause: Address file permissions, disk space, or analysis report issues
  5. Re-run migration: Use --dry-run first to preview changes

Prevention:

  • Always run --dry-run before actual migration
  • Ensure sufficient disk space
  • Verify file permissions before migration
  • Review analysis report before proceeding

Failure Mode 3: Backup Creation Failures

Symptoms:

  • Backup creation fails or is cancelled
  • Migration proceeds without backup
  • No backup directory created

Error Messages:

❌ Error: Backup creation failed or was cancelled.

Root Causes:

  • Insufficient disk space
  • File permission issues
  • Backup directory already exists (user cancelled)
  • Path issues

Impact: CRITICAL - No rollback available if migration fails

Recovery Steps:

  1. Manual backup before proceeding:
    # Create manual backup
    cp -r docs/project-management/kanban docs/project-management/kanban-backup-manual
  2. Fix backup issues:
    • Free up disk space
    • Fix file permissions
    • Remove existing backup directory if needed
  3. Re-run installation: Backup will be created automatically

Prevention:

  • Ensure sufficient disk space before installation
  • Verify write permissions on kanban directory
  • Review backup directory before installation

Failure Mode 4: Configuration Failures

Symptoms:

  • Configuration files not created correctly
  • Path issues with kanban directory
  • Version file integration fails

Error Messages:

⚠️  Warning: Version file not found at expected path

Root Causes:

  • Project structure differs from expected
  • Version file path not configured
  • Kanban path misconfigured

Impact: MEDIUM - Installation may succeed but validation fails

Recovery Steps:

  1. Verify kanban path:
    # Check kanban directory exists
    ls -la docs/project-management/kanban
  2. Configure version file path:
    • Update validator configuration if needed
    • Verify version file exists at expected path
  3. Re-run validation: Verify configuration is correct

Prevention:

  • Use standard project structure
  • Configure version file path before installation
  • Review configuration requirements in T04

Failure Mode 5: Partial Installation

Symptoms:

  • Installation starts but doesn't complete
  • Some files created, others missing
  • Inconsistent structure

Error Messages:

⚠️  Installation incomplete - some files may be missing

Root Causes:

  • Installation interrupted
  • Script errors during execution
  • Disk space issues during installation

Impact: HIGH - Incomplete installation, structure may be inconsistent

Recovery Steps:

  1. Assess damage: Check what was installed vs. what's missing
  2. Restore from backup: If backup exists, restore and restart
  3. Manual cleanup: Remove partially installed files
  4. Re-run installation: Start fresh installation

Prevention:

  • Don't interrupt installation process
  • Ensure sufficient disk space
  • Run installation in stable environment

Rollback Procedures

Procedure 1: Restore from Automatic Backup

When to Use: Migration created automatic backup before proceeding

Steps:

  1. Locate backup directory:

    # Find backup directory (created before migration)
    ls -la docs/project-management/_backup-*
  2. Verify backup contents:

    # Check backup contains kanban structure
    ls -la docs/project-management/_backup-YYYYMMDD-HHMMSS/
  3. Restore from backup:

    # Remove current (problematic) installation
    rm -rf docs/project-management/kanban

    # Restore from backup
    cp -r docs/project-management/_backup-YYYYMMDD-HHMMSS docs/project-management/kanban
  4. Verify restoration:

    # Run validation to verify restoration
    python3 scripts/validate_installation.py --kanban-path docs/project-management/kanban

Success Criteria:

  • Kanban structure restored
  • Validation passes
  • No errors reported

Procedure 2: Restore from Git

When to Use: Changes committed to Git before installation

Steps:

  1. Check Git status:

    git status
  2. Restore from Git:

    # Restore kanban directory from Git
    git checkout HEAD -- docs/project-management/kanban/
  3. Verify restoration:

    # Run validation
    python3 scripts/validate_installation.py --kanban-path docs/project-management/kanban

Success Criteria:

  • Kanban structure restored from Git
  • Validation passes
  • No errors reported

Procedure 3: Manual Cleanup and Reinstall

When to Use: No backup available, installation failed

Steps:

  1. Remove failed installation:

    # Remove kanban directory
    rm -rf docs/project-management/kanban
  2. Clean up any partial files:

    # Remove any partial installation artifacts
    rm -f detection_report.json analysis_report.json migration_report.json
  3. Re-run installation:

    # Start fresh installation
    python3 scripts/install_kanban_framework.py --mode fresh

Success Criteria:

  • Clean installation
  • Validation passes
  • No errors reported

Troubleshooting Guide

Issue: Installation Blocked by Validation Errors

Symptoms:

  • Validation errors prevent installation
  • Epic mashup detected
  • Canonical conflicts found

Troubleshooting Steps:

  1. Review validation errors:

    python3 scripts/validate_installation.py --kanban-path docs/project-management/kanban
  2. Fix Epic numbering:

    • Rename project epics to Epic 24+ range
    • Ensure Epic 1-23 are canonical only
  3. Resolve conflicts:

    • Review conflict details in validation output
    • Fix epic conflicts before proceeding
  4. Re-run validation:

    python3 scripts/validate_installation.py --kanban-path docs/project-management/kanban
  5. Proceed with installation: Once validation passes


Issue: Migration Fails Mid-Process

Symptoms:

  • Migration starts but fails partway through
  • Partial migration completed
  • Error messages during migration

Troubleshooting Steps:

  1. Check backup: Verify backup was created
  2. Review migration report: Check migration_report.json for errors
  3. Restore from backup: Use Procedure 1 above
  4. Fix root cause: Address specific error from migration report
  5. Re-run migration: Use --dry-run first to preview

Issue: Backup Creation Fails

Symptoms:

  • Backup creation fails or is cancelled
  • No backup directory created
  • Migration proceeds without backup

Troubleshooting Steps:

  1. Check disk space:

    df -h docs/project-management/
  2. Check permissions:

    ls -la docs/project-management/
  3. Create manual backup:

    cp -r docs/project-management/kanban docs/project-management/kanban-backup-manual
  4. Fix issues: Free space or fix permissions

  5. Re-run installation: Backup will be created automatically


Recovery Best Practices

Before Installation

  1. Create manual backup:

    cp -r docs/project-management/kanban docs/project-management/kanban-backup-pre-install
  2. Commit current state to Git:

    git add docs/project-management/kanban/
    git commit -m "Backup before Kanban installation"
  3. Verify disk space:

    df -h docs/project-management/
  4. Run dry-run first:

    python3 scripts/install_kanban_framework.py --mode migration --dry-run

During Installation

  1. Monitor installation progress: Watch for errors
  2. Don't interrupt: Let installation complete
  3. Note backup location: Record backup directory path
  4. Save error messages: Copy any error messages for troubleshooting

After Installation

  1. Run validation:

    python3 scripts/validate_installation.py --kanban-path docs/project-management/kanban
  2. Verify structure:

    • Check epic documents exist
    • Verify story documents (if created)
    • Confirm kanban board exists
  3. Test functionality:

    • Create test epic/story
    • Verify templates work
    • Test integration with versioning (if applicable)

Summary

Failure Modes Documented

Failure ModeSeverityRecovery AvailablePrevention
Validation ErrorsHIGHManual fixUse installer, follow numbering
Migration FailuresHIGHBackup restoreDry-run first, check prerequisites
Backup FailuresCRITICALManual backupCheck disk space, permissions
Configuration FailuresMEDIUMManual fixConfigure paths before install
Partial InstallationHIGHCleanup & reinstallDon't interrupt, ensure space

Rollback Procedures

  1. Automatic Backup Restore: Use backup created by installer
  2. Git Restore: Restore from Git if changes committed
  3. Manual Cleanup: Remove failed installation and reinstall

Recovery Best Practices

  • Always create manual backup before installation
  • Commit current state to Git before installation
  • Run dry-run before actual installation
  • Monitor installation progress
  • Run validation after installation

  • E4:S17:T01: Enumerate supported installation paths (completed)
  • E4:S17:T02: Validate fresh install steps (completed)
  • E4:S17:T03: Validate migration/update paths (completed)
  • E4:S17:T04: Verify post-install configuration and validation steps (completed)
  • E4:S17:T06: Capture documentation gaps and improvements (next task)

Notes

  • Failure modes identified through code review and error analysis
  • Rollback procedures tested through documentation review
  • Recovery best practices based on installer behavior and error handling
  • RC readiness confirmed with comprehensive failure mode documentation