E4:S17:T05 – Document Failure Modes and Rollback Guidance
Task ID: E4:S17:T05
Status: ✅ COMPLETE
Priority: HIGH
Epic: E4 – Kanban Framework
Story: E4:S17 – Kanban Package Installation Evaluation
Version Anchor: ✅ COMPLETE (v0.4.17.5+1)
Scope
Document failure modes and rollback guidance for the Kanban framework package installation so that users can recover from installation failures and roll back problematic installations. This documentation covers common failure scenarios, error messages, troubleshooting steps, and recovery procedures.
Scope includes:
- Identify common failure modes during installation
- Document error messages and their meanings
- Create troubleshooting guides for each failure mode
- Document rollback procedures
- Provide recovery paths and best practices
Inputs
- Installer behavior and error handling:
  - packages/frameworks/kanban/scripts/install_kanban_framework.py
  - packages/frameworks/kanban/scripts/migrate_structure.py
  - packages/frameworks/kanban/scripts/validate_installation.py
- Backup and recovery mechanisms
- Error messages and validation outputs
- Installation validation from T02-T04 (baseline for comparison)
Deliverables
- Failure mode list documenting:
  - Common failure scenarios
  - Error messages and symptoms
  - Root causes
  - Impact assessment
- Rollback guidance documenting:
  - Rollback procedures for each failure mode
  - Backup restoration steps
  - Recovery verification procedures
  - Best practices for preventing failures
Approach
- Review installer error handling for failure modes
- Identify Common Failure Modes
  - Validation failures
  - Migration failures
  - Backup failures
  - Configuration failures
- Document Error Messages
  - Map error messages to failure modes
  - Document error meanings and implications
- Create Rollback Procedures
  - Document backup restoration
  - Document manual recovery steps
  - Document verification procedures
- Create Troubleshooting Guides
  - Step-by-step troubleshooting for each failure mode
  - Recovery paths and best practices
Acceptance Criteria
- Common failure modes identified and documented ✅
- Error messages documented with meanings ✅
- Rollback procedures created for each failure mode ✅
- Troubleshooting guides created ✅
- Recovery paths documented ✅
- Best practices documented ✅
Failure Modes
Failure Mode 1: Validation Errors
Symptoms:
- Installation blocked by validation errors
- Error messages about Epic mashup or conflicts
- Validation script reports errors
Error Messages:
❌ ERRORS (must be fixed before installation):
❌ CRITICAL: Epic 9 contains 'Book Related Work' but canonical Epic 9 is
'User Management and Authentication'. This is the root cause of Epic mashup.
Root Causes:
- Epic mashup (project-specific content in canonical range)
- Epic numbering conflicts
- Canonical conflicts detected
- Version file path issues
Impact: HIGH - Installation blocked until errors resolved
Recovery Steps:
- Review validation errors
- Fix Epic numbering issues (rename project epics to Epic 24+)
- Resolve canonical conflicts
- Re-run validation:
python3 scripts/validate_installation.py --kanban-path docs/project-management/kanban
- Re-run installation once validation passes
Prevention:
- Always use installer (don't manually copy epics)
- Use canonical templates (not ai-dev-kit's actual Kanban)
- Follow Epic numbering (Epic 1-23 canonical, Epic 24+ project-specific)
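The numbering rule above can be checked mechanically before running the installer. The following is a minimal sketch, not the logic of validate_installation.py: the function name, the dictionary inputs, and the "Reporting" epic are hypothetical, and only the canonical title for Epic 9 is taken from the error message quoted above.

```python
CANONICAL_MAX = 23  # Epic 1-23 are canonical; Epic 24+ is project-specific


def find_numbering_conflicts(epics, canonical_titles):
    """Return (number, title) pairs in the canonical range whose title
    differs from the canonical title for that number.

    epics: dict of epic number -> title found in the project
    canonical_titles: dict of epic number -> canonical title (assumed input)
    Numbers without a known canonical title are not flagged.
    """
    conflicts = []
    for number, title in sorted(epics.items()):
        if number > CANONICAL_MAX:
            continue  # project-specific range, always allowed
        expected = canonical_titles.get(number)
        if expected is not None and title != expected:
            conflicts.append((number, title))
    return conflicts


project = {9: "Book Related Work", 24: "Reporting"}
canonical = {9: "User Management and Authentication"}
print(find_numbering_conflicts(project, canonical))
# prints [(9, 'Book Related Work')]
```

Any pair returned here corresponds to an epic that should be renamed into the Epic 24+ range before validation is re-run.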
Failure Mode 2: Migration Failures
Symptoms:
- Migration script fails with error
- Migration report shows errors
- Structure not migrated correctly
Error Messages:
❌ Migration failed: [error details]
Root Causes:
- Analysis report not found or invalid
- Backup creation failed
- File permission issues
- Disk space issues
- Invalid migration mode
Impact: HIGH - Migration incomplete, structure may be corrupted
Recovery Steps:
- Check backup: Verify backup was created before migration
- Restore from backup:
# Locate backup directory (created before migration)
ls -la docs/project-management/_backup-*
# Restore from backup
rm -rf docs/project-management/kanban
cp -r docs/project-management/_backup-YYYYMMDD-HHMMSS docs/project-management/kanban
- Review error logs: Check migration report for specific errors
- Fix root cause: Address file permissions, disk space, or analysis report issues
- Re-run migration: Use --dry-run first to preview changes
Prevention:
- Always run --dry-run before actual migration
- Ensure sufficient disk space
- Verify file permissions before migration
- Review analysis report before proceeding
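The disk space and permission checks listed above can be scripted as a preflight step. This is a rough sketch under stated assumptions: the function names and the `headroom` factor are hypothetical, and the space estimate simply assumes that a backup needs about as much room as the kanban directory itself. It is not part of the installer.

```python
import os
import shutil
from pathlib import Path


def directory_size(path):
    """Total size in bytes of all regular files under path."""
    return sum(p.stat().st_size for p in Path(path).rglob("*") if p.is_file())


def preflight(kanban_dir, headroom=2.0):
    """Return a list of problems likely to make a migration fail.

    headroom is an assumed safety factor: free space should be at least
    headroom times the current size of the kanban directory.
    """
    kanban = Path(kanban_dir)
    if not kanban.is_dir():
        return [f"{kanban} does not exist or is not a directory"]
    problems = []
    if not os.access(kanban, os.W_OK):
        problems.append(f"{kanban} is not writable")
    if not os.access(kanban.parent, os.W_OK):
        problems.append(f"{kanban.parent} is not writable (backup cannot be created)")
    needed = directory_size(kanban) * headroom
    free = shutil.disk_usage(kanban.parent).free
    if free < needed:
        problems.append(f"only {free} bytes free, about {int(needed)} needed")
    return problems
```

Calling `preflight("docs/project-management/kanban")` and getting an empty list suggests the prerequisites are met; it does not replace the --dry-run.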
Failure Mode 3: Backup Creation Failures
Symptoms:
- Backup creation fails or is cancelled
- Migration proceeds without backup
- No backup directory created
Error Messages:
❌ Error: Backup creation failed or was cancelled.
Root Causes:
- Insufficient disk space
- File permission issues
- Backup directory already exists (user cancelled)
- Path issues
Impact: CRITICAL - No rollback available if migration fails
Recovery Steps:
- Manual backup before proceeding:
# Create manual backup
cp -r docs/project-management/kanban docs/project-management/kanban-backup-manual
- Fix backup issues:
  - Free up disk space
  - Fix file permissions
  - Remove existing backup directory if needed
- Re-run installation: Backup will be created automatically
Prevention:
- Ensure sufficient disk space before installation
- Verify write permissions on kanban directory
- Review backup directory before installation
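A manual backup can also be taken from Python with a timestamp, so that repeated backups never collide. A minimal sketch: the function name and the `kanban-backup-manual-<timestamp>` naming are assumptions that extend the manual backup name used above, not behavior of the installer.

```python
import shutil
from datetime import datetime
from pathlib import Path


def create_manual_backup(kanban_dir, now=None):
    """Copy kanban_dir to a sibling kanban-backup-manual-YYYYMMDD-HHMMSS directory.

    Returns the backup path. An existing backup is never overwritten.
    """
    kanban = Path(kanban_dir)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    backup = kanban.parent / f"kanban-backup-manual-{stamp}"
    # copytree raises FileExistsError if the target already exists,
    # so an earlier backup is not silently replaced.
    shutil.copytree(kanban, backup)
    return backup
```

Because `shutil.copytree` raises an exception on failure, a caller can stop before the installation starts if the backup could not be written.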
Failure Mode 4: Configuration Failures
Symptoms:
- Configuration files not created correctly
- Path issues with kanban directory
- Version file integration fails
Error Messages:
⚠️ Warning: Version file not found at expected path
Root Causes:
- Project structure differs from expected
- Version file path not configured
- Kanban path misconfigured
Impact: MEDIUM - Installation may succeed but validation fails
Recovery Steps:
- Verify kanban path:
# Check kanban directory exists
ls -la docs/project-management/kanban
- Configure version file path:
  - Update validator configuration if needed
  - Verify version file exists at expected path
- Re-run validation: Verify configuration is correct
Prevention:
- Use standard project structure
- Configure version file path before installation
- Review configuration requirements in T04
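Both path checks above can be combined into one small helper that runs before validation. This is only an illustrative sketch: the function name is hypothetical, and the version file location is passed in by the caller because this document does not state where the validator expects it.

```python
from pathlib import Path


def check_configuration(kanban_path, version_file):
    """Return warning strings for missing paths; an empty list means both exist."""
    warnings = []
    if not Path(kanban_path).is_dir():
        warnings.append(f"Kanban directory not found: {kanban_path}")
    if not Path(version_file).is_file():
        warnings.append(f"Version file not found at expected path: {version_file}")
    return warnings
```

An empty result only confirms that the paths exist; the validator remains the authority on whether the configuration is correct.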
Failure Mode 5: Partial Installation
Symptoms:
- Installation starts but doesn't complete
- Some files created, others missing
- Inconsistent structure
Error Messages:
⚠️ Installation incomplete - some files may be missing
Root Causes:
- Installation interrupted
- Script errors during execution
- Disk space issues during installation
Impact: HIGH - Incomplete installation, structure may be inconsistent
Recovery Steps:
- Assess damage: Check what was installed vs. what's missing
- Restore from backup: If backup exists, restore and restart
- Manual cleanup: Remove partially installed files
- Re-run installation: Start fresh installation
Prevention:
- Don't interrupt installation process
- Ensure sufficient disk space
- Run installation in stable environment
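The "assess damage" step can be sketched as a comparison between an expected file list and what is actually on disk. The function name and the manifest are hypothetical; this document does not define the list of files the installer creates, so the caller has to supply one.

```python
from pathlib import Path


def assess_installation(kanban_dir, expected_paths):
    """Split expected relative paths into (present, missing) lists."""
    root = Path(kanban_dir)
    present, missing = [], []
    for rel in expected_paths:
        if (root / rel).exists():
            present.append(rel)
        else:
            missing.append(rel)
    return present, missing
```

A non-empty `missing` list indicates a partial installation, in which case restoring from backup or cleaning up and reinstalling is safer than patching files by hand.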
Rollback Procedures
Procedure 1: Restore from Automatic Backup
When to Use: Migration created automatic backup before proceeding
Steps:
- Locate backup directory:
# Find backup directory (created before migration)
ls -la docs/project-management/_backup-*
- Verify backup contents:
# Check backup contains kanban structure
ls -la docs/project-management/_backup-YYYYMMDD-HHMMSS/
- Restore from backup:
# Remove current (problematic) installation
rm -rf docs/project-management/kanban
# Restore from backup
cp -r docs/project-management/_backup-YYYYMMDD-HHMMSS docs/project-management/kanban
- Verify restoration:
# Run validation to verify restoration
python3 scripts/validate_installation.py --kanban-path docs/project-management/kanban
Success Criteria:
- Kanban structure restored
- Validation passes
- No errors reported
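Procedure 1 can be sketched in Python as follows. This is an illustration under assumptions, not an installer feature: the function names and the `kanban-failed` directory are hypothetical, and the sketch assumes the backup directory holds the kanban structure directly, as the cp command above implies. Unlike `rm -rf`, it moves the failed installation aside so that nothing is lost if the restore itself goes wrong.

```python
import shutil
from pathlib import Path


def latest_backup(project_mgmt_dir):
    """Return the newest _backup-YYYYMMDD-HHMMSS directory, or None."""
    candidates = sorted(
        (p for p in Path(project_mgmt_dir).glob("_backup-*") if p.is_dir()),
        key=lambda p: p.name,  # this timestamp format sorts chronologically
    )
    return candidates[-1] if candidates else None


def restore_backup(project_mgmt_dir):
    """Replace kanban/ with the newest backup, keeping the failed copy aside."""
    root = Path(project_mgmt_dir)
    backup = latest_backup(root)
    if backup is None:
        raise FileNotFoundError(f"no _backup-* directory under {root}")
    kanban = root / "kanban"
    failed = root / "kanban-failed"  # hypothetical holding directory
    if kanban.exists():
        if failed.exists():
            shutil.rmtree(failed)
        kanban.rename(failed)
    shutil.copytree(backup, kanban)
    return backup
```

After `restore_backup("docs/project-management")` returns, validation should still be run as in the last step above, and `kanban-failed` can be deleted once the restored structure passes.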
Procedure 2: Restore from Git
When to Use: Changes committed to Git before installation
Steps:
- Check Git status:
git status
- Restore from Git:
# Restore kanban directory from Git
git checkout HEAD -- docs/project-management/kanban/
- Verify restoration:
# Run validation
python3 scripts/validate_installation.py --kanban-path docs/project-management/kanban
Success Criteria:
- Kanban structure restored from Git
- Validation passes
- No errors reported
Procedure 3: Manual Cleanup and Reinstall
When to Use: No backup available, installation failed
Steps:
- Remove failed installation:
# Remove kanban directory
rm -rf docs/project-management/kanban
- Clean up any partial files:
# Remove any partial installation artifacts
rm -f detection_report.json analysis_report.json migration_report.json
- Re-run installation:
# Start fresh installation
python3 scripts/install_kanban_framework.py --mode fresh
Success Criteria:
- Clean installation
- Validation passes
- No errors reported
Troubleshooting Guide
Issue: Installation Blocked by Validation Errors
Symptoms:
- Validation errors prevent installation
- Epic mashup detected
- Canonical conflicts found
Troubleshooting Steps:
- Review validation errors:
python3 scripts/validate_installation.py --kanban-path docs/project-management/kanban
- Fix Epic numbering:
  - Rename project epics to Epic 24+ range
  - Ensure Epic 1-23 are canonical only
- Resolve conflicts:
  - Review conflict details in validation output
  - Fix epic conflicts before proceeding
- Re-run validation:
python3 scripts/validate_installation.py --kanban-path docs/project-management/kanban
- Proceed with installation: Once validation passes
Issue: Migration Fails Mid-Process
Symptoms:
- Migration starts but fails partway through
- Partial migration completed
- Error messages during migration
Troubleshooting Steps:
- Check backup: Verify backup was created
- Review migration report: Check migration_report.json for errors
- Restore from backup: Use Procedure 1 above
- Fix root cause: Address specific error from migration report
- Re-run migration: Use --dry-run first to preview
Issue: Backup Creation Fails
Symptoms:
- Backup creation fails or is cancelled
- No backup directory created
- Migration proceeds without backup
Troubleshooting Steps:
- Check disk space:
df -h docs/project-management/
- Check permissions:
ls -la docs/project-management/
- Create manual backup:
cp -r docs/project-management/kanban docs/project-management/kanban-backup-manual
- Fix issues: Free space or fix permissions
- Re-run installation: Backup will be created automatically
Recovery Best Practices
Before Installation
- Create manual backup:
cp -r docs/project-management/kanban docs/project-management/kanban-backup-pre-install
- Commit current state to Git:
git add docs/project-management/kanban/
git commit -m "Backup before Kanban installation"
- Verify disk space:
df -h docs/project-management/
- Run dry-run first:
python3 scripts/install_kanban_framework.py --mode migration --dry-run
During Installation
- Monitor installation progress: Watch for errors
- Don't interrupt: Let installation complete
- Note backup location: Record backup directory path
- Save error messages: Copy any error messages for troubleshooting
After Installation
- Run validation:
python3 scripts/validate_installation.py --kanban-path docs/project-management/kanban
- Verify structure:
  - Check epic documents exist
  - Verify story documents (if created)
  - Confirm kanban board exists
- Test functionality:
  - Create test epic/story
  - Verify templates work
  - Test integration with versioning (if applicable)
Summary
Failure Modes Documented
| Failure Mode | Severity | Recovery Available | Prevention |
|---|---|---|---|
| Validation Errors | HIGH | Manual fix | Use installer, follow numbering |
| Migration Failures | HIGH | Backup restore | Dry-run first, check prerequisites |
| Backup Failures | CRITICAL | Manual backup | Check disk space, permissions |
| Configuration Failures | MEDIUM | Manual fix | Configure paths before install |
| Partial Installation | HIGH | Cleanup & reinstall | Don't interrupt, ensure space |
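The table can double as a lookup when triaging captured installer output. The sketch below matches the message fragments quoted in this document against a block of output and returns the corresponding failure modes. The function name and data layout are hypothetical, and the real installer's wording may differ from the fragments recorded here.

```python
# Message fragments as quoted in the Failure Modes section, in table order.
FAILURE_SIGNATURES = [
    ("ERRORS (must be fixed before installation)", "Validation Errors"),
    ("Migration failed:", "Migration Failures"),
    ("Backup creation failed or was cancelled", "Backup Failures"),
    ("Version file not found at expected path", "Configuration Failures"),
    ("Installation incomplete", "Partial Installation"),
]

# Recovery column of the summary table.
RECOVERY = {
    "Validation Errors": "Manual fix",
    "Migration Failures": "Backup restore",
    "Backup Failures": "Manual backup",
    "Configuration Failures": "Manual fix",
    "Partial Installation": "Cleanup & reinstall",
}


def classify(output):
    """Return the failure modes whose message fragment appears in output."""
    return [mode for needle, mode in FAILURE_SIGNATURES if needle in output]


def suggest_recovery(output):
    """Map installer output to (failure mode, recovery) pairs."""
    return [(mode, RECOVERY[mode]) for mode in classify(output)]
```

Output that matches no fragment returns an empty list, which is a signal to read the full error text instead of relying on the table.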
Rollback Procedures
- Automatic Backup Restore: Use backup created by installer
- Git Restore: Restore from Git if changes committed
- Manual Cleanup: Remove failed installation and reinstall
Recovery Best Practices
- Always create manual backup before installation
- Commit current state to Git before installation
- Run dry-run before actual installation
- Monitor installation progress
- Run validation after installation
Related Work
- E4:S17:T01: Enumerate supported installation paths (completed)
- E4:S17:T02: Validate fresh install steps (completed)
- E4:S17:T03: Validate migration/update paths (completed)
- E4:S17:T04: Verify post-install configuration and validation steps (completed)
- E4:S17:T06: Capture documentation gaps and improvements (next task)
Notes
- Failure modes identified through code review and error analysis
- Rollback procedures tested through documentation review
- Recovery best practices based on installer behavior and error handling
- RC readiness confirmed with comprehensive failure mode documentation