Files
Sankofa/docs/runbooks/PROXMOX_DISASTER_RECOVERY.md
defiQUG 9daf1fd378 Apply Composer changes: comprehensive API updates, migrations, middleware, and infrastructure improvements
- Add comprehensive database migrations (001-024) for schema evolution
- Enhance API schema with expanded type definitions and resolvers
- Add new middleware: audit logging, rate limiting, MFA enforcement, security, tenant auth
- Implement new services: AI optimization, billing, blockchain, compliance, marketplace
- Add adapter layer for cloud integrations (Cloudflare, Kubernetes, Proxmox, storage)
- Update Crossplane provider with enhanced VM management capabilities
- Add comprehensive test suite for API endpoints and services
- Update frontend components with improved GraphQL subscriptions and real-time updates
- Enhance security configurations and headers (CSP, CORS, etc.)
- Update documentation and configuration files
- Add new CI/CD workflows and validation scripts
- Implement design system improvements and UI enhancements
2025-12-12 18:01:35 -08:00

245 lines
5.3 KiB
Markdown

# Proxmox Disaster Recovery Procedures
## Overview
This document outlines disaster recovery procedures for Proxmox infrastructure managed by the Crossplane provider.
## Recovery Scenarios
### Scenario 1: Provider Pod Failure
#### Symptoms
- Provider pod not running
- VM operations failing
- ProviderConfig not working
#### Recovery Steps
1. **Check Pod Status**:
```bash
kubectl get pods -n crossplane-system -l app=crossplane-provider-proxmox
```
2. **Restart Provider**:
```bash
kubectl delete pod -n crossplane-system -l app=crossplane-provider-proxmox
```
3. **Verify Recovery**:
```bash
kubectl get pods -n crossplane-system -l app=crossplane-provider-proxmox
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=50
```
### Scenario 2: Proxmox Node Failure
#### Symptoms
- Cannot connect to Proxmox
- VMs unreachable
- Provider connection errors
#### Recovery Steps
1. **Verify Node Status**:
- Check Proxmox Web UI
- Verify node is online
- Check network connectivity
2. **Check ProviderConfig**:
```bash
kubectl get providerconfig proxmox-provider-config -o yaml
```
3. **Update Endpoint if Needed**:
- If node IP changed, update ProviderConfig
- If using hostname, verify DNS
4. **Test Connectivity**:
```bash
curl -k https://your-proxmox:8006/api2/json/version
```
### Scenario 3: Credential Compromise
#### Symptoms
- Authentication failures
- Security alerts
- Unauthorized access
#### Recovery Steps
1. **Revoke Compromised Credentials**:
- Log into Proxmox Web UI
- Revoke API tokens
- Change passwords
2. **Create New Credentials**:
- Create new API tokens
- Use strong passwords
- Set appropriate permissions
3. **Update Kubernetes Secret**:
```bash
kubectl delete secret proxmox-credentials -n crossplane-system
kubectl create secret generic proxmox-credentials \
--from-literal=credentials.json='{"username":"root@pam","token":"new-token"}' \
-n crossplane-system
```
4. **Restart Provider**:
```bash
kubectl delete pod -n crossplane-system -l app=crossplane-provider-proxmox
```
### Scenario 4: VM Data Loss
#### Symptoms
- VM not found
- Data missing
- Storage errors
#### Recovery Steps
1. **Check VM Status**:
```bash
kubectl get proxmoxvm <vm-name>
kubectl describe proxmoxvm <vm-name>
```
2. **Check Proxmox Backups**:
- Log into Proxmox Web UI
- Check backup storage
- Review backup schedule
3. **Restore from Backup**:
- Use Proxmox backup restore
- Or recreate VM from template
4. **Recreate VM Resource**:
```bash
# Delete existing resource
kubectl delete proxmoxvm <vm-name>
# Recreate with same configuration
kubectl apply -f <vm-manifest>.yaml
```
### Scenario 5: Complete Provider Failure
#### Symptoms
- Provider not responding
- All VM operations failing
- ProviderConfig errors
#### Recovery Steps
1. **Check Provider Deployment**:
```bash
kubectl get deployment -n crossplane-system crossplane-provider-proxmox
kubectl describe deployment -n crossplane-system crossplane-provider-proxmox
```
2. **Redeploy Provider**:
```bash
kubectl delete deployment -n crossplane-system crossplane-provider-proxmox
kubectl apply -f crossplane-provider-proxmox/config/provider.yaml
```
3. **Verify ProviderConfig**:
```bash
kubectl get providerconfig
kubectl describe providerconfig proxmox-provider-config
```
4. **Test VM Operations**:
```bash
kubectl get proxmoxvm
kubectl describe proxmoxvm <test-vm>
```
## Backup Procedures
### Provider Configuration Backup
```bash
# Backup ProviderConfig
kubectl get providerconfig proxmox-provider-config -o yaml > providerconfig-backup.yaml
# Backup credentials secret (be careful with this!)
kubectl get secret proxmox-credentials -n crossplane-system -o yaml > credentials-backup.yaml
```
### VM Configuration Backup
```bash
# Backup all VM resources
kubectl get proxmoxvm -o yaml > all-vms-backup.yaml
# Backup specific VM
kubectl get proxmoxvm <vm-name> -o yaml > <vm-name>-backup.yaml
```
### Proxmox Backup
1. **Configure Backup Schedule**:
- Log into Proxmox Web UI
- Go to Datacenter → Backup
- Configure backup schedule
2. **Manual Backup**:
- Select VM in Proxmox Web UI
- Click Backup
- Choose backup storage
- Start backup
## Recovery Testing
### Test Provider Recovery
1. **Simulate Failure**:
```bash
kubectl delete pod -n crossplane-system -l app=crossplane-provider-proxmox
```
2. **Verify Auto-Recovery**:
```bash
kubectl get pods -n crossplane-system -l app=crossplane-provider-proxmox
```
3. **Test VM Operations**:
```bash
kubectl get proxmoxvm
```
### Test VM Recovery
1. **Create Test VM**:
```bash
kubectl apply -f test-vm.yaml
```
2. **Delete VM**:
```bash
kubectl delete proxmoxvm test-vm
```
3. **Recreate VM**:
```bash
kubectl apply -f test-vm.yaml
```
## Prevention
1. **Regular Backups**: Schedule regular backups
2. **Monitoring**: Set up alerts for failures
3. **Documentation**: Keep procedures documented
4. **Testing**: Regularly test recovery procedures
5. **Redundancy**: Use multiple Proxmox nodes
## Related Documentation
- [VM Provisioning Runbook](./PROXMOX_VM_PROVISIONING.md)
- [Troubleshooting Guide](./PROXMOX_TROUBLESHOOTING.md)
- [Deployment Guide](../proxmox/DEPLOYMENT_GUIDE.md)