- Add comprehensive database migrations (001-024) for schema evolution - Enhance API schema with expanded type definitions and resolvers - Add new middleware: audit logging, rate limiting, MFA enforcement, security, tenant auth - Implement new services: AI optimization, billing, blockchain, compliance, marketplace - Add adapter layer for cloud integrations (Cloudflare, Kubernetes, Proxmox, storage) - Update Crossplane provider with enhanced VM management capabilities - Add comprehensive test suite for API endpoints and services - Update frontend components with improved GraphQL subscriptions and real-time updates - Enhance security configurations and headers (CSP, CORS, etc.) - Update documentation and configuration files - Add new CI/CD workflows and validation scripts - Implement design system improvements and UI enhancements
245 lines
5.3 KiB
Markdown
245 lines
5.3 KiB
Markdown
# Proxmox Disaster Recovery Procedures
|
|
|
|
## Overview
|
|
|
|
This document outlines disaster recovery procedures for Proxmox infrastructure managed by the Crossplane provider.
|
|
|
|
## Recovery Scenarios
|
|
|
|
### Scenario 1: Provider Pod Failure
|
|
|
|
#### Symptoms
|
|
- Provider pod not running
|
|
- VM operations failing
|
|
- ProviderConfig not working
|
|
|
|
#### Recovery Steps
|
|
|
|
1. **Check Pod Status**:
|
|
```bash
|
|
kubectl get pods -n crossplane-system -l app=crossplane-provider-proxmox
|
|
```
|
|
|
|
2. **Restart Provider**:
|
|
```bash
|
|
kubectl delete pod -n crossplane-system -l app=crossplane-provider-proxmox
|
|
```
|
|
|
|
3. **Verify Recovery**:
|
|
```bash
|
|
kubectl get pods -n crossplane-system -l app=crossplane-provider-proxmox
|
|
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=50
|
|
```
|
|
|
|
### Scenario 2: Proxmox Node Failure
|
|
|
|
#### Symptoms
|
|
- Cannot connect to Proxmox
|
|
- VMs unreachable
|
|
- Provider connection errors
|
|
|
|
#### Recovery Steps
|
|
|
|
1. **Verify Node Status**:
|
|
- Check Proxmox Web UI
|
|
- Verify node is online
|
|
- Check network connectivity
|
|
|
|
2. **Check ProviderConfig**:
|
|
```bash
|
|
kubectl get providerconfig proxmox-provider-config -o yaml
|
|
```
|
|
|
|
3. **Update Endpoint if Needed**:
|
|
- If node IP changed, update ProviderConfig
|
|
- If using hostname, verify DNS
|
|
|
|
4. **Test Connectivity**:
|
|
```bash
|
|
curl -k https://your-proxmox:8006/api2/json/version
|
|
```
|
|
|
|
### Scenario 3: Credential Compromise
|
|
|
|
#### Symptoms
|
|
- Authentication failures
|
|
- Security alerts
|
|
- Unauthorized access
|
|
|
|
#### Recovery Steps
|
|
|
|
1. **Revoke Compromised Credentials**:
|
|
- Log into Proxmox Web UI
|
|
- Revoke API tokens
|
|
- Change passwords
|
|
|
|
2. **Create New Credentials**:
|
|
- Create new API tokens
|
|
- Use strong passwords
|
|
- Set appropriate permissions
|
|
|
|
3. **Update Kubernetes Secret**:
|
|
```bash
|
|
kubectl delete secret proxmox-credentials -n crossplane-system
|
|
kubectl create secret generic proxmox-credentials \
|
|
--from-literal=credentials.json='{"username":"root@pam","token":"new-token"}' \
|
|
-n crossplane-system
|
|
```
|
|
|
|
4. **Restart Provider**:
|
|
```bash
|
|
kubectl delete pod -n crossplane-system -l app=crossplane-provider-proxmox
|
|
```
|
|
|
|
### Scenario 4: VM Data Loss
|
|
|
|
#### Symptoms
|
|
- VM not found
|
|
- Data missing
|
|
- Storage errors
|
|
|
|
#### Recovery Steps
|
|
|
|
1. **Check VM Status**:
|
|
```bash
|
|
kubectl get proxmoxvm <vm-name>
|
|
kubectl describe proxmoxvm <vm-name>
|
|
```
|
|
|
|
2. **Check Proxmox Backups**:
|
|
- Log into Proxmox Web UI
|
|
- Check backup storage
|
|
- Review backup schedule
|
|
|
|
3. **Restore from Backup**:
|
|
- Use Proxmox backup restore
|
|
- Or recreate VM from template
|
|
|
|
4. **Recreate VM Resource**:
|
|
```bash
|
|
# Delete existing resource
|
|
kubectl delete proxmoxvm <vm-name>
|
|
|
|
# Recreate with same configuration
|
|
kubectl apply -f <vm-manifest>.yaml
|
|
```
|
|
|
|
### Scenario 5: Complete Provider Failure
|
|
|
|
#### Symptoms
|
|
- Provider not responding
|
|
- All VM operations failing
|
|
- ProviderConfig errors
|
|
|
|
#### Recovery Steps
|
|
|
|
1. **Check Provider Deployment**:
|
|
```bash
|
|
kubectl get deployment -n crossplane-system crossplane-provider-proxmox
|
|
kubectl describe deployment -n crossplane-system crossplane-provider-proxmox
|
|
```
|
|
|
|
2. **Redeploy Provider**:
|
|
```bash
|
|
kubectl delete deployment -n crossplane-system crossplane-provider-proxmox
|
|
kubectl apply -f crossplane-provider-proxmox/config/provider.yaml
|
|
```
|
|
|
|
3. **Verify ProviderConfig**:
|
|
```bash
|
|
kubectl get providerconfig
|
|
kubectl describe providerconfig proxmox-provider-config
|
|
```
|
|
|
|
4. **Test VM Operations**:
|
|
```bash
|
|
kubectl get proxmoxvm
|
|
kubectl describe proxmoxvm <test-vm>
|
|
```
|
|
|
|
## Backup Procedures
|
|
|
|
### Provider Configuration Backup
|
|
|
|
```bash
|
|
# Backup ProviderConfig
|
|
kubectl get providerconfig proxmox-provider-config -o yaml > providerconfig-backup.yaml
|
|
|
|
# Backup credentials secret (be careful with this!)
|
|
kubectl get secret proxmox-credentials -n crossplane-system -o yaml > credentials-backup.yaml
|
|
```
|
|
|
|
### VM Configuration Backup
|
|
|
|
```bash
|
|
# Backup all VM resources
|
|
kubectl get proxmoxvm -o yaml > all-vms-backup.yaml
|
|
|
|
# Backup specific VM
|
|
kubectl get proxmoxvm <vm-name> -o yaml > <vm-name>-backup.yaml
|
|
```
|
|
|
|
### Proxmox Backup
|
|
|
|
1. **Configure Backup Schedule**:
|
|
- Log into Proxmox Web UI
|
|
- Go to Datacenter → Backup
|
|
- Configure backup schedule
|
|
|
|
2. **Manual Backup**:
|
|
- Select VM in Proxmox Web UI
|
|
- Click Backup
|
|
- Choose backup storage
|
|
- Start backup
|
|
|
|
## Recovery Testing
|
|
|
|
### Test Provider Recovery
|
|
|
|
1. **Simulate Failure**:
|
|
```bash
|
|
kubectl delete pod -n crossplane-system -l app=crossplane-provider-proxmox
|
|
```
|
|
|
|
2. **Verify Auto-Recovery**:
|
|
```bash
|
|
kubectl get pods -n crossplane-system -l app=crossplane-provider-proxmox
|
|
```
|
|
|
|
3. **Test VM Operations**:
|
|
```bash
|
|
kubectl get proxmoxvm
|
|
```
|
|
|
|
### Test VM Recovery
|
|
|
|
1. **Create Test VM**:
|
|
```bash
|
|
kubectl apply -f test-vm.yaml
|
|
```
|
|
|
|
2. **Delete VM**:
|
|
```bash
|
|
kubectl delete proxmoxvm test-vm
|
|
```
|
|
|
|
3. **Recreate VM**:
|
|
```bash
|
|
kubectl apply -f test-vm.yaml
|
|
```
|
|
|
|
## Prevention
|
|
|
|
1. **Regular Backups**: Schedule regular backups
|
|
2. **Monitoring**: Set up alerts for failures
|
|
3. **Documentation**: Keep procedures documented
|
|
4. **Testing**: Regularly test recovery procedures
|
|
5. **Redundancy**: Use multiple Proxmox nodes
|
|
|
|
## Related Documentation
|
|
|
|
- [VM Provisioning Runbook](./PROXMOX_VM_PROVISIONING.md)
|
|
- [Troubleshooting Guide](./PROXMOX_TROUBLESHOOTING.md)
|
|
- [Deployment Guide](../proxmox/DEPLOYMENT_GUIDE.md)
|
|
|