Apply Composer changes: comprehensive API updates, migrations, middleware, and infrastructure improvements
- Add comprehensive database migrations (001-024) for schema evolution
- Enhance API schema with expanded type definitions and resolvers
- Add new middleware: audit logging, rate limiting, MFA enforcement, security, tenant auth
- Implement new services: AI optimization, billing, blockchain, compliance, marketplace
- Add adapter layer for cloud integrations (Cloudflare, Kubernetes, Proxmox, storage)
- Update Crossplane provider with enhanced VM management capabilities
- Add comprehensive test suite for API endpoints and services
- Update frontend components with improved GraphQL subscriptions and real-time updates
- Enhance security configurations and headers (CSP, CORS, etc.)
- Update documentation and configuration files
- Add new CI/CD workflows and validation scripts
- Implement design system improvements and UI enhancements
229
infrastructure/proxmox/README.md
Normal file
@@ -0,0 +1,229 @@
# Proxmox VE Management

Comprehensive management tools and integrations for Proxmox VE virtualization infrastructure.

## Overview

This directory contains management components for Proxmox VE clusters deployed across Sankofa Phoenix edge sites. It complements the existing Crossplane provider (`crossplane-provider-proxmox/`) with additional tooling for operations, monitoring, and automation.

## Components

### API Client (`api/`)

Proxmox API client utilities and helpers for:

- Cluster operations
- Storage management
- Network configuration
- Backup operations
- Node management

### Terraform (`terraform/`)

Terraform modules for:

- Proxmox cluster provisioning
- Storage pool configuration
- Network bridge setup
- Resource pool management

### Ansible (`ansible/`)

Ansible roles and playbooks for:

- Cluster deployment
- Node configuration
- Storage setup
- Network configuration
- Monitoring agent installation

### Scripts (`scripts/`)

Management scripts for:

- Cluster health checks
- Backup automation
- Disaster recovery
- Performance tuning
- Maintenance operations
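
The cluster health check added in this commit (`scripts/cluster-health.sh`) reads two optional environment variables, `NODE` and `SITE`, logs to stderr, and exits non-zero when a node is unhealthy. A minimal sketch of a wrapper that turns that exit status into a one-word result for cron jobs or monitoring probes follows. The `run_health_check` name and the `HEALTH_SCRIPT` override are illustrative assumptions, not part of the repository.

```bash
#!/bin/bash
# Hypothetical wrapper around scripts/cluster-health.sh.
# HEALTH_SCRIPT is an assumed override for the script location.
run_health_check() {
    local node="${1:-}"
    local script="${HEALTH_SCRIPT:-./scripts/cluster-health.sh}"

    # The script writes its log to stderr; only the exit status is used here.
    if NODE="${node}" "${script}"; then
        echo "healthy"
    else
        echo "unhealthy"
        return 1
    fi
}

# Usage (on a Proxmox node):
#   run_health_check          # whole cluster
#   run_health_check pve1     # single node
```

Passing an empty `NODE` is equivalent to leaving it unset, so the same function covers both the single-node and the whole-cluster case.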

## Integration with Crossplane Provider

The Proxmox management components work alongside the Crossplane provider:

- **Crossplane Provider**: Declarative VM management via Kubernetes
- **Management Tools**: Operational tasks, monitoring, and automation
- **API Client**: Direct Proxmox API access for advanced operations

## Usage

### Cluster Setup

```bash
# Setup a new Proxmox cluster
./scripts/setup-cluster.sh \
  --site us-east-1 \
  --nodes pve1,pve2,pve3 \
  --storage local-lvm \
  --network vmbr0
```

### Storage Management

```bash
# Add storage pool
./scripts/add-storage.sh \
  --pool ceph-storage \
  --type ceph \
  --nodes pve1,pve2,pve3
```

### Network Configuration

```bash
# Configure network bridge
./scripts/configure-network.sh \
  --bridge vmbr1 \
  --vlan 100 \
  --nodes pve1,pve2,pve3
```

### Ansible Deployment

```bash
# Deploy Proxmox configuration
cd ansible
ansible-playbook -i inventory.yml site-deployment.yml \
  -e site=us-east-1 \
  -e nodes="pve1,pve2,pve3"
```

### Terraform

```bash
# Provision Proxmox infrastructure
cd terraform
terraform init
terraform plan -var="site=us-east-1"
terraform apply
```

## Configuration

### Site Configuration

Each Proxmox site requires configuration:

```yaml
site: us-east-1
nodes:
  - name: pve1
    ip: 10.1.0.10
    role: master
  - name: pve2
    ip: 10.1.0.11
    role: worker
  - name: pve3
    ip: 10.1.0.12
    role: worker
storage:
  pools:
    - name: local-lvm
      type: lvm
    - name: ceph-storage
      type: ceph
networks:
  bridges:
    - name: vmbr0
      type: bridge
      vlan: untagged
    - name: vmbr1
      type: bridge
      vlan: 100
```
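
Because the node list drives most of the tooling, it can be useful to pull it out of the site configuration from a shell script. The following is a rough sketch that assumes the configuration is stored as a file in exactly the layout shown above (a top-level `nodes:` list whose entries carry `name` and `ip` keys). `list_site_nodes` is a hypothetical helper, and a real YAML parser would be more robust than this `awk` filter.

```bash
#!/bin/bash
# Hypothetical helper: reads a site configuration on stdin and prints
# one "name ip" pair per node. Assumes the layout shown in this README.
list_site_nodes() {
    awk '
        # A top-level key starts in column 0; track whether it is "nodes:".
        /^[A-Za-z_]/ { in_nodes = ($1 == "nodes:") }
        in_nodes && $1 == "-" && $2 == "name:" { name = $3 }
        in_nodes && $1 == "ip:" && name != "" { print name, $2; name = "" }
    '
}

# Usage (file name is an assumption):
#   list_site_nodes < site.yml
```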

### API Authentication

The tools authenticate to the Proxmox API with an API token:

```bash
# Export the API endpoint and token (create the token in Proxmox first).
# Single quotes keep an interactive shell from history-expanding the "!".
export PROXMOX_API_URL=https://pve1.sankofa.nexus:8006
export PROXMOX_API_TOKEN='root@pam!token-name=abc123def456'
```
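
Proxmox VE expects API tokens in an `Authorization` header of the form `PVEAPIToken=USER@REALM!TOKENID=SECRET`, which matches the value of `PROXMOX_API_TOKEN` above. A minimal sketch of a request helper built on `curl` is shown below. `pve_auth_header` and `pve_get` are illustrative names rather than part of the `api/` client, and `/api2/json/version` is used only as a harmless read-only call.

```bash
#!/bin/bash
# Hypothetical helpers for calling the Proxmox API with a token.

# Build the Authorization header from PROXMOX_API_TOKEN.
pve_auth_header() {
    printf 'Authorization: PVEAPIToken=%s' \
        "${PROXMOX_API_TOKEN:?PROXMOX_API_TOKEN is not set}"
}

# Issue a GET request against a path below /api2/json/.
pve_get() {
    local path="$1"
    curl --silent --show-error --fail \
        --header "$(pve_auth_header)" \
        "${PROXMOX_API_URL:?PROXMOX_API_URL is not set}/api2/json/${path#/}"
}

# Usage (requires network access to the cluster):
#   pve_get version
```

If a node still presents its default self-signed certificate, `curl` will reject the connection. Passing the cluster CA with `--cacert` is generally preferable to disabling verification.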

## Monitoring

Proxmox monitoring integrates with the Prometheus stack:

- **pve_exporter**: Prometheus metrics exporter
- **Grafana Dashboards**: Pre-built dashboards for Proxmox
- **Alerts**: Alert rules for cluster health

See [Monitoring](../monitoring/README.md) for details.

## Backup and Recovery

### Automated Backups

```bash
# Configure backup schedule
./scripts/configure-backups.sh \
  --schedule "0 2 * * *" \
  --retention 30 \
  --storage backup-storage
```
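
The schedule string is standard cron syntax: `0 2 * * *` runs once a day at 02:00. As an illustration of what such a schedule could drive, the sketch below prints a cron entry that calls Proxmox's `vzdump` tool with a retention policy. This is an assumption about how `configure-backups.sh` might behave, not its actual implementation. `backup_cron_line` is a hypothetical helper, and `--retention 30` is interpreted here as "keep the last 30 backups" (it could equally mean 30 days).

```bash
#!/bin/bash
# Hypothetical helper: print an /etc/cron.d style line (schedule, user, command)
# that backs up all guests and prunes old backups.
backup_cron_line() {
    local schedule="$1" retention="$2" storage="$3"
    printf '%s root vzdump --all 1 --mode snapshot --storage %s --prune-backups keep-last=%s --quiet 1\n' \
        "${schedule}" "${storage}" "${retention}"
}

# Usage:
#   backup_cron_line "0 2 * * *" 30 backup-storage
```

Printing the line instead of installing it keeps the sketch side-effect free; the output can be reviewed before it is written to a cron file.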

### Disaster Recovery

```bash
# Restore from backup
./scripts/restore-backup.sh \
  --backup backup-20240101 \
  --target pve1
```

## Multi-Site Management

For managing multiple Proxmox sites:

```bash
# List all sites
./scripts/list-sites.sh

# Get site status
./scripts/site-status.sh --site us-east-1

# Sync configuration across sites
./scripts/sync-config.sh --sites us-east-1,eu-west-1
```

## Security

- API tokens with least privilege
- TLS/SSL for all API communications
- Network isolation via VLANs
- Regular security updates
- Audit logging
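
For the least-privilege point, Proxmox can issue a token with privilege separation and grant it only a read-only role. The sketch below prints the `pveum` commands for review rather than running them. The user `monitoring@pve`, the token ID `health`, and the choice of the built-in `PVEAuditor` role are illustrative assumptions.

```bash
#!/bin/bash
# Hypothetical helper: print (do not run) the pveum commands that would
# create a dedicated user and a privilege-separated, read-only API token.
print_token_setup() {
    local user="$1" token_id="$2" role="${3:-PVEAuditor}"
    cat <<EOF
pveum user add ${user}
pveum user token add ${user} ${token_id} --privsep 1
pveum acl modify / --tokens '${user}!${token_id}' --roles ${role}
EOF
}

# Usage:
#   print_token_setup monitoring@pve health
```

A token created this way can read cluster state for health checks and monitoring but cannot modify guests or storage.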

## Troubleshooting

### Common Issues

**Cluster split-brain:**

```bash
./scripts/fix-split-brain.sh --site us-east-1
```

**Storage issues:**

```bash
./scripts/diagnose-storage.sh --pool local-lvm
```

**Network connectivity:**

```bash
./scripts/test-network.sh --node pve1
```
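
Before attempting a split-brain repair, it is worth confirming whether the node currently has quorum. On a Proxmox node, `pvecm status` reports this in a `Quorate:` line. The sketch below reads that output on stdin. `is_quorate` is a hypothetical helper and is not one of the scripts listed above.

```bash
#!/bin/bash
# Hypothetical helper: succeeds when the input contains a line such as
# "Quorate:          Yes", fails when it says "No" or the line is missing.
is_quorate() {
    awk '
        $1 == "Quorate:" { found = 1; ok = ($2 == "Yes") }
        END { exit !(found && ok) }
    '
}

# Usage (on a cluster node):
#   if pvecm status | is_quorate; then echo "quorum OK"; else echo "no quorum"; fi
```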

## Related Documentation

- [Crossplane Provider](../../crossplane-provider-proxmox/README.md)
- [System Architecture](../../docs/system_architecture.md)
- [Deployment Scripts](../../scripts/README.md)

135
infrastructure/proxmox/scripts/cluster-health.sh
Executable file
@@ -0,0 +1,135 @@
#!/bin/bash
set -euo pipefail

# Proxmox Cluster Health Check Script
#
# Usage (run on a Proxmox node, where pvesh is available):
#   ./cluster-health.sh                  # check the whole cluster
#   NODE=pve1 ./cluster-health.sh        # check a single node
#   SITE=us-east-1 ./cluster-health.sh   # SITE only labels the log output

SITE="${SITE:-}"
NODE="${NODE:-}"

log() {
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*" >&2
}

error() {
    log "ERROR: $*"
    exit 1
}

check_node() {
    local node=$1
    log "Checking node: ${node}..."

    # Check node status
    STATUS=$(pvesh get "/nodes/${node}/status" --output-format json 2>/dev/null || echo "{}")

    if [ -z "${STATUS}" ] || [ "${STATUS}" = "{}" ]; then
        log "  ❌ Node ${node} is unreachable"
        return 1
    fi

    # Parse status. "|| true" keeps a missing key from aborting the script
    # under "set -e" with pipefail.
    UPTIME=$(echo "${STATUS}" | grep -o '"uptime":[0-9]*' | cut -d':' -f2 || true)
    CPU=$(echo "${STATUS}" | grep -o '"cpu":[0-9.]*' | cut -d':' -f2 || true)
    # Memory figures are nested: "memory":{"free":...,"total":...,"used":...}
    MEMORY=$(echo "${STATUS}" | grep -o '"memory":{[^}]*}' || true)
    MEMORY_TOTAL=$(echo "${MEMORY}" | grep -o '"total":[0-9]*' | cut -d':' -f2 || true)
    MEMORY_USED=$(echo "${MEMORY}" | grep -o '"used":[0-9]*' | cut -d':' -f2 || true)

    if [ -n "${UPTIME}" ]; then
        log "  ✅ Node ${node} is online"
        log "     Uptime: ${UPTIME} seconds"
        # The API reports CPU load as a fraction (0-1); convert to percent.
        CPU_PERCENT=$(awk -v c="${CPU:-0}" 'BEGIN { printf "%.1f", c * 100 }')
        log "     CPU: ${CPU_PERCENT}%"
        if [ -n "${MEMORY_TOTAL}" ] && [ -n "${MEMORY_USED}" ] && [ "${MEMORY_TOTAL}" -gt 0 ]; then
            MEMORY_PERCENT=$((MEMORY_USED * 100 / MEMORY_TOTAL))
            log "     Memory: ${MEMORY_PERCENT}% used (${MEMORY_USED}/${MEMORY_TOTAL} bytes)"
        fi
        return 0
    else
        log "  ❌ Node ${node} status unknown"
        return 1
    fi
}

check_cluster() {
    log "Checking cluster status..."

    # Get cluster nodes
    NODES=$(pvesh get /nodes --output-format json 2>/dev/null | grep -o '"node":"[^"]*' | cut -d'"' -f4 || echo "")

    if [ -z "${NODES}" ]; then
        error "Cannot retrieve cluster nodes"
    fi

    log "Found nodes: ${NODES}"

    local all_healthy=true
    for node in ${NODES}; do
        if ! check_node "${node}"; then
            all_healthy=false
        fi
    done

    if [ "${all_healthy}" = "true" ]; then
        log "✅ All nodes are healthy"
        return 0
    else
        log "❌ Some nodes are unhealthy"
        return 1
    fi
}

check_storage() {
    log "Checking storage pools..."

    STORAGE=$(pvesh get /storage --output-format json 2>/dev/null || echo "[]")

    if [ -z "${STORAGE}" ] || [ "${STORAGE}" = "[]" ]; then
        log "  ⚠️  No storage pools found"
        return 0
    fi

    # Parse storage (simplified)
    log "  Storage pools configured"
    return 0
}

check_vms() {
    log "Checking virtual machines..."

    # Get all guests. /nodes does not list VM IDs; /cluster/resources does
    # (type "vm" covers both QEMU VMs and LXC containers).
    VMS=$(pvesh get /cluster/resources --type vm --output-format json 2>/dev/null | grep -o '"vmid":[0-9]*' | cut -d':' -f2 | sort -u || echo "")

    if [ -z "${VMS}" ]; then
        log "  No VMs found"
        return 0
    fi

    VM_COUNT=$(echo "${VMS}" | wc -l)
    log "  Found ${VM_COUNT} virtual machines"

    return 0
}

main() {
    log "Starting Proxmox cluster health check..."

    if ! command -v pvesh &> /dev/null; then
        error "pvesh not found. This script must be run on a Proxmox node."
    fi

    if [ -n "${NODE}" ]; then
        check_node "${NODE}"
    else
        if [ -n "${SITE}" ]; then
            log "Checking site: ${SITE}"
        fi
        check_cluster
        check_storage
        check_vms
    fi

    log "Health check completed!"
}

main "$@"