Initial commit: loc_az_hci (smom-dbis-138 excluded via .gitignore)
Some checks failed
Test / test (push) Has been cancelled
Co-authored-by: Cursor <cursoragent@cursor.com>
211
docs/operations/guest-agent-setup.md
Normal file
@@ -0,0 +1,211 @@
# QEMU Guest Agent Setup Guide

## Overview

QEMU Guest Agent provides better integration between Proxmox and VMs, enabling:
- **Proper VM shutdown/reboot** from Proxmox Web UI
- **Automatic IP address detection** in Proxmox
- **Better VM status reporting** (CPU, memory, disk usage)
- **File system information** and operations
- **Time synchronization** between host and guest

## Prerequisites

- VMs must have Ubuntu installed and be reachable via SSH
- SSH key access configured
- VMs must be running

## Quick Setup

### Automated Setup (Recommended)

```bash
# Set SSH key (if different from default)
# Use $HOME here: a "~" inside double quotes is not expanded by the shell
export SSH_KEY="$HOME/.ssh/id_rsa"
export SSH_USER="ubuntu"

# Run setup script
./scripts/setup-guest-agent.sh
```

This script will:
1. Install `qemu-guest-agent` on each VM
2. Enable and start the service
3. Enable agent in Proxmox VM configuration
4. Verify agent is working

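
The four steps the script performs can be sketched roughly as follows. This is not the contents of `scripts/setup-guest-agent.sh`, which this guide does not show. It is a minimal sketch that assumes it runs on the Proxmox host, that `qm set <vmid> --agent enabled=1` and `qm agent <vmid> ping` are available there, and that the `vmid:ip` pairs in `VMS` (hypothetical values) match your VMs. It defaults to a dry run that only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch only: NOT the real scripts/setup-guest-agent.sh.
set -e

SSH_USER="${SSH_USER:-ubuntu}"
SSH_KEY="${SSH_KEY:-$HOME/.ssh/id_rsa}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

setup_guest_agent() {
  vmid="$1"
  ip="$2"
  # Steps 1-2: install the package and start the service inside the guest
  run ssh -i "$SSH_KEY" "$SSH_USER@$ip" \
    "sudo apt-get update && sudo apt-get install -y qemu-guest-agent && sudo systemctl enable --now qemu-guest-agent"
  # Step 3: enable the agent option in the VM configuration (on the Proxmox host)
  run qm set "$vmid" --agent enabled=1
  # Step 4: ask the agent to answer
  run qm agent "$vmid" ping
}

# Hypothetical vmid:ip pairs; replace with your own VMs.
VMS="100:192.168.1.60 101:192.168.1.61"
for entry in $VMS; do
  setup_guest_agent "${entry%%:*}" "${entry#*:}"
done
```

Run it once as-is to review the printed commands, then with `DRY_RUN=0` to execute them. The agent option in the VM configuration is only picked up after a full stop and start of the VM, so the final ping may fail until the VM has been restarted.
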
## Manual Setup

### Step 1: Install Guest Agent on VM

SSH to each VM and run:

```bash
sudo apt-get update
sudo apt-get install -y qemu-guest-agent
sudo systemctl enable qemu-guest-agent
sudo systemctl start qemu-guest-agent
sudo systemctl status qemu-guest-agent
```

### Step 2: Enable Agent in Proxmox

For each VM in Proxmox Web UI:

1. **Stop the VM** (if running)
2. **Go to:** VM → **Options** tab
3. **Find:** "QEMU Guest Agent"
4. **Click:** "Edit"
5. **Enable:** Check "Use QEMU Guest Agent"
6. **Click:** "OK"
7. **Start the VM**

### Step 3: Verify Agent is Working

In Proxmox Web UI:

1. **Go to:** VM → **Summary** tab
2. **Look for:** the "IPs" field
3. **Check:** It should list the VM's IP addresses instead of "Guest Agent not running"

Or via command line:

```bash
# Check agent status via Proxmox API
curl -k -s -H "Cookie: PVEAuthCookie=<ticket>" \
  "https://192.168.1.206:8006/api2/json/nodes/pve/qemu/100/agent/get-fsinfo"
```

## Troubleshooting

### Agent Not Responding

**Symptoms:**
- Proxmox shows "Guest Agent not running"
- Cannot get VM IP address
- Cannot shutdown VM from Proxmox

**Solution:**

1. **Check agent is installed:**
   ```bash
   ssh ubuntu@<VM_IP>
   sudo systemctl status qemu-guest-agent
   ```

2. **Restart agent:**
   ```bash
   sudo systemctl restart qemu-guest-agent
   ```

3. **Check logs:**
   ```bash
   sudo journalctl -u qemu-guest-agent -f
   ```

4. **Reinstall agent:**
   ```bash
   sudo apt-get install --reinstall qemu-guest-agent
   sudo systemctl restart qemu-guest-agent
   ```

5. **Use fix script:**
   ```bash
   ./scripts/fix-guest-agent.sh
   ```

### Agent Not Enabled in Proxmox

**Symptoms:**
- Agent installed on VM but not working
- Proxmox doesn't detect agent

**Solution:**

1. **Stop VM**
2. **Enable agent in Proxmox:**
   - Options → QEMU Guest Agent → Enable
3. **Start VM**
4. **Wait 1-2 minutes** for agent to initialize

### Agent Takes Time to Initialize

**Note:** After enabling the agent, it may take 1-2 minutes to fully initialize and start responding to Proxmox queries. This is normal.

**Check status:**
```bash
# On VM
sudo systemctl status qemu-guest-agent

# Should show: Active: active (running)
```

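
Because the agent can take a minute or two to answer, a script that depends on it may want to poll rather than check once. The following is a minimal sketch, assuming it runs on the Proxmox host where `qm agent <vmid> ping` is available. The function names `ping_agent` and `wait_for_agent` and the default timings are illustrative choices, not part of any existing script in this repository.

```bash
# Small wrapper so the probe can be swapped out (for example in tests).
ping_agent() {
  qm agent "$1" ping
}

# wait_for_agent <vmid> [timeout_seconds] [interval_seconds]
# Polls until the agent answers or the timeout has elapsed.
# The interval must be at least 1 second.
wait_for_agent() {
  wfa_vmid="$1"
  wfa_timeout="${2:-120}"   # default covers the 1-2 minute startup window
  wfa_interval="${3:-5}"
  wfa_waited=0
  while [ "$wfa_waited" -le "$wfa_timeout" ]; do
    if ping_agent "$wfa_vmid" >/dev/null 2>&1; then
      echo "agent on VM $wfa_vmid responded after ${wfa_waited}s"
      return 0
    fi
    sleep "$wfa_interval"
    wfa_waited=$((wfa_waited + wfa_interval))
  done
  echo "agent on VM $wfa_vmid did not respond within ${wfa_timeout}s" >&2
  return 1
}
```

For example, `wait_for_agent 100 && echo "ready"` waits up to two minutes for VM 100 before continuing.
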
## Verification

### Check Agent Status on VM

```bash
ssh ubuntu@<VM_IP>
sudo systemctl status qemu-guest-agent
```

**Expected output:**
```
● qemu-guest-agent.service - QEMU Guest Agent
   Loaded: loaded (/lib/systemd/system/qemu-guest-agent.service; enabled)
   Active: active (running) since ...
```

### Check Agent in Proxmox

**Web UI:**
- VM → Summary → IPs
- Should show the VM's IP addresses reported by the agent

**API:**
```bash
# Get filesystem info (requires authentication)
curl -k -s -H "Cookie: PVEAuthCookie=<ticket>" \
  "https://192.168.1.206:8006/api2/json/nodes/pve/qemu/100/agent/get-fsinfo"
```

## Benefits After Setup

Once guest agent is working:

1. **VM Shutdown/Reboot:**
   - Can properly shutdown/reboot VMs from Proxmox
   - No need to force stop

2. **IP Address Detection:**
   - Proxmox automatically detects VM IP addresses
   - Shows in VM summary

3. **Resource Monitoring:**
   - Better CPU, memory, disk usage reporting
   - More accurate VM statistics

4. **File Operations:**
   - Can execute commands in VM from Proxmox
   - File system information available

## Scripts Reference

- `scripts/setup-guest-agent.sh` - Install and configure guest agent
- `scripts/fix-guest-agent.sh` - Fix guest agent issues

## When to Run

Run guest agent setup **after**:
- ✅ Ubuntu installation is complete on all VMs
- ✅ VMs are reachable via SSH
- ✅ Install scripts have been applied (optional; guest agent setup can also run before them)

## Summary

1. **Install agent:** `./scripts/setup-guest-agent.sh`
2. **Verify:** Check Proxmox Web UI → VM → Summary (IPs field)
3. **Fix if needed:** `./scripts/fix-guest-agent.sh`

Guest agent setup should be done after all VMs are installed and configured, as it requires SSH access to the VMs.

121
docs/operations/proxmox-ubuntu-images.md
Normal file
@@ -0,0 +1,121 @@
# Ubuntu Images for Proxmox VE

## Standard Ubuntu ISO (What You're Using Now)

✅ **The Ubuntu ISO from Ubuntu's website is correct!**

- **Source**: https://ubuntu.com/download/server
- **Format**: `.iso` file
- **Use Case**: Manual installation, full control over installation process
- **Current Status**: ✅ Working - your VMs are booting from it

**There is NO Proxmox-specific Ubuntu ISO.** Proxmox VE uses standard operating system ISOs from their official sources.

## Cloud-Init Templates (Faster Alternative)

For faster, automated deployments, Proxmox supports **Cloud-Init templates** (pre-configured qcow2 images).

### What Are Cloud-Init Templates?

- **Pre-installed** Ubuntu images with Cloud-Init support
- **Ready to clone** - no installation needed
- **Automated configuration** via Cloud-Init (IP, SSH keys, user data)
- **Faster deployment** - clone and configure, no OS installation

### Where to Get Cloud-Init Templates

#### Option 1: Download Official Ubuntu Cloud Images

Ubuntu provides official Cloud-Init images:

```bash
# Ubuntu 24.04 LTS Cloud Image
wget https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-amd64.img

# Ubuntu 22.04 LTS Cloud Image
wget https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img
```

#### Option 2: Create Template from ISO

You can create a Cloud-Init template from the ISO you already have:

1. Install Ubuntu from ISO
2. Install Cloud-Init: `sudo apt install cloud-init`
3. Configure Cloud-Init
4. Convert VM to template in Proxmox

### How to Use Cloud-Init Templates

1. **Download/Upload Template**
   - Download Ubuntu Cloud Image
   - Upload to Proxmox storage
   - Convert to template

2. **Create VM from Template**
   - Clone template (instant, no installation)
   - Configure Cloud-Init settings:
     - IP address
     - SSH keys
     - User data scripts
   - Start VM - it's ready!

3. **Benefits**
   - ⚡ **Instant deployment** (no OS installation)
   - 🔧 **Automated configuration** via Cloud-Init
   - 📦 **Consistent base images**
   - 🚀 **Perfect for automation** (Terraform, scripts)

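
The "download, upload, convert to template" step can be sketched with the `qm` CLI on the Proxmox host. This is a minimal sketch under stated assumptions: the VM ID `9000`, the template name, and the storage name `local-lvm` are hypothetical, and the volume name `vm-<vmid>-disk-0` is what an import typically produces on LVM-thin storage (directory-based storage uses a different volume name). The function only prints the commands so they can be reviewed first.

```bash
# Sketch: print the commands that turn a downloaded cloud image into a template.
# Nothing is executed here; review the output before running it.
build_template_plan() {
  vmid="$1"      # hypothetical template ID, e.g. 9000
  image="$2"     # path to the downloaded .img file
  storage="$3"   # hypothetical storage name, e.g. local-lvm
  cat <<EOF
qm create $vmid --name ubuntu-cloud-template --memory 2048 --net0 virtio,bridge=vmbr0
qm importdisk $vmid $image $storage
qm set $vmid --scsihw virtio-scsi-pci --scsi0 $storage:vm-$vmid-disk-0
qm set $vmid --ide2 $storage:cloudinit
qm set $vmid --boot order=scsi0
qm set $vmid --agent enabled=1
qm template $vmid
EOF
}

build_template_plan 9000 ubuntu-24.04-server-cloudimg-amd64.img local-lvm
```

Once the printed plan looks right for your storage, it can be piped to `sh` on the Proxmox host. New VMs are then created with `qm clone 9000 <new-vmid> --name <vm-name>` and configured with `qm set <new-vmid> --ipconfig0 ...` before first boot.
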
## Comparison: ISO vs Cloud-Init Template

| Feature | ISO Image | Cloud-Init Template |
|---------|-----------|---------------------|
| **Installation** | Manual (15-30 min) | Instant clone |
| **Configuration** | Manual | Automated via Cloud-Init |
| **Flexibility** | Full control | Pre-configured |
| **Automation** | Limited | Excellent |
| **Use Case** | One-off VMs | Production, automation |

## Recommendation

### Use ISO (Current Method) When:
- ✅ Installing first time (learning)
- ✅ Need full control over installation
- ✅ Custom partitioning required
- ✅ One-off VMs

### Use Cloud-Init Template When:
- ✅ Deploying multiple VMs
- ✅ Automation (Terraform, scripts)
- ✅ Consistent base images
- ✅ Production deployments

## Your Current Setup

You're using the **correct approach** for initial setup:
- ✅ Standard Ubuntu ISO from Ubuntu website
- ✅ Manual installation gives you full control
- ✅ Once installed, you can convert to template for future use

## Next Steps (Optional)

If you want to create Cloud-Init templates for faster future deployments:

1. **After installing Ubuntu on your VMs:**
   - Install Cloud-Init: `sudo apt install cloud-init`
   - Configure as needed
   - Convert VM to template in Proxmox

2. **Or download official Cloud Image:**
   - Download Ubuntu Cloud Image
   - Upload to Proxmox
   - Convert to template
   - Use for future VMs

## Summary

- ✅ **Your Ubuntu ISO is correct** - no Proxmox-specific ISO exists
- ✅ **Standard Ubuntu Server ISO** from Ubuntu website is the right choice
- 💡 **Cloud-Init templates** are an optional optimization for automation
- 🎯 **Current method is fine** - continue with ISO installation

237
docs/operations/runbooks/azure-arc-troubleshooting.md
Normal file
@@ -0,0 +1,237 @@
# Azure Arc Troubleshooting Runbook

## Common Issues and Solutions

### Agent Connection Issues

#### Check Agent Status

```bash
# Check agent status
azcmagent show

# Check agent version
azcmagent version

# View agent logs (the agent's systemd unit is himdsd; azcmagent is the CLI)
journalctl -u himdsd -f
```

#### Agent Not Connecting

**Symptoms**: Agent shows as "Disconnected" in Azure Portal

**Solutions**:

1. Check network connectivity:
   ```bash
   # Test Azure connectivity
   curl -v https://management.azure.com
   ```

2. Verify credentials:
   ```bash
   # Reconnect with credentials
   azcmagent disconnect --force-local-only
   azcmagent connect \
     --resource-group HC-Stack \
     --tenant-id <tenant-id> \
     --location eastus \
     --subscription-id <subscription-id>
   ```

3. Check firewall rules:
   ```bash
   # Ensure outbound HTTPS (443) is allowed
   ufw status
   ```

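
When the agent stays disconnected, it can help to probe several endpoints in one pass instead of a single `curl`. The sketch below is illustrative only: `probe` and `check_endpoints` are hypothetical helper names, and the second URL in the usage example is an assumption about which endpoints your agent needs (the required list depends on region and features). Any HTTP response counts as reachable, since the aim is to test the outbound path through firewalls and proxies rather than authentication.

```bash
# probe <url>: succeeds if an HTTPS connection and response are obtained at all.
# curl exits 0 for any HTTP status unless --fail is given.
probe() {
  curl --silent --output /dev/null --max-time 10 "$1"
}

# check_endpoints <url>...: prints OK/FAIL per URL, returns 1 if any failed.
check_endpoints() {
  ce_failed=0
  for ce_url in "$@"; do
    if probe "$ce_url"; then
      echo "OK   $ce_url"
    else
      echo "FAIL $ce_url"
      ce_failed=1
    fi
  done
  return "$ce_failed"
}
```

Usage, for example: `check_endpoints https://management.azure.com https://login.microsoftonline.com`. The agent CLI also ships its own connectivity test, `azcmagent check --location eastus`, which is worth running if it is available in your agent version.
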
#### Agent Installation Issues

**Symptoms**: Agent installation fails

**Solutions**:

1. Check prerequisites:
   ```bash
   # Verify system requirements
   uname -m  # Should be x86_64 or aarch64 (Linux reports arm64 as aarch64)
   cat /etc/os-release
   ```

2. Manual installation:
   ```bash
   wget https://aka.ms/azcmagent -O install_linux_azcmagent.sh
   chmod +x install_linux_azcmagent.sh
   ./install_linux_azcmagent.sh
   ```

### Kubernetes Arc Issues

#### Cluster Not Appearing in Azure

**Symptoms**: Cluster not visible in Azure Portal

**Solutions**:

1. Verify cluster connection:
   ```bash
   # Requires the connectedk8s Azure CLI extension
   az connectedk8s show \
     --resource-group HC-Stack \
     --name proxmox-k3s-cluster
   ```

2. Check connectivity:
   ```bash
   kubectl cluster-info
   kubectl get nodes
   ```

3. Re-onboard cluster:
   ```bash
   az connectedk8s connect \
     --resource-group HC-Stack \
     --name proxmox-k3s-cluster \
     --location eastus
   ```

#### GitOps Not Syncing

**Symptoms**: Changes in Git not reflected in cluster

**Solutions**:

1. Check Flux status:
   ```bash
   kubectl get pods -n flux-system
   # Flux v2 runs one deployment per controller; there is no single "flux" pod
   kubectl logs -n flux-system deploy/kustomize-controller
   ```

2. Verify Git repository access:
   ```bash
   # Check GitOps source
   kubectl get gitrepository -n flux-system
   kubectl describe gitrepository -n flux-system
   ```

3. Check GitOps configuration in Azure:
   ```bash
   az k8s-extension show \
     --resource-group HC-Stack \
     --cluster-name proxmox-k3s-cluster \
     --cluster-type connectedClusters \
     --name flux
   ```

### Resource Bridge Issues

#### Resource Bridge Not Working

**Symptoms**: Cannot manage VMs from Azure Portal

**Solutions**:

1. Verify custom location:
   ```bash
   az customlocation show \
     --resource-group HC-Stack \
     --name proxmox-k3s-cluster-location
   ```

2. Check Resource Bridge pods:
   ```bash
   kubectl get pods -n arc-resource-bridge
   kubectl logs -n arc-resource-bridge -l app=resource-bridge
   ```

### Policy and Compliance Issues

#### Policies Not Applying

**Symptoms**: Azure Policy not enforcing on Arc resources

**Solutions**:

1. Check policy assignment:
   ```bash
   az policy assignment list \
     --scope /subscriptions/<subscription-id>/resourceGroups/HC-Stack
   ```

2. Verify agent compliance:
   ```bash
   az connectedmachine show \
     --resource-group HC-Stack \
     --name <machine-name> \
     --query "status"
   ```

### Monitoring Issues

#### Metrics Not Appearing

**Symptoms**: No metrics in Azure Monitor

**Solutions**:

1. Check agent extensions:
   ```bash
   az connectedmachine extension list \
     --resource-group HC-Stack \
     --machine-name <machine-name>
   ```

2. Verify Log Analytics workspace:
   ```bash
   az monitor log-analytics workspace show \
     --resource-group HC-Stack \
     --workspace-name <workspace-name>
   ```

### Common Commands

#### View All Arc Resources

```bash
# List all Arc-enabled servers
az connectedmachine list --resource-group HC-Stack -o table

# List all Arc-enabled Kubernetes clusters
az connectedk8s list --resource-group HC-Stack -o table
```

#### Check Agent Health

```bash
# Agent status
azcmagent show

# Agent logs (systemd unit himdsd)
journalctl -u himdsd --since "1 hour ago"
```

#### Reconnect Resources

```bash
# Reconnect server
azcmagent disconnect --force-local-only
azcmagent connect --resource-group HC-Stack --tenant-id <id> --location eastus --subscription-id <id>

# Reconnect Kubernetes (delete the Arc resource, then onboard again)
az connectedk8s delete --resource-group HC-Stack --name <cluster-name> --yes
az connectedk8s connect --resource-group HC-Stack --name <cluster-name> --location eastus
```

### Log Locations

- **Agent logs**: `/var/opt/azcmagent/log/`
- **System logs**: `journalctl -u himdsd`
- **Kubernetes logs**: `kubectl logs -n azure-arc <pod-name>`
- **GitOps logs**: `kubectl logs -n flux-system <pod-name>`

### Support Resources

- Azure Arc documentation: https://docs.microsoft.com/azure/azure-arc
- Troubleshooting guide: https://docs.microsoft.com/azure/azure-arc/servers/troubleshooting
- GitHub issues: https://github.com/microsoft/azure_arc/issues

321
docs/operations/runbooks/gitops-workflow.md
Normal file
@@ -0,0 +1,321 @@
# GitOps Workflow Runbook

## Overview

This runbook describes the GitOps workflow using Flux for managing Kubernetes deployments.

## GitOps Architecture

```
Git Repository (Gitea/GitLab)
         │
         │ (Poll/Sync)
         │
         ▼
Flux Controller (Kubernetes)
         │
         │ (Apply)
         │
         ▼
Kubernetes Cluster
         │
         │ (Deploy)
         │
         ▼
Application Pods
```

## Workflow

### 1. Making Changes

#### Update Application Configuration

1. Clone Git repository:
   ```bash
   git clone http://git.local:3000/user/gitops-repo.git
   cd gitops-repo
   ```

2. Edit Helm chart values:
   ```bash
   # Edit values.yaml
   vim gitops/apps/besu/values.yaml
   ```

3. Commit and push:
   ```bash
   git add gitops/apps/besu/values.yaml
   git commit -m "Update Besu configuration"
   git push origin main
   ```

#### Add New Application

1. Add Helm chart to repository:
   ```bash
   cp -r /path/to/new-chart gitops/apps/new-app/
   ```

2. Create Flux Kustomization:
   ```yaml
   # Create gitops/apps/new-app/kustomization.yaml
   apiVersion: kustomize.toolkit.fluxcd.io/v1
   kind: Kustomization
   metadata:
     name: new-app
     namespace: flux-system
   spec:
     interval: 10m
     path: ./apps/new-app
     prune: true
     sourceRef:
       kind: GitRepository
       name: flux-system
   ```

3. Commit and push:
   ```bash
   git add gitops/apps/new-app/
   git commit -m "Add new application"
   git push origin main
   ```

### 2. Monitoring Sync Status

#### Check Flux Status

```bash
# Check Flux pods
kubectl get pods -n flux-system

# Check Git repository status
kubectl get gitrepository -n flux-system
kubectl describe gitrepository flux-system -n flux-system

# Check Kustomization status
kubectl get kustomization -n flux-system
kubectl describe kustomization <app-name> -n flux-system
```

#### View Sync Events

```bash
# List Flux events, oldest first
kubectl get events -n flux-system --sort-by='.lastTimestamp'

# Follow Flux logs (Flux v2 has one deployment per controller)
kubectl logs -n flux-system deploy/kustomize-controller -f
```

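
For a quick health check across many applications, the `kubectl get kustomization` listing can be filtered down to the entries that are not ready. This is a minimal sketch: the helper name `not_ready_kustomizations` is hypothetical, and it assumes the default column order `NAME AGE READY STATUS` printed for Flux Kustomizations.

```bash
# Reads `kubectl get kustomization --no-headers` output on stdin and prints
# the names whose READY column (third field) is anything other than "True".
not_ready_kustomizations() {
  awk '$3 != "True" { print $1 }'
}
```

Usage: `kubectl get kustomization -n flux-system --no-headers | not_ready_kustomizations`. An empty result means every Kustomization in the namespace reports Ready.
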
### 3. Troubleshooting

#### Sync Not Happening

**Check Git repository access**:
```bash
kubectl get gitrepository flux-system -n flux-system -o yaml
kubectl describe gitrepository flux-system -n flux-system
```

**Check authentication**:
```bash
# For HTTPS with token
kubectl get secret -n flux-system

# For SSH
kubectl get secret flux-system -n flux-system -o yaml
```

#### Application Not Deploying

**Check Kustomization**:
```bash
kubectl get kustomization <app-name> -n flux-system
kubectl describe kustomization <app-name> -n flux-system
```

**Check Helm release**:
```bash
kubectl get helmrelease -n <namespace>
kubectl describe helmrelease <app-name> -n <namespace>
```

#### Manual Sync Trigger

```bash
# Trigger immediate sync
flux reconcile source git flux-system
flux reconcile kustomization <app-name>
```

### 4. Best Practices

#### Repository Structure

```
gitops-repo/
├── infrastructure/
│   ├── namespace.yaml
│   ├── ingress-controller.yaml
│   └── cert-manager.yaml
└── apps/
    ├── besu/
    │   ├── Chart.yaml
    │   ├── values.yaml
    │   └── templates/
    ├── firefly/
    └── ...
```

#### Branch Strategy

- **main**: Production deployments
- **staging**: Staging environment
- **develop**: Development environment

#### Change Management

1. Create feature branch
2. Make changes
3. Test in development
4. Merge to staging
5. Promote to production

### 5. Common Operations

#### Suspend Sync

```bash
# Suspend specific application
flux suspend kustomization <app-name>

# Resume
flux resume kustomization <app-name>
```

#### Rollback Changes

```bash
# Revert Git commit
git revert <commit-hash>
git push origin main

# Or manually edit and push
```

#### Update Helm Chart

```bash
# Update chart version in values.yaml
# Commit and push
git add gitops/apps/<app>/values.yaml
git commit -m "Update <app> to version X.Y.Z"
git push origin main
```

### 6. Azure Arc GitOps Integration

#### Configure GitOps in Azure Portal

1. Navigate to: Azure Arc → Kubernetes → Your cluster
2. Go to "GitOps" section
3. Add configuration:
   - Repository URL
   - Branch
   - Path
   - Authentication

#### View GitOps Status in Azure

```bash
az k8s-extension show \
  --resource-group HC-Stack \
  --cluster-name proxmox-k3s-cluster \
  --cluster-type connectedClusters \
  --name flux
```

### 7. Security

#### Secret Management

**Option 1: Kubernetes Secrets** (not recommended for production):
```bash
kubectl create secret generic app-secret \
  --from-literal=password=secret-value \
  -n <namespace>
```

**Option 2: Sealed Secrets**:
```bash
# Install Sealed Secrets controller
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.18.0/controller.yaml

# Create sealed secret (kubeseal emits JSON unless a format is given)
kubeseal --format yaml < secret.yaml > sealed-secret.yaml
```

**Option 3: External Secrets Operator**:
- Integrate with Azure Key Vault
- Use External Secrets Operator

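
For Option 2, the `secret.yaml` fed to `kubeseal` has to be an ordinary Secret manifest that is never committed. The sketch below generates one on the fly so it can be piped straight into `kubeseal`. The helper name `secret_manifest` and the example values are hypothetical. It assumes a `base64` command is available and that the key and name contain no characters needing YAML quoting.

```bash
# secret_manifest <name> <namespace> <key> <value>
# Prints an Opaque Secret manifest with the value base64-encoded, as the
# `data` field of a Secret requires.
secret_manifest() {
  sm_name="$1"
  sm_namespace="$2"
  sm_key="$3"
  # tr strips the line wrapping some base64 implementations add
  sm_encoded="$(printf '%s' "$4" | base64 | tr -d '\n')"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: $sm_name
  namespace: $sm_namespace
type: Opaque
data:
  $sm_key: $sm_encoded
EOF
}
```

Usage: `secret_manifest app-secret default password secret-value | kubeseal --format yaml > sealed-secret.yaml`. Only `sealed-secret.yaml` is then added to the Git repository; the plain manifest never touches disk.
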
#### RBAC

Configure Flux RBAC:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: flux-<namespace>
  namespace: <namespace>
rules:
- apiGroups: [""]
  resources: ["*"]
  verbs: ["*"]
```

### 8. Monitoring

#### Set Up Alerts

```bash
# Create alert for sync failures
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: flux-sync-alerts
spec:
  groups:
  - name: flux
    rules:
    - alert: FluxSyncFailed
      expr: flux_kustomization_status_condition{status="False"} == 1
      annotations:
        summary: "Flux sync failed"
EOF
```

### 9. Disaster Recovery

#### Backup Git Repository

```bash
# Clone repository
git clone --mirror http://git.local:3000/user/gitops-repo.git

# Backup to external location
tar -czf gitops-backup-$(date +%Y%m%d).tar.gz gitops-repo.git
```

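
If the backup above runs on a schedule, old archives accumulate. A retention step could look like the following minimal sketch. The helper name `prune_backups` and the idea of keeping a fixed number of archives are assumptions, not an existing policy of this repository. It relies on the `YYYYMMDD` date in the file name, so that the shell's sorted glob expansion lists the oldest archive first.

```bash
# prune_backups <dir> <keep>
# Deletes the oldest gitops-backup-*.tar.gz files in <dir> until only
# <keep> remain, printing each removed path.
prune_backups() {
  pb_dir="$1"
  pb_keep="$2"
  # Glob expansion is sorted, so positional parameters run oldest to newest
  set -- "$pb_dir"/gitops-backup-*.tar.gz
  [ -e "$1" ] || return 0          # no archives matched the pattern
  pb_excess=$(( $# - pb_keep ))
  while [ "$pb_excess" -gt 0 ]; do
    rm -f -- "$1"
    echo "removed $1"
    shift
    pb_excess=$((pb_excess - 1))
  done
}
```

For example, `prune_backups /mnt/backups 14` keeps the 14 most recent archives (the directory here is a placeholder).
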
#### Restore from Backup

```bash
# Restore repository
tar -xzf gitops-backup-YYYYMMDD.tar.gz
cd gitops-repo.git
git remote set-url origin http://git.local:3000/user/gitops-repo.git
git push --mirror
```

187
docs/operations/runbooks/proxmox-operations.md
Normal file
@@ -0,0 +1,187 @@
# Proxmox Operations Runbook

## Common Operations

### Cluster Management

#### Check Cluster Status

```bash
# View cluster status
pvecm status

# List all nodes
pvecm nodes

# View cluster configuration
cat /etc/pve/corosync.conf
```

#### Add Node to Cluster

```bash
# On new node
pvecm add <existing-node-ip>
```

#### Remove Node from Cluster

```bash
# On a remaining cluster node, after the node to remove has been powered off
pvecm delnode <node-name>
```

### VM Management

#### Create VM from Template

```bash
# Via CLI
qm clone <template-vmid> <new-vmid> --name <vm-name>
qm set <new-vmid> --net0 virtio,bridge=vmbr0
qm set <new-vmid> --ipconfig0 ip=<ip-address>/24,gw=<gateway>
qm start <new-vmid>
```

#### Migrate VM

```bash
# Live migration
qm migrate <vmid> <target-node> --online

# Stop and migrate
qm shutdown <vmid>
qm migrate <vmid> <target-node>
```

#### Enable HA for VM

```bash
# Via web UI: Datacenter → HA → Add
# Or via CLI
ha-manager add vm:<vmid> --state started
```

### Storage Management

#### List Storage

```bash
pvesm status
```

#### Add NFS Storage

```bash
pvesm add nfs <storage-name> \
  --server <nfs-server> \
  --path <nfs-path> \
  --content images,iso,vztmpl,backup
```

#### Check Storage Usage

```bash
# pvesm list needs the storage to inspect
pvesm list <storage-name>
df -h
```

### Backup Operations

#### Create Backup

```bash
# Via web UI: Backup → Create
# Or via CLI
vzdump <vmid> --storage <storage-name> --compress zstd
```

#### Restore from Backup

```bash
# Via web UI: Backup → Restore
# Or via CLI
qmrestore <backup-file> <vmid> --storage <storage-name>
```

### Network Management

#### List Networks

```bash
cat /etc/network/interfaces
ip addr show
```

#### Add Bridge

```bash
# Edit /etc/network/interfaces
# Add bridge configuration
# Apply changes
ifup vmbr1
```

### Troubleshooting

#### Check Node Status

```bash
# System status
pvecm status
systemctl status pve-cluster
systemctl status corosync
systemctl status pvedaemon
```

#### View Logs

```bash
# Cluster logs
journalctl -u pve-cluster
journalctl -u corosync

# VM configuration and active task list
qm config <vmid>
cat /var/log/pve/tasks/active
```

#### Fix Cluster Issues

```bash
# Restart cluster services
systemctl restart pve-cluster
systemctl restart corosync

# Regenerate node certificates (if needed)
pvecm updatecerts -f
```

### Maintenance

#### Update Proxmox

```bash
apt update
apt dist-upgrade
pveam update
```

#### Reboot Node

```bash
# Ensure VMs are migrated or stopped
# Reboot
reboot
```

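
To make "ensure VMs are migrated or stopped" checkable before a reboot, the running VM IDs can be extracted from `qm list`. This is a minimal sketch: the helper name `running_vmids` is hypothetical, and it assumes the usual `qm list` layout with a header row followed by `VMID NAME STATUS ...` columns.

```bash
# Reads `qm list` output on stdin and prints the VMID of every running VM.
# NR > 1 skips the header row; field 3 is the STATUS column.
running_vmids() {
  awk 'NR > 1 && $3 == "running" { print $1 }'
}
```

Usage on the node to be rebooted: `qm list | running_vmids`. If the output is empty, nothing is running. Otherwise each ID can be passed to `qm migrate <vmid> <target-node> --online` or `qm shutdown <vmid>` before the reboot.
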
#### Maintenance Mode

```bash
# Before taking the other node of a two-node cluster offline:
# lower the expected votes so the remaining node keeps quorum
pvecm expected 1

# After maintenance, with both nodes back online: restore the expected votes
pvecm expected 2
```

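
The reason `pvecm expected` matters is vote arithmetic: a cluster is quorate only while more than half of the expected votes are present. The sketch below works this out for the common case of one vote per node. The helper names are illustrative, and the one-vote-per-node assumption does not hold if a QDevice or custom vote counts are configured.

```bash
# Votes needed for quorum with N expected votes: floor(N / 2) + 1
quorum_votes() {
  echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can be offline while the cluster stays quorate
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 2 3 5; do
  echo "nodes=$n quorum=$(quorum_votes "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With two nodes the quorum is 2 and no failure is tolerated, which is why a single remaining node becomes read-only unless the expected votes are lowered to 1. With three nodes, one node can be taken down for maintenance without touching the expected votes.
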