Initial commit: loc_az_hci (smom-dbis-138 excluded via .gitignore)

Co-authored-by: Cursor <cursoragent@cursor.com>
defiQUG
2026-02-08 09:04:46 -08:00
commit c39465c2bd
386 changed files with 50649 additions and 0 deletions


@@ -0,0 +1,211 @@
# QEMU Guest Agent Setup Guide
## Overview
QEMU Guest Agent provides better integration between Proxmox and VMs, enabling:
- **Proper VM shutdown/reboot** from Proxmox Web UI
- **Automatic IP address detection** in Proxmox
- **Better VM status reporting** (CPU, memory, disk usage)
- **File system information** and operations
- **Time synchronization** between host and guest
## Prerequisites
- VMs must have Ubuntu installed and be reachable via SSH
- SSH key access configured
- VMs must be running
## Quick Setup
### Automated Setup (Recommended)
```bash
# Set SSH key (if different from default)
export SSH_KEY="$HOME/.ssh/id_rsa"   # use $HOME: a quoted ~ is not expanded by the shell
export SSH_USER="ubuntu"
# Run setup script
./scripts/setup-guest-agent.sh
```
This script will:
1. Install `qemu-guest-agent` on each VM
2. Enable and start the service
3. Enable agent in Proxmox VM configuration
4. Verify agent is working
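The script itself is not reproduced in this guide. As a rough sketch of what steps 1 and 2 could look like for a single VM, assuming key-based SSH and passwordless `sudo` on the guest; `install_guest_agent` and the `SSH` override are hypothetical names, not taken from the script:

```bash
# Hypothetical helper, not the contents of scripts/setup-guest-agent.sh.
# SSH can be overridden (for example for a dry run); it defaults to the real ssh client.
SSH="${SSH:-ssh}"
SSH_KEY="${SSH_KEY:-$HOME/.ssh/id_rsa}"
SSH_USER="${SSH_USER:-ubuntu}"

install_guest_agent() {
  local host="$1"
  # Install the package, then enable and start the service in one step.
  "$SSH" -i "$SSH_KEY" "$SSH_USER@$host" \
    'sudo apt-get update && sudo apt-get install -y qemu-guest-agent && sudo systemctl enable --now qemu-guest-agent'
}
```

It could be called once per VM, for example `install_guest_agent <VM_IP>`.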
## Manual Setup
### Step 1: Install Guest Agent on VM
SSH to each VM and run:
```bash
sudo apt-get update
sudo apt-get install -y qemu-guest-agent
sudo systemctl enable qemu-guest-agent
sudo systemctl start qemu-guest-agent
sudo systemctl status qemu-guest-agent
```
### Step 2: Enable Agent in Proxmox
For each VM in Proxmox Web UI:
1. **Stop the VM** (if running)
2. **Go to:** VM → **Options** tab
3. **Find:** "QEMU Guest Agent"
4. **Click:** "Edit"
5. **Enable:** Check "Use QEMU Guest Agent"
6. **Click:** "OK"
7. **Start the VM**
### Step 3: Verify Agent is Working
In Proxmox Web UI:
1. **Go to:** VM → **Summary** tab
2. **Look for:** the "IPs" field
3. **Check:** It should list the VM's IP addresses instead of "Guest Agent not running"
Or via command line:
```bash
# Check agent status via Proxmox API
curl -k -s -H "Cookie: PVEAuthCookie=<ticket>" \
"https://192.168.1.206:8006/api2/json/nodes/pve/qemu/100/agent/get-fsinfo"
```
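From a shell on the Proxmox host, the `qm` tool can do the same without the Web UI or an API ticket. The sketch below assumes VMID 100 as in the API example; the function name and the `QM` override are hypothetical:

```bash
# Sketch for a shell on the Proxmox host. QM defaults to the real qm binary;
# the override exists only so the commands can be previewed elsewhere.
QM="${QM:-qm}"

enable_and_check_agent() {
  local vmid="$1"
  # Turn on the guest agent option in the VM configuration.
  "$QM" set "$vmid" --agent enabled=1 || return 1
  # Succeeds only when the agent inside the guest answers.
  "$QM" agent "$vmid" ping
}
```

Run it as `enable_and_check_agent 100`. If the option was just enabled, the ping is expected to fail until the VM has been fully stopped and started again.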
## Troubleshooting
### Agent Not Responding
**Symptoms:**
- Proxmox shows "Guest Agent not running"
- Cannot get VM IP address
- Cannot shutdown VM from Proxmox
**Solution:**
1. **Check agent is installed:**
```bash
ssh ubuntu@<VM_IP>
sudo systemctl status qemu-guest-agent
```
2. **Restart agent:**
```bash
sudo systemctl restart qemu-guest-agent
```
3. **Check logs:**
```bash
sudo journalctl -u qemu-guest-agent -f
```
4. **Reinstall agent:**
```bash
sudo apt-get install --reinstall qemu-guest-agent
sudo systemctl restart qemu-guest-agent
```
5. **Use fix script:**
```bash
./scripts/fix-guest-agent.sh
```
### Agent Not Enabled in Proxmox
**Symptoms:**
- Agent installed on VM but not working
- Proxmox doesn't detect agent
**Solution:**
1. **Stop VM**
2. **Enable agent in Proxmox:**
- Options → QEMU Guest Agent → Enable
3. **Start VM**
4. **Wait 1-2 minutes** for agent to initialize
### Agent Takes Time to Initialize
**Note:** After enabling the agent, it may take 1-2 minutes to fully initialize and start responding to Proxmox queries. This is normal.
**Check status:**
```bash
# On VM
sudo systemctl status qemu-guest-agent
# Should show: Active: active (running)
```
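Because of this delay, a single check right after boot can fail even on a healthy VM. A small retry helper avoids false alarms; this is a generic sketch, and `retry_until` is a hypothetical name:

```bash
# Retry a command until it succeeds or the attempts run out.
# Usage: retry_until <attempts> <delay-seconds> <command> [args...]
retry_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

On the Proxmox host, `retry_until 12 10 qm agent 100 ping` would poll VM 100 every 10 seconds for roughly two minutes.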
## Verification
### Check Agent Status on VM
```bash
ssh ubuntu@<VM_IP>
sudo systemctl status qemu-guest-agent
```
**Expected output:**
```
● qemu-guest-agent.service - QEMU Guest Agent
Loaded: loaded (/lib/systemd/system/qemu-guest-agent.service; enabled)
Active: active (running) since ...
```
### Check Agent in Proxmox
**Web UI:**
- VM → Summary → IPs
- Should show the IP addresses reported by the agent
**API:**
```bash
# Get filesystem info (requires authentication)
curl -k -s -H "Cookie: PVEAuthCookie=<ticket>" \
"https://192.168.1.206:8006/api2/json/nodes/pve/qemu/100/agent/get-fsinfo"
```
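The `PVEAuthCookie` ticket is short-lived, so scripted checks are often easier with a Proxmox API token passed in an `Authorization` header. A minimal sketch; the token ID and secret are placeholders you would create yourself, and `pve_token_header` is a hypothetical helper:

```bash
# Build the Authorization header for a Proxmox API token.
# Arguments: <user@realm> <token-id> <token-secret>
pve_token_header() {
  printf 'Authorization: PVEAPIToken=%s!%s=%s\n' "$1" "$2" "$3"
}

# Example call (host, node, and VMID as used in this guide; token values are placeholders):
# curl -k -s -H "$(pve_token_header 'root@pam' 'monitoring' '<secret>')" \
#   "https://192.168.1.206:8006/api2/json/nodes/pve/qemu/100/agent/get-fsinfo"
```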
## Benefits After Setup
Once guest agent is working:
1. **VM Shutdown/Reboot:**
- Can properly shutdown/reboot VMs from Proxmox
- No need to force stop
2. **IP Address Detection:**
- Proxmox automatically detects VM IP addresses
- Shows in VM summary
3. **Resource Monitoring:**
- Better CPU, memory, disk usage reporting
- More accurate VM statistics
4. **File Operations:**
- Can execute commands in VM from Proxmox
- File system information available
## Scripts Reference
- `scripts/setup-guest-agent.sh` - Install and configure guest agent
- `scripts/fix-guest-agent.sh` - Fix guest agent issues
## When to Run
Run guest agent setup **after**:
- ✅ Ubuntu installation is complete on all VMs
- ✅ VMs are reachable via SSH
- ✅ Install scripts have been applied (optional, can run before)
## Summary
1. **Install agent:** `./scripts/setup-guest-agent.sh`
2. **Verify:** Check Proxmox Web UI → VM → Summary (IPs field)
3. **Fix if needed:** `./scripts/fix-guest-agent.sh`
Guest agent setup should be done after all VMs are installed and configured, as it requires SSH access to the VMs.


@@ -0,0 +1,121 @@
# Ubuntu Images for Proxmox VE
## Standard Ubuntu ISO (What You're Using Now)
**The Ubuntu ISO from Ubuntu's website is correct!**
- **Source**: https://ubuntu.com/download/server
- **Format**: `.iso` file
- **Use Case**: Manual installation, full control over installation process
- **Current Status**: ✅ Working - your VMs are booting from it
**There is NO Proxmox-specific Ubuntu ISO.** Proxmox VE uses standard operating system ISOs from their official sources.
## Cloud-Init Templates (Faster Alternative)
For faster, automated deployments, Proxmox supports **Cloud-Init templates** (pre-configured qcow2 images).
### What Are Cloud-Init Templates?
- **Pre-installed** Ubuntu images with Cloud-Init support
- **Ready to clone** - no installation needed
- **Automated configuration** via Cloud-Init (IP, SSH keys, user data)
- **Faster deployment** - clone and configure, no OS installation
### Where to Get Cloud-Init Templates
#### Option 1: Download Official Ubuntu Cloud Images
Ubuntu provides official Cloud-Init images:
```bash
# Ubuntu 24.04 LTS Cloud Image
wget https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-amd64.img
# Ubuntu 22.04 LTS Cloud Image
wget https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img
```
#### Option 2: Create Template from ISO
You can create a Cloud-Init template from the ISO you already have:
1. Install Ubuntu from ISO
2. Install Cloud-Init: `sudo apt install cloud-init`
3. Configure Cloud-Init
4. Convert VM to template in Proxmox
### How to Use Cloud-Init Templates
1. **Download/Upload Template**
- Download Ubuntu Cloud Image
- Upload to Proxmox storage
- Convert to template
2. **Create VM from Template**
- Clone template (instant, no installation)
- Configure Cloud-Init settings:
- IP address
- SSH keys
- User data scripts
- Start VM - it's ready!
3. **Benefits**
- **Instant deployment** (no OS installation)
- 🔧 **Automated configuration** via Cloud-Init
- 📦 **Consistent base images**
- 🚀 **Perfect for automation** (Terraform, scripts)
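The "download, upload, convert" step can be done with `qm` on the Proxmox host. The sketch below follows the commonly used cloud image workflow; the VMID, VM name, storage name, and bridge are assumptions to adapt, and the `QM` override is hypothetical:

```bash
# Sketch for a shell on the Proxmox host.
# Assumptions: the VMID is free, the bridge is vmbr0, and the storage is LVM-thin.
QM="${QM:-qm}"

create_cloud_template() {
  local vmid="$1" image="$2" storage="$3"
  "$QM" create "$vmid" --name ubuntu-cloud-template --memory 2048 --net0 virtio,bridge=vmbr0 || return 1
  # Import the downloaded image as a disk of the new VM.
  "$QM" importdisk "$vmid" "$image" "$storage" || return 1
  # Attach the imported disk; this volume name is what LVM-thin storage typically produces.
  "$QM" set "$vmid" --scsihw virtio-scsi-pci --scsi0 "${storage}:vm-${vmid}-disk-0" || return 1
  # Add the Cloud-Init drive and boot from the imported disk.
  "$QM" set "$vmid" --ide2 "${storage}:cloudinit" --boot order=scsi0 || return 1
  # Many cloud images expect a serial console.
  "$QM" set "$vmid" --serial0 socket --vga serial0 || return 1
  "$QM" template "$vmid"
}
```

For example: `create_cloud_template 9000 ubuntu-24.04-server-cloudimg-amd64.img local-lvm`. On other storage types the imported volume name differs, so check the output of the import step before attaching the disk.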
## Comparison: ISO vs Cloud-Init Template
| Feature | ISO Image | Cloud-Init Template |
|---------|-----------|---------------------|
| **Installation** | Manual (15-30 min) | Instant clone |
| **Configuration** | Manual | Automated via Cloud-Init |
| **Flexibility** | Full control | Pre-configured |
| **Automation** | Limited | Excellent |
| **Use Case** | One-off VMs | Production, automation |
## Recommendation
### Use ISO (Current Method) When:
- ✅ Installing first time (learning)
- ✅ Need full control over installation
- ✅ Custom partitioning required
- ✅ One-off VMs
### Use Cloud-Init Template When:
- ✅ Deploying multiple VMs
- ✅ Automation (Terraform, scripts)
- ✅ Consistent base images
- ✅ Production deployments
## Your Current Setup
You're using the **correct approach** for initial setup:
- ✅ Standard Ubuntu ISO from Ubuntu website
- ✅ Manual installation gives you full control
- ✅ Once installed, you can convert to template for future use
## Next Steps (Optional)
If you want to create Cloud-Init templates for faster future deployments:
1. **After installing Ubuntu on your VMs:**
- Install Cloud-Init: `sudo apt install cloud-init`
- Configure as needed
- Convert VM to template in Proxmox
2. **Or download official Cloud Image:**
- Download Ubuntu Cloud Image
- Upload to Proxmox
- Convert to template
- Use for future VMs
## Summary
- **Your Ubuntu ISO is correct** - no Proxmox-specific ISO exists
- **Standard Ubuntu Server ISO** from Ubuntu website is the right choice
- 💡 **Cloud-Init templates** are an optional optimization for automation
- 🎯 **Current method is fine** - continue with ISO installation


@@ -0,0 +1,237 @@
# Azure Arc Troubleshooting Runbook
## Common Issues and Solutions
### Agent Connection Issues
#### Check Agent Status
```bash
# Check agent status
azcmagent show
# Check agent version
azcmagent version
# View agent logs
journalctl -u himdsd -f   # himdsd is the agent's core systemd service
```
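For scripted health checks it can help to pull just the status value out of `azcmagent show`. The sketch below assumes the text output contains a line of the form `Agent Status : Connected`; verify this against your agent version before relying on it. `arc_agent_status` is a hypothetical helper:

```bash
# Print the value of the "Agent Status" line from `azcmagent show` text output.
# Assumes lines of the form "Agent Status   : Connected"; reads from stdin.
arc_agent_status() {
  awk -F': *' '/^Agent Status/ { print $2; exit }'
}
```

Usage: `azcmagent show | arc_agent_status`. If the layout differs on your version, adjust the pattern accordingly.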
#### Agent Not Connecting
**Symptoms**: Agent shows as "Disconnected" in Azure Portal
**Solutions**:
1. Check network connectivity:
```bash
# Test Azure connectivity
curl -v https://management.azure.com
```
2. Verify credentials:
```bash
# Reconnect with credentials
azcmagent disconnect --force-local-only
azcmagent connect \
--resource-group HC-Stack \
--tenant-id <tenant-id> \
--location eastus \
--subscription-id <subscription-id>
```
3. Check firewall rules:
```bash
# Ensure outbound HTTPS (443) is allowed
ufw status
```
#### Agent Installation Issues
**Symptoms**: Agent installation fails
**Solutions**:
1. Check prerequisites:
```bash
# Verify system requirements
uname -m # Should be x86_64 or aarch64 (Arm64)
cat /etc/os-release
```
2. Manual installation:
```bash
wget https://aka.ms/azcmagent -O install_linux_azcmagent.sh
chmod +x install_linux_azcmagent.sh
./install_linux_azcmagent.sh
```
### Kubernetes Arc Issues
#### Cluster Not Appearing in Azure
**Symptoms**: Cluster not visible in Azure Portal
**Solutions**:
1. Verify cluster connection:
```bash
az connectedk8s show \
--resource-group HC-Stack \
--name proxmox-k3s-cluster
```
2. Check connectivity:
```bash
kubectl cluster-info
kubectl get nodes
```
3. Re-onboard cluster:
```bash
az connectedk8s connect \
--resource-group HC-Stack \
--name proxmox-k3s-cluster \
--location eastus
```
#### GitOps Not Syncing
**Symptoms**: Changes in Git not reflected in cluster
**Solutions**:
1. Check Flux status:
```bash
kubectl get pods -n flux-system
kubectl logs -n flux-system deploy/source-controller
kubectl logs -n flux-system deploy/kustomize-controller
```
2. Verify Git repository access:
```bash
# Check GitOps source
kubectl get gitrepository -n flux-system
kubectl describe gitrepository -n flux-system
```
3. Check GitOps configuration in Azure:
```bash
az k8s-extension show \
--resource-group HC-Stack \
--cluster-name proxmox-k3s-cluster \
--cluster-type connectedClusters \
--name flux
```
### Resource Bridge Issues
#### Resource Bridge Not Working
**Symptoms**: Cannot manage VMs from Azure Portal
**Solutions**:
1. Verify custom location:
```bash
az customlocation show \
--resource-group HC-Stack \
--name proxmox-k3s-cluster-location
```
2. Check Resource Bridge pods:
```bash
kubectl get pods -n arc-resource-bridge
kubectl logs -n arc-resource-bridge -l app=resource-bridge
```
### Policy and Compliance Issues
#### Policies Not Applying
**Symptoms**: Azure Policy not enforcing on Arc resources
**Solutions**:
1. Check policy assignment:
```bash
az policy assignment list \
--scope /subscriptions/<subscription-id>/resourceGroups/HC-Stack
```
2. Verify agent compliance:
```bash
az connectedmachine show \
--resource-group HC-Stack \
--name <machine-name> \
--query "status"
```
### Monitoring Issues
#### Metrics Not Appearing
**Symptoms**: No metrics in Azure Monitor
**Solutions**:
1. Check agent extensions:
```bash
az connectedmachine extension list \
--resource-group HC-Stack \
--machine-name <machine-name>
```
2. Verify Log Analytics workspace:
```bash
az monitor log-analytics workspace show \
--resource-group HC-Stack \
--workspace-name <workspace-name>
```
### Common Commands
#### View All Arc Resources
```bash
# List all Arc-enabled servers
az connectedmachine list --resource-group HC-Stack -o table
# List all Arc-enabled Kubernetes clusters
az connectedk8s list --resource-group HC-Stack -o table
```
#### Check Agent Health
```bash
# Agent status
azcmagent show
# Agent logs
journalctl -u himdsd --since "1 hour ago"
```
#### Reconnect Resources
```bash
# Reconnect server
azcmagent disconnect --force-local-only
azcmagent connect --resource-group HC-Stack --tenant-id <id> --location eastus --subscription-id <id>
# Reconnect Kubernetes
az connectedk8s delete --resource-group HC-Stack --name <cluster-name> --yes
az connectedk8s connect --resource-group HC-Stack --name <cluster-name> --location eastus
```
### Log Locations
- **Agent logs**: `/var/opt/azcmagent/log/`
- **System logs**: `journalctl -u himdsd`
- **Kubernetes logs**: `kubectl logs -n azure-arc <pod-name>`
- **GitOps logs**: `kubectl logs -n flux-system <pod-name>`
### Support Resources
- Azure Arc documentation: https://docs.microsoft.com/azure/azure-arc
- Troubleshooting guide: https://docs.microsoft.com/azure/azure-arc/servers/troubleshooting
- GitHub issues: https://github.com/microsoft/azure_arc/issues


@@ -0,0 +1,321 @@
# GitOps Workflow Runbook
## Overview
This runbook describes the GitOps workflow using Flux for managing Kubernetes deployments.
## GitOps Architecture
```
Git Repository (Gitea/GitLab)
│ (Poll/Sync)
Flux Controller (Kubernetes)
│ (Apply)
Kubernetes Cluster
│ (Deploy)
Application Pods
```
## Workflow
### 1. Making Changes
#### Update Application Configuration
1. Clone Git repository:
```bash
git clone http://git.local:3000/user/gitops-repo.git
cd gitops-repo
```
2. Edit Helm chart values:
```bash
# Edit values.yaml
vim gitops/apps/besu/values.yaml
```
3. Commit and push:
```bash
git add gitops/apps/besu/values.yaml
git commit -m "Update Besu configuration"
git push origin main
```
#### Add New Application
1. Add Helm chart to repository:
```bash
cp -r /path/to/new-chart gitops/apps/new-app/
```
2. Create Flux Kustomization:
```yaml
# Create gitops/apps/new-app/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: new-app
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps/new-app
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```
3. Commit and push:
```bash
git add gitops/apps/new-app/
git commit -m "Add new application"
git push origin main
```
### 2. Monitoring Sync Status
#### Check Flux Status
```bash
# Check Flux pods
kubectl get pods -n flux-system
# Check Git repository status
kubectl get gitrepository -n flux-system
kubectl describe gitrepository flux-system -n flux-system
# Check Kustomization status
kubectl get kustomization -n flux-system
kubectl describe kustomization <app-name> -n flux-system
```
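With many applications, scanning the table by eye gets tedious. The filter below prints only the Kustomizations that are not ready. It is a sketch that assumes the default column layout (`NAME AGE READY STATUS`) for a single namespace; `not_ready_kustomizations` is a hypothetical helper:

```bash
# List Kustomizations whose READY column is not "True".
# Assumes the default table layout: NAME AGE READY STATUS; reads from stdin.
not_ready_kustomizations() {
  awk 'NR > 1 && $3 != "True" { print $1 }'
}
```

Usage: `kubectl get kustomization -n flux-system | not_ready_kustomizations`. An empty result means every Kustomization in the namespace reports ready.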
#### View Sync Events
```bash
# Watch Flux events
kubectl get events -n flux-system --sort-by='.lastTimestamp'
# View Flux logs
kubectl logs -n flux-system deploy/kustomize-controller -f
```
### 3. Troubleshooting
#### Sync Not Happening
**Check Git repository access**:
```bash
kubectl get gitrepository flux-system -n flux-system -o yaml
kubectl describe gitrepository flux-system -n flux-system
```
**Check authentication**:
```bash
# For HTTPS with token
kubectl get secret -n flux-system
# For SSH
kubectl get secret flux-system -n flux-system -o yaml
```
#### Application Not Deploying
**Check Kustomization**:
```bash
kubectl get kustomization <app-name> -n flux-system
kubectl describe kustomization <app-name> -n flux-system
```
**Check Helm release**:
```bash
kubectl get helmrelease -n <namespace>
kubectl describe helmrelease <app-name> -n <namespace>
```
#### Manual Sync Trigger
```bash
# Trigger immediate sync
flux reconcile source git flux-system
flux reconcile kustomization <app-name>
```
### 4. Best Practices
#### Repository Structure
```
gitops-repo/
├── infrastructure/
│   ├── namespace.yaml
│   ├── ingress-controller.yaml
│   └── cert-manager.yaml
└── apps/
    ├── besu/
    │   ├── Chart.yaml
    │   ├── values.yaml
    │   └── templates/
    ├── firefly/
    └── ...
```
#### Branch Strategy
- **main**: Production deployments
- **staging**: Staging environment
- **develop**: Development environment
#### Change Management
1. Create feature branch
2. Make changes
3. Test in development
4. Merge to staging
5. Promote to production
### 5. Common Operations
#### Suspend Sync
```bash
# Suspend specific application
flux suspend kustomization <app-name>
# Resume
flux resume kustomization <app-name>
```
#### Rollback Changes
```bash
# Revert Git commit
git revert <commit-hash>
git push origin main
# Or manually edit and push
```
#### Update Helm Chart
```bash
# Update chart version in values.yaml
# Commit and push
git add gitops/apps/<app>/values.yaml
git commit -m "Update <app> to version X.Y.Z"
git push origin main
```
### 6. Azure Arc GitOps Integration
#### Configure GitOps in Azure Portal
1. Navigate to: Azure Arc → Kubernetes → Your cluster
2. Go to "GitOps" section
3. Add configuration:
- Repository URL
- Branch
- Path
- Authentication
#### View GitOps Status in Azure
```bash
az k8s-extension show \
--resource-group HC-Stack \
--cluster-name proxmox-k3s-cluster \
--cluster-type connectedClusters \
--name flux
```
### 7. Security
#### Secret Management
**Option 1: Kubernetes Secrets** (not recommended for production):
```bash
kubectl create secret generic app-secret \
--from-literal=password=secret-value \
-n <namespace>
```
**Option 2: Sealed Secrets**:
```bash
# Install Sealed Secrets controller
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.18.0/controller.yaml
# Create sealed secret
kubeseal --format yaml < secret.yaml > sealed-secret.yaml
```
**Option 3: External Secrets Operator**:
- Integrate with Azure Key Vault
- Use External Secrets Operator
#### RBAC
Configure Flux RBAC:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: flux-<namespace>
  namespace: <namespace>
rules:
  - apiGroups: [""]
    resources: ["*"]
    verbs: ["*"]
```
### 8. Monitoring
#### Set Up Alerts
```bash
# Create alert for sync failures
kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: flux-sync-alerts
spec:
  groups:
    - name: flux
      rules:
        - alert: FluxSyncFailed
          expr: flux_kustomization_status_condition{status="False"} == 1
          annotations:
            summary: "Flux sync failed"
EOF
```
### 9. Disaster Recovery
#### Backup Git Repository
```bash
# Clone repository
git clone --mirror http://git.local:3000/user/gitops-repo.git
# Backup to external location
tar -czf gitops-backup-$(date +%Y%m%d).tar.gz gitops-repo.git
```
#### Restore from Backup
```bash
# Restore repository
tar -xzf gitops-backup-YYYYMMDD.tar.gz
cd gitops-repo.git
git remote set-url origin http://git.local:3000/user/gitops-repo.git
git push --mirror
```


@@ -0,0 +1,187 @@
# Proxmox Operations Runbook
## Common Operations
### Cluster Management
#### Check Cluster Status
```bash
# View cluster status
pvecm status
# List all nodes
pvecm nodes
# View cluster configuration
cat /etc/pve/corosync.conf
```
#### Add Node to Cluster
```bash
# On new node
pvecm add <existing-node-ip>
```
#### Remove Node from Cluster
```bash
# On a remaining cluster node, after the node to remove has been powered off
pvecm delnode <node-name>
```
### VM Management
#### Create VM from Template
```bash
# Via CLI
qm clone <template-vmid> <new-vmid> --name <vm-name>
qm set <new-vmid> --net0 virtio,bridge=vmbr0
qm set <new-vmid> --ipconfig0 ip=<ip-address>/24,gw=<gateway>
qm start <new-vmid>
```
#### Migrate VM
```bash
# Live migration
qm migrate <vmid> <target-node> --online
# Stop and migrate
qm shutdown <vmid>
qm migrate <vmid> <target-node>
```
#### Enable HA for VM
```bash
# Via web UI: Datacenter → HA → Add
# Or via CLI
ha-manager add vm:<vmid> --state started
```
### Storage Management
#### List Storage
```bash
pvesm status
```
#### Add NFS Storage
```bash
pvesm add nfs <storage-name> \
--server <nfs-server> \
--path <nfs-path> \
--content images,iso,vztmpl,backup
```
#### Check Storage Usage
```bash
pvesm list <storage-name>
df -h
```
### Backup Operations
#### Create Backup
```bash
# Via web UI: Backup → Create
# Or via CLI
vzdump <vmid> --storage <storage-name> --compress zstd
```
#### Restore from Backup
```bash
# Via web UI: Backup → Restore
# Or via CLI
qmrestore <backup-file> <vmid> --storage <storage-name>
```
### Network Management
#### List Networks
```bash
cat /etc/network/interfaces
ip addr show
```
#### Add Bridge
```bash
# Edit /etc/network/interfaces
# Add bridge configuration
# Apply changes
ifup vmbr1
```
### Troubleshooting
#### Check Node Status
```bash
# System status
pvecm status
systemctl status pve-cluster
systemctl status corosync
systemctl status pvedaemon
```
#### View Logs
```bash
# Cluster logs
journalctl -u pve-cluster
journalctl -u corosync
# VM configuration and active tasks
qm config <vmid>
cat /var/log/pve/tasks/active
```
#### Fix Cluster Issues
```bash
# Restart cluster services
systemctl restart pve-cluster
systemctl restart corosync
# Regenerate cluster certificates (if needed)
pvecm updatecerts -f
```
### Maintenance
#### Update Proxmox
```bash
apt update
apt dist-upgrade
pveam update
```
#### Reboot Node
```bash
# Ensure VMs are migrated or stopped
# Reboot
reboot
```
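Before rebooting, it is worth confirming that nothing is still running on the node. The filter below extracts the VMIDs of running VMs from `qm list`. It is a sketch that assumes the usual column order (`VMID NAME STATUS ...`); `running_vmids` is a hypothetical helper:

```bash
# Print the VMIDs of running VMs from `qm list` output (read from stdin).
# Assumes the usual columns: VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID.
running_vmids() {
  awk 'NR > 1 && $3 == "running" { print $1 }'
}
```

Usage: `qm list | running_vmids`. If it prints any IDs, migrate or shut down those VMs first.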
#### Maintenance Mode
```bash
# Lower the expected vote count so a single node keeps quorum while the other is down
pvecm expected 1
# Restore the expected vote count once both nodes are back online
pvecm expected 2
```