# Guest Agent IP Discovery - Architecture Guide

**Date:** 2025-11-27

**Purpose:** Document the guest-agent IP discovery pattern for all scripts

## Overview

SSH-using scripts discover VM IPs dynamically from the QEMU Guest Agent instead of hard-coding IP addresses. The migration is still in progress; see **Updated Scripts**. This provides:

- **Flexibility:** VMs can change IPs without breaking scripts
- **Maintainability:** No IP addresses scattered throughout the codebase
- **Reliability:** A single source of truth (the guest agent)
- **Scalability:** New VMs can be added without updating IP lists

## Architecture

### Helper Library

**Location:** `scripts/lib/proxmox_vm_helpers.sh`

**Key Functions:**

- `get_vm_ip_from_guest_agent` - Get the IP from the guest agent
- `get_vm_ip_or_warn <vmid> <name>` - Get the IP, warning if it is unavailable
- `get_vm_ip_or_fallback <vmid> <name> <fallback_ip>` - Get the IP, falling back to a supplied address
- `ensure_guest_agent_enabled` - Enable the agent in the VM config
- `wait_for_guest_agent` - Wait for the agent to be ready

### VM Array Pattern

**Before (hard-coded IPs):**

```bash
VMS=(
  "100 cloudflare-tunnel 192.168.1.60"
  "101 k3s-master 192.168.1.188"
)
```

**After (IP-free):**

```bash
VMS=(
  "100 cloudflare-tunnel"
  "101 k3s-master"
)
```

### Script Pattern

**Before:**

```bash
read -r vmid name ip <<< "$vm_spec"
ssh "${VM_USER}@${ip}" ...
```

**After:**

```bash
read -r vmid name <<< "$vm_spec"
ip="$(get_vm_ip_or_warn "$vmid" "$name" || true)"
[[ -z "$ip" ]] && continue   # skip this VM if no IP was discovered
ssh "${VM_USER}@${ip}" ...
```

## Bootstrap Problem

### The Challenge

Guest-agent IP discovery only works **after** the QEMU Guest Agent (QGA) is installed and running in the VM.

### Solution: Fallback Pattern

For bootstrap scripts (those that install QGA itself), supply fallback IPs. The fallback is only needed for VMs where the guest agent returns no address:

```bash
# Fallback IPs, used only for bootstrap
declare -A FALLBACK_IPS=(
  ["100"]="192.168.1.60"
  ["101"]="192.168.1.188"
)

# Get the IP from the guest agent, or the fallback if the agent is unavailable
ip="$(get_vm_ip_or_fallback "$vmid" "$name" "${FALLBACK_IPS[$vmid]:-}" || true)"
```

### Bootstrap Flow

1. **First Pass:** Use fallback IPs to install QGA
2.
   **After QGA:** All subsequent scripts use guest-agent discovery
3. **No More Hard-coded IPs:** Once QGA is installed on every VM, no script needs hard-coded IPs

## Updated Scripts

### ✅ Refactored Scripts

1. **`scripts/ops/ssh-test-all.sh`** - Example SSH test script
2. **`scripts/deploy/configure-vm-services.sh`** - Service deployment
3. **`scripts/deploy/add-ssh-keys-to-vms.sh`** - SSH key management
4. **`scripts/deploy/verify-cloud-init.sh`** - Cloud-init verification
5. **`scripts/infrastructure/install-qemu-guest-agent.sh`** - QGA installation (with fallback)

### 📋 Scripts to Update

All scripts that still use hard-coded IPs should be updated:

- `scripts/troubleshooting/diagnose-vm-issues.sh`
- `scripts/troubleshooting/test-all-access-paths.sh`
- `scripts/deploy/deploy-vms-via-api.sh` (IPs are needed at VM creation, but discovery can be used afterwards)
- And many more...

## Usage Examples

### Example 1: Simple SSH Script

```bash
#!/bin/bash
# Assumes PROJECT_ROOT and VM_USER are set by the caller
source "$PROJECT_ROOT/scripts/lib/proxmox_vm_helpers.sh"

VMS=(
  "100 cloudflare-tunnel"
  "101 k3s-master"
)

for vm_spec in "${VMS[@]}"; do
  read -r vmid name <<< "$vm_spec"
  ip="$(get_vm_ip_or_warn "$vmid" "$name" || true)"
  [[ -z "$ip" ]] && continue   # skip VMs without a guest-agent IP
  ssh "${VM_USER}@${ip}" "hostname"
done
```

### Example 2: Bootstrap Script (with Fallback)

```bash
#!/bin/bash
# Assumes PROJECT_ROOT and VM_USER are set by the caller
source "$PROJECT_ROOT/scripts/lib/proxmox_vm_helpers.sh"

VMS=(
  "100 cloudflare-tunnel"
)

declare -A FALLBACK_IPS=(
  ["100"]="192.168.1.60"
)

for vm_spec in "${VMS[@]}"; do
  read -r vmid name <<< "$vm_spec"
  ip="$(get_vm_ip_or_fallback "$vmid" "$name" "${FALLBACK_IPS[$vmid]:-}" || true)"
  [[ -z "$ip" ]] && continue

  # Install and start QGA using the discovered or fallback IP
  ssh "${VM_USER}@${ip}" "sudo apt install -y qemu-guest-agent && sudo systemctl enable --now qemu-guest-agent"
done
```

### Example 3: Service Deployment

```bash
#!/bin/bash
# Assumes PROJECT_ROOT is set and VMS lists the target VMs ("VMID NAME"), including VMID 102
source "$PROJECT_ROOT/scripts/lib/proxmox_vm_helpers.sh"

declare -A VM_IPS

# Discover all IPs first
for vm_spec in "${VMS[@]}"; do
  read -r vmid name <<< "$vm_spec"
  ip="$(get_vm_ip_or_warn "$vmid" "$name" || true)"
  [[ -n "$ip" ]] && VM_IPS["$vmid"]="$ip"
done

# Use discovered IPs (deploy_gitea stands for a script-specific deploy function)
if [[ -n "${VM_IPS[102]:-}" ]]; then
  deploy_gitea "${VM_IPS[102]}"
fi
```

## Prerequisites

### On Proxmox Host

1. **jq installed:**

   ```bash
   apt update && apt install -y jq
   ```

2. **Helper library accessible:**
   - Scripts run on the Proxmox host: direct access
   - Scripts run remotely: copy the helper library or source it via SSH

### In VMs

1. **QEMU Guest Agent installed:**

   ```bash
   sudo apt install -y qemu-guest-agent
   sudo systemctl enable --now qemu-guest-agent
   ```

2. **Agent enabled in VM config** (run on the Proxmox host):

   ```bash
   qm set <vmid> --agent enabled=1
   ```

## Migration Checklist

For each script that uses hard-coded IPs:

- [ ] Remove IPs from the VM array (keep only VMID and NAME)
- [ ] Add a `source` line for the helper library
- [ ] Replace `read -r vmid name ip` with `read -r vmid name`
- [ ] Add IP discovery: `ip="$(get_vm_ip_or_warn "$vmid" "$name" || true)"`
- [ ] Add skip logic: `[[ -z "$ip" ]] && continue`
- [ ] Test the script with the guest agent enabled
- [ ] For bootstrap scripts, add fallback IPs

## Benefits

1. **No IP Maintenance:** If IPs change, scripts still work
2. **Single Source of Truth:** The guest agent reports the addresses the VM actually has
3. **Easier Testing:** Scripts can run against VMs with different IPs without code changes
4. **Better Error Handling:** When a VM's guest agent is unavailable, scripts skip that VM with a warning instead of connecting to a stale IP
5. **Future-Proof:** Works with DHCP and other dynamically assigned IPs, and with VMs that have multiple interfaces

## Troubleshooting

### "No IP from guest agent"

**Causes:**

- QEMU Guest Agent not installed in the VM
- Agent not enabled in the VM config
- VM not powered on
- Agent service not running

**Fix:**

```bash
# In the VM
sudo apt install -y qemu-guest-agent
sudo systemctl enable --now qemu-guest-agent

# On the Proxmox host
qm set <vmid> --agent enabled=1
```

### "jq command not found"

**Fix:**

```bash
apt update && apt install -y jq
```

### Scripts run remotely (not on the Proxmox host)

**Options:**

1. Copy the helper library to the remote location
2. Source it via SSH:

   ```bash
   ssh proxmox-host "source /path/to/helpers.sh && get_vm_ip_or_warn 100 test"
   ```

3. Use the Proxmox API instead of `qm` commands

---

**Status:** Helper library created, key scripts refactored. Remaining scripts should follow the same pattern.
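
This guide does not show the internals of `scripts/lib/proxmox_vm_helpers.sh`. To illustrate the core task of `get_vm_ip_from_guest_agent`, here is a minimal sketch. It assumes the function runs on the Proxmox host and parses the JSON that `qm guest cmd <vmid> network-get-interfaces` returns from the agent. The jq filter is a hypothetical choice, not taken from the real library: it naively returns the first non-loopback IPv4 address. VMs with extra interfaces, such as container bridges on a k3s node, may need a stricter filter.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of get_vm_ip_from_guest_agent; the real helper may differ.
# Requires qm (Proxmox host) and jq.

# get_vm_ip_from_guest_agent <vmid>
# Prints the first non-loopback IPv4 address reported by the guest agent.
# Returns 1 (printing nothing) if the agent does not answer or reports no address.
get_vm_ip_from_guest_agent() {
  local vmid="$1" json ip
  # network-get-interfaces is a standard QEMU Guest Agent command; qm prints
  # its JSON result. stderr is discarded so an unreachable agent fails quietly.
  json="$(qm guest cmd "$vmid" network-get-interfaces 2>/dev/null)" || return 1
  ip="$(jq -r '
      .[]
      | select(.name != "lo")
      | .["ip-addresses"][]?
      | select(.["ip-address-type"] == "ipv4")
      | .["ip-address"]
      | select(startswith("127.") | not)
    ' <<< "$json" 2>/dev/null | head -n 1)"
  [[ -n "$ip" ]] || return 1
  printf '%s\n' "$ip"
}
```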
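
The examples above call two wrappers, `get_vm_ip_or_warn` and `get_vm_ip_or_fallback`. The sketch below shows one way they could work. The argument order follows those calls; the warning text and return codes are assumptions. Both wrappers print the IP on stdout and warnings on stderr. That split is what lets callers write `ip="$(... || true)"` and then test for an empty string. The sketch also assumes `get_vm_ip_from_guest_agent <vmid>` is already defined, for example by sourcing the helper library.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the wrapper functions; the real helpers may differ.
# Assumes get_vm_ip_from_guest_agent <vmid> is already defined.

# get_vm_ip_or_warn <vmid> <name>
# Prints the discovered IP, or warns on stderr and returns 1.
get_vm_ip_or_warn() {
  local vmid="$1" name="$2" ip
  if ip="$(get_vm_ip_from_guest_agent "$vmid")" && [[ -n "$ip" ]]; then
    printf '%s\n' "$ip"
    return 0
  fi
  echo "WARN: no IP from guest agent for VM $vmid ($name); skipping" >&2
  return 1
}

# get_vm_ip_or_fallback <vmid> <name> [fallback_ip]
# Prefers the guest agent; otherwise prints the fallback IP if one was given.
get_vm_ip_or_fallback() {
  local vmid="$1" name="$2" fallback="${3:-}" ip
  if ip="$(get_vm_ip_from_guest_agent "$vmid")" && [[ -n "$ip" ]]; then
    printf '%s\n' "$ip"
    return 0
  fi
  if [[ -n "$fallback" ]]; then
    echo "WARN: guest agent unavailable for VM $vmid ($name); using fallback $fallback" >&2
    printf '%s\n' "$fallback"
    return 0
  fi
  echo "WARN: no guest-agent IP and no fallback for VM $vmid ($name)" >&2
  return 1
}
```

The `ip` variable is declared `local` on a separate line from the assignment. This is deliberate: `local ip="$(...)"` would return the exit status of `local` instead of the exit status of the command substitution.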
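
The sketch below shows `ensure_guest_agent_enabled` and `wait_for_guest_agent`, assuming both take the VMID as their first argument. Several details are hypothetical choices: the optional timeout argument, the 5-second poll interval, and the pattern matched against `qm config` output. Readiness is checked with the agent's `ping` command through `qm guest cmd`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; the real helpers may differ. Requires qm (Proxmox host).

# ensure_guest_agent_enabled <vmid>
# Turns on the agent option unless the VM config already has it enabled.
ensure_guest_agent_enabled() {
  local vmid="$1"
  # qm config prints lines such as "agent: 1" or "agent: enabled=1,..."
  if qm config "$vmid" | grep -Eq '^agent: *(1|enabled=1)'; then
    return 0
  fi
  qm set "$vmid" --agent enabled=1
}

# wait_for_guest_agent <vmid> [timeout_seconds]
# Polls the agent with "ping" every 5 seconds until it answers or the timeout expires.
wait_for_guest_agent() {
  local vmid="$1" timeout="${2:-120}" waited=0
  until qm guest cmd "$vmid" ping >/dev/null 2>&1; do
    if (( waited >= timeout )); then
      echo "WARN: guest agent on VM $vmid not responding after ${timeout}s" >&2
      return 1
    fi
    sleep 5
    (( waited += 5 ))
  done
}
```

Changing the `agent` option takes effect only after the VM is fully stopped and started again. A bootstrap script would therefore typically enable the option, install QGA, restart the VM, and then call `wait_for_guest_agent` before attempting discovery.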