
Proxmox VE Final Recommendations and Summary

Last Updated: 2026-05-09
Document Version: 1.0
Status: Active Documentation


Live cluster (2026-05-09): see PROXMOX_CLUSTER_ARCHITECTURE.md — PVE 9.1.7, 136 running guests (ml110: 0, r630-01: 57, r630-02: 41, r630-03/04: 19 each). The sections below retain historical migration and audit notes; do not use them as the current placement source of truth.

Date: 2025-01-20
Status: Complete Review with Actionable Recommendations


Completed Tasks Summary

1. Hostname Migration - COMPLETE

  • r630-01 (192.168.11.11): Successfully renamed from pve to r630-01
  • r630-02 (192.168.11.12): Successfully renamed from pve2 to r630-02
  • All services operational after migration
  • /etc/hosts updated on both hosts
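
A quick spot-check to re-confirm the migration on either host (optional; read-only commands):

# On r630-01 and r630-02
hostnamectl                      # should report the new hostname
grep -E 'r630-0[12]' /etc/hosts  # new names present in the hosts file
pvecm nodes                      # cluster membership shows the new names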

2. IP Address Audit - COMPLETE

  • Cluster (2026-05-09): 136 running LXC/QEMU; static IP inventory: ALL_VMIDS_ENDPOINTS.md
  • IP Conflicts: 0 (re-verify when adding CTs)

3. Proxmox Configuration Review - COMPLETE

  • All hosts reviewed
  • Storage configurations analyzed
  • Issues identified and documented

🔴 Critical Issues and Fixes

Issue 1: Storage Node References Outdated

Problem: Storage configuration files reference old hostnames (pve, pve2) instead of new hostnames (r630-01, r630-02)

Impact: Storage may show as disabled or inaccessible

Fix Applied:

# On r630-01
sed -i 's/nodes pve$/nodes r630-01/' /etc/pve/storage.cfg
sed -i 's/nodes pve /nodes r630-01 /' /etc/pve/storage.cfg

# On r630-02
sed -i 's/nodes pve2$/nodes r630-02/' /etc/pve/storage.cfg
sed -i 's/nodes pve2 /nodes r630-02 /' /etc/pve/storage.cfg
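
To confirm the rewrite took effect, a quick check that no stale node names remain (no output means the file is clean):

# On both hosts; any match is a line that still needs fixing
grep -nE 'nodes (pve|pve2)\b' /etc/pve/storage.cfg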

Status: Fixed


📊 Host Configuration Summary

ml110 (192.168.11.10)

  • Status: Operational (cluster member)
  • CPU: 6 cores
  • Memory: ~63 GiB reported; ~3.5% used
  • Guests: 0 (2026-05-09)
  • Recommendation: Use r630-* for new workloads unless intentionally placing on ml110.

r630-01 (192.168.11.11) - Previously "pve"

  • Status: Operational
  • CPU: 32 cores @ 2.40GHz
  • Memory: ~125 GiB reported; ~66% used
  • Guests: 57
  • Recommendation: Monitor thin1/data utilization before large disks.

r630-02 (192.168.11.12) - Previously "pve2"

  • Status: Operational
  • CPU: 56 cores @ 2.00GHz
  • Memory: ~125 GiB reported; ~62% used
  • Guests: 41 (includes infra CTs such as NPMplus — see inventory docs)
  • Recommendation: Watch thin1-thin6 pool % on r630-02.

🎯 Critical Recommendations

1. Enable Storage on r630-01 and r630-02 🔴 CRITICAL

Priority: HIGH - Required before starting new VMs

Actions:

  1. Update storage.cfg node references (DONE)
  2. Enable local-lvm storage on r630-01
  3. Enable thin1-thin6 storage on r630-02
  4. Verify storage is accessible

Commands:

# On r630-01
ssh root@192.168.11.11
pvesm set local-lvm --disable 0
pvesm set thin1 --disable 0
pvesm status

# On r630-02
ssh root@192.168.11.12
for storage in thin1 thin2 thin3 thin4 thin5 thin6; do
    pvesm set "$storage" --disable 0
done
pvesm status
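
After enabling, it is worth confirming that every pool reports as active; a minimal sketch (run on each host):

# Print any storage that is not in the "active" state (no output = all good)
pvesm status | awk 'NR>1 && $3 != "active" {print $1, $3}'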

2. Verify Existing VMs on r630-02 ⚠️ HIGH PRIORITY

Issue: VMs found on r630-02 storage (VMIDs: 100, 101, 102, 103, 104, 105, 130, 5000, 6200, 7800)

Actions:

  1. List all VMs/containers on r630-02
  2. Verify they're accessible
  3. Check their IP addresses
  4. Update IP audit if needed

Commands:

ssh root@192.168.11.12
pct list
qm list
# Check each VMID's IP configuration
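
One way to pull the configured addresses in bulk instead of opening each guest (a sketch; for QEMU VMs the net0 line only shows MAC/bridge, so live IPs need the guest agent via qm guest cmd <vmid> network-get-interfaces):

# Containers: net0 carries bridge, IP and gateway
for vmid in $(pct list | awk 'NR>1 {print $1}'); do
    echo "CT $vmid: $(pct config "$vmid" | grep '^net0')"
done

# VMs: configured NICs only (MAC/bridge); cloud-init IPs appear as ipconfig0
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
    echo "VM $vmid: $(qm config "$vmid" | grep -E '^(net0|ipconfig0)')"
done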

3. Distribute VM Workload Across Hosts

Current: All 34 VMs on ml110 (overloaded)

Recommendation:

  • Migrate some VMs to r630-01 and r630-02
  • Balance workload:
    • ml110: Keep management/lightweight VMs
    • r630-01: Medium workload VMs
    • r630-02: Heavy workload VMs (best CPU)

Benefits:

  • Better performance (ml110 CPU is slower)
  • Better resource utilization
  • Improved redundancy
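
A sketch of what a single move could look like, run from the source node (VMID 199 is a placeholder; containers restart during a --restart migration, VMs can move live):

# Container: move CT 199 to r630-02, restarting it on the target
pct migrate 199 r630-02 --restart

# VM: live-migrate VM 199 to r630-01, moving local disks along with it
qm migrate 199 r630-01 --online --with-local-disks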

4. Verify Cluster Configuration

Issue: Cluster may still reference old hostnames

Actions:

  1. Verify cluster status
  2. Check if hostname changes are reflected
  3. Update cluster configuration if needed

Commands:

# On any cluster node
pvecm status
pvecm nodes
# Verify hostnames are correct
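
If legacy names do linger, they usually show up in corosync.conf; a read-only check (assumes pve/pve2 were the only old names):

# No output = cluster config already uses the new hostnames
grep -nE '\b(pve|pve2)\b' /etc/pve/corosync.conf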

5. Optimize Storage Strategy

Current State:

  • ml110: Using local-lvm (good performance)
  • r630-01: Only local (directory) - slower
  • r630-02: thin1-thin6 available but need activation

Recommendation:

  • Enable LVM thin storage on both r630-01 and r630-02
  • Use thin provisioning for space efficiency
  • Monitor storage usage
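
A quick way to read thin-pool consumption directly from LVM on each node (Data% and Meta% are the figures worth watching):

# Thin pools show fill levels in the Data% / Meta% columns
lvs -o vg_name,lv_name,lv_size,data_percent,metadata_percent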

📋 Detailed Recommendations by Category

Storage Recommendations

Immediate Actions (Before Starting VMs)

  1. Enable local-lvm on r630-01

    • Thin pools already exist (pve/data, pve/thin1)
    • Just need to activate in Proxmox
    • Will enable efficient storage
  2. Enable thin storage on r630-02

    • 6 volume groups available (thin1-thin6)
    • Each ~230GB
    • Enable all for maximum flexibility
  3. Verify storage after enabling

    • Test VM creation
    • Test storage migration
    • Monitor performance

Long-term Actions

  1. Implement storage monitoring

    • Set alerts for >80% usage (see the sketch after this list)
    • Monitor thin pool usage
    • Track storage growth
  2. Consider shared storage

    • For easier migration
    • For better redundancy
    • NFS or Ceph options
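
For the >80% alert mentioned above, a minimal cron-able sketch (the threshold and plain-stdout output are assumptions; wire it into the monitoring stack actually in use):

#!/usr/bin/env bash
# Warn for any storage that pvesm reports above the usage threshold
THRESHOLD=80
pvesm status | awk -v t="$THRESHOLD" \
    'NR>1 && $7+0 > t {printf "WARNING: %s at %.1f%% used\n", $1, $7+0}'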

Performance Recommendations

ml110

  • CPU: Older/slower - Reduce workload
  • Memory: High usage (75%) - Monitor closely
  • Action: Migrate some VMs to r630-01/r630-02

r630-01

  • CPU: Good (32 cores) - Ready for workloads
  • Memory: Excellent (99% free) - Can handle many VMs
  • Action: Enable storage, start deploying VMs

r630-02

  • CPU: Excellent (56 cores) - Best performance
  • Memory: Excellent (98% free) - Can handle many VMs
  • Action: Enable storage, verify existing VMs, deploy new VMs

Network Recommendations

Current Status

  • Flat network (192.168.11.0/24)
  • All hosts accessible
  • Gateway: 192.168.11.1

Recommendations

  1. VLAN Migration (Planned)

    • Segment by service type
    • Improve security
    • Better traffic management
  2. Network Monitoring

    • Monitor bandwidth
    • Track performance
    • Alert on issues

Security Recommendations

  1. Update Passwords

    • Some hosts use weak passwords ("password")
    • Consider stronger passwords
    • Use SSH keys where possible (see the sketch after this list)
  2. Firewall Configuration

    • Review firewall rules
    • Restrict access where needed
    • Document firewall policies
  3. Access Control

    • Review user permissions
    • Implement least privilege
    • Audit access logs
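
For the SSH-key item above, a minimal sketch (key path, host list, and disabling password login are assumptions; keep console access available before changing sshd):

# Generate one admin key, then push it to each host
ssh-keygen -t ed25519 -f ~/.ssh/proxmox_admin
for host in 192.168.11.10 192.168.11.11 192.168.11.12; do
    ssh-copy-id -i ~/.ssh/proxmox_admin.pub root@"$host"
done
# Only after key login is confirmed on every host:
# set "PasswordAuthentication no" in /etc/ssh/sshd_config and reload sshd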

🚀 Action Plan

Phase 1: Storage Configuration (CRITICAL - Do First)

  1. Update storage.cfg node references
  2. Enable local-lvm on r630-01
  3. Enable thin storage on r630-02
  4. Verify storage is working

Estimated Time: 15-30 minutes

Phase 2: VM Verification

  1. List all VMs on r630-02
  2. Verify VM IP addresses
  3. Update IP audit if needed
  4. Test VM accessibility

Estimated Time: 15-30 minutes

Phase 3: Cluster Verification

  1. Verify cluster status
  2. Check hostname references
  3. Update if needed
  4. Test cluster operations

Estimated Time: 10-15 minutes

Phase 4: VM Distribution (Optional)

  1. Plan VM migration
  2. Migrate VMs to r630-01/r630-02
  3. Balance workload
  4. Monitor performance

Estimated Time: 1-2 hours (depending on number of VMs)


📝 Verification Checklist

Pre-Start Verification

  • Hostnames migrated correctly
  • IP addresses audited (no conflicts)
  • Proxmox services running
  • Storage enabled on r630-01
  • Storage enabled on r630-02
  • VMs on r630-02 verified
  • Cluster configuration updated

Post-Start Verification

  • All VMs accessible
  • No IP conflicts
  • Storage working correctly
  • Performance acceptable
  • Monitoring in place

🔧 Quick Fix Commands

Enable Storage on r630-01

ssh root@192.168.11.11
pvesm set local-lvm --disable 0
pvesm set thin1 --disable 0
pvesm status

Enable Storage on r630-02

ssh root@192.168.11.12
for storage in thin1 thin2 thin3 thin4 thin5 thin6; do
    pvesm set "$storage" --disable 0
done
pvesm status

Verify Cluster

# On any node
pvecm status
pvecm nodes

List All VMs

# On each host
pct list
qm list

📊 Resource Summary

Host    | CPU | Memory (reported)    | Guests (2026-05-09) | Notes
--------|-----|----------------------|---------------------|-------------------
ml110   | 6   | ~63 GiB · ~3% used   | 0                   | Quorum / idle
r630-01 | 32  | ~125 GiB · ~66% used | 57                  | Highest count
r630-02 | 56  | ~125 GiB · ~62% used | 41                  | Thin pools active
r630-03 | 32  | ~125 GiB · ~62% used | 19                  |
r630-04 | 32  | ~125 GiB · ~17% used | 19                  |

Totals: 158 logical cores in cluster; ~562 GiB reported maxmem sum (API); 136 running guests. Storage is node-local thin — see pvesm status per host.
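
These totals can be re-derived from the cluster resources API; a sketch (assumes jq is installed; maxmem is reported in bytes):

pvesh get /cluster/resources --output-format json | jq '
  {cores:   ([.[] | select(.type=="node") | .maxcpu] | add),
   maxmem:  ([.[] | select(.type=="node") | .maxmem] | add),
   running: ([.[] | select(.type=="qemu" or .type=="lxc")
                  | select(.status=="running")] | length)}'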


🎯 Priority Actions

Historical (migration era)

The checklist below applied when r630 storage names and hostname references were being repaired. For current capacity actions, use PROXMOX_CLUSTER_ARCHITECTURE.md and live pvesm status.

🔴 CRITICAL (historical)

  1. Enable storage on r630-01 — validate pools if UI shows disabled stubs
  2. Enable storage on r630-02 (thin pools are already active; verify utilization)

⚠️ HIGH PRIORITY

  1. Continue distributing new CTs across r630-01 through r630-04 using thin pool headroom
  2. Reconcile any disabled storage rows that reference foreign node names (cosmetic in pvesm)
  3. Monitor thin pool % on busy nodes (r630-01, r630-02)
  4. Implement / maintain Prometheus/Grafana per the operational stack
  5. Keep ALL_VMIDS_ENDPOINTS.md updated when adding VMIDs

Last Updated: 2025-01-20
Status: Review Complete - Storage Configuration Needed