# Bring-Up Checklist

## Day-One Installation Guide

This checklist provides a step-by-step guide for bringing up the complete Azure Stack HCI environment on installation day.

## Pre-Installation Preparation

### Hardware Verification

- [ ] Router server chassis received and inspected
- [ ] All PCIe cards received (NICs, HBAs, QAT)
- [ ] Memory modules received (8× 4GB DDR4 ECC RDIMM)
- [ ] Storage SSD received (256GB)
- [ ] All cables received (Ethernet, Mini-SAS HD)
- [ ] Storage shelves received and inspected
- [ ] Proxmox hosts (ML110, R630) verified operational

### Documentation Review

- [ ] Complete architecture reviewed
- [ ] PCIe slot allocation map reviewed
- [ ] Network topology and VLAN schema reviewed
- [ ] Driver matrix reviewed
- [ ] All configuration files prepared

### Environment Configuration

- [ ] Copy `.env.example` to `.env`
- [ ] Configure Azure credentials in `.env`:
  - [ ] `AZURE_SUBSCRIPTION_ID`
  - [ ] `AZURE_TENANT_ID`
  - [ ] `AZURE_RESOURCE_GROUP`
  - [ ] `AZURE_LOCATION`
- [ ] Configure Cloudflare credentials in `.env`:
  - [ ] `CLOUDFLARE_API_TOKEN`
  - [ ] `CLOUDFLARE_ACCOUNT_EMAIL`
- [ ] Configure Proxmox credentials in `.env`:
  - [ ] `PVE_ROOT_PASS` (shared root password for all instances)
  - [ ] `PROXMOX_ML110_URL`
  - [ ] `PROXMOX_R630_URL`
  - [ ] Note: Username `root@pam` is implied and should not be stored
  - [ ] For production: Create RBAC accounts and use API tokens instead of root
- [ ] Verify `.env` file is in `.gitignore` (should not be committed)

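The environment checks above lend themselves to a small script. The following is a minimal sketch rather than anything shipped with the repository: `check_env_file` and `env_is_ignored` are hypothetical helper names, and the key list simply mirrors the variables named in this checklist.

```shell
# Sketch: report which required keys are missing or empty in a .env-style file.
# Helper names are hypothetical; the key list mirrors this checklist.

REQUIRED_VARS="AZURE_SUBSCRIPTION_ID AZURE_TENANT_ID AZURE_RESOURCE_GROUP AZURE_LOCATION
CLOUDFLARE_API_TOKEN CLOUDFLARE_ACCOUNT_EMAIL
PVE_ROOT_PASS PROXMOX_ML110_URL PROXMOX_R630_URL"

# Print each missing or empty key on its own line; return 1 if any are missing.
check_env_file() {
  local env_file="$1" missing=0 key
  for key in $REQUIRED_VARS; do
    # A key counts as set only if the line reads KEY=<at least one character>.
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      echo "$key"
      missing=1
    fi
  done
  return "$missing"
}

# Return 0 if the given .gitignore file contains a plain ".env" entry.
env_is_ignored() {
  grep -Eq '^/?\.env$' "$1"
}
```

For example, `check_env_file .env` prints any keys that still need a value, and `env_is_ignored .gitignore` checks for the ignore entry. Inside a Git working tree, `git check-ignore -q .env` is a more thorough test than the simple pattern match used here.
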
## Phase 1: Hardware Installation

### Router Server Assembly

- [ ] Install CPU and memory (8× 4GB DDR4 ECC RDIMM)
- [ ] Install boot SSD (256GB)
- [ ] Install Intel QAT 8970 in x16_1 slot
- [ ] Install Intel X550-T2 in x8_1 slot
- [ ] Install LSI 9207-8e #1 in x8_2 slot
- [ ] Install LSI 9207-8e #2 in x8_3 slot
- [ ] Install Intel i350-T4 in x4_1 slot
- [ ] Install Intel i350-T8 in x4_2 slot
- [ ] Install Intel i225 Quad-Port in x4_3 slot
- [ ] Verify all cards seated properly
- [ ] Connect power and verify POST

### BIOS/UEFI Configuration

- [ ] Enter BIOS/UEFI setup
- [ ] Verify all PCIe cards detected
- [ ] Configure boot order (SSD first)
- [ ] Enable virtualization (Intel VT-x, VT-d)
- [ ] Configure memory settings (ECC enabled)
- [ ] Set date/time
- [ ] Save and exit BIOS

### Storage Shelf Cabling

- [ ] Connect SFF-8644 cables from LSI HBA #1 to shelves 1-2
- [ ] Connect SFF-8644 cables from LSI HBA #2 to shelves 3-4
- [ ] Power on storage shelves
- [ ] Verify shelf power and status LEDs
- [ ] Label all cables

### Network Cabling

- [ ] Connect 4× Cat6 cables from i350-T4 to Spectrum modems/ONTs (WAN1-4)
- [ ] Connect 2× Cat6a cables to X550-T2 (reserved for future)
- [ ] Connect 4× Cat6 cables from i225 Quad to ML110, R630, and key services
- [ ] Connect 8× Cat6 cables from i350-T8 to remaining servers/appliances
- [ ] Label all cables at both ends
- [ ] Document cable mapping

## Phase 2: Operating System Installation

### Router Server OS

**Option A: Windows Server Core**

- [ ] Boot from Windows Server installation media
- [ ] Install Windows Server Core
- [ ] Configure initial administrator password
- [ ] Install Windows Updates
- [ ] Configure static IP on management interface
- [ ] Enable Remote Desktop (if needed)
- [ ] Install Windows Admin Center

**Option B: Proxmox VE**

- [ ] Boot from Proxmox VE installation media
- [ ] Install Proxmox VE
- [ ] Configure initial root password
- [ ] Configure network (management interface)
- [ ] Update Proxmox packages
- [ ] Verify Proxmox web interface accessible

### Proxmox Hosts (ML110, R630)

- [ ] Verify Proxmox VE installed and updated
- [ ] Configure network interfaces
- [ ] Verify cluster status (if clustered)
- [ ] Test VM creation

## Phase 3: Driver Installation

### Router Server Drivers

- [ ] Install Intel PROSet drivers for all NICs
  - [ ] i350-T4 (WAN)
  - [ ] i350-T8 (LAN 1GbE)
  - [ ] X550-T2 (10GbE)
  - [ ] i225 Quad-Port (LAN 2.5GbE)
- [ ] Verify all NICs detected and functional
- [ ] Install LSI mpt3sas driver
- [ ] Flash LSI HBAs to IT mode
- [ ] Verify storage shelves detected
- [ ] Install Intel QAT drivers (qatlib)
- [ ] Install OpenSSL QAT engine
- [ ] Verify QAT acceleration working

### Driver Verification

- [ ] Run driver verification script
- [ ] Test all network ports
- [ ] Test storage connectivity
- [ ] Test QAT acceleration
- [ ] Document any issues

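One way to approach the driver verification step on a Linux-based router server (Option B) is to compare `lspci` output against the expected card inventory. The sketch below is illustrative only: `check_pci_inventory` is a hypothetical helper, and both the match patterns and the minimum counts are assumptions that should be adjusted to what `lspci` actually prints for the installed cards.

```shell
# Sketch: compare lspci-style output against an expected device inventory.
# Patterns and counts are assumptions, written as "pattern:minimum" pairs.
EXPECTED_DEVICES="I350:12 X550:2 I225:4 SAS2308:2 QuickAssist:1"

# Read a device listing on stdin; print one line per shortfall.
# Returns 1 if any expected device type is under-represented.
check_pci_inventory() {
  local listing pair pattern want have status=0
  listing=$(cat)
  for pair in $EXPECTED_DEVICES; do
    pattern=${pair%%:*}
    want=${pair##*:}
    # grep -c prints 0 and exits non-zero on no match, hence "|| true".
    have=$(printf '%s\n' "$listing" | grep -c -- "$pattern" || true)
    if [ "$have" -lt "$want" ]; then
      echo "$pattern: found $have, expected at least $want"
      status=1
    fi
  done
  return "$status"
}
```

On the router server this would be run as `lspci | check_pci_inventory`. No output and a zero exit status mean every expected device type was found at least the expected number of times.
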
## Phase 4: Network Configuration

### OpenWrt VM Setup

- [ ] Create OpenWrt VM on Router server
- [ ] Configure OpenWrt network interfaces
- [ ] Configure VLANs (10, 20, 30, 40, 50, 60, 99)
- [ ] Configure mwan3 for 4× Spectrum WAN
- [ ] Configure firewall zones
- [ ] Test multi-WAN failover
- [ ] Configure inter-VLAN routing

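As an illustration of the mwan3 step, the sketch below prints a candidate `/etc/config/mwan3` for four uplinks. It assumes the OpenWrt interfaces are named `wan1` through `wan4` and that two public resolvers are acceptable tracking targets; `render_mwan3_config`, the policy name `balanced`, and the member naming are choices made for this example, not values taken from the project.

```shell
# Sketch: generate a load-balanced mwan3 configuration for four WAN uplinks.
# Interface names and tracking IPs are assumptions; adjust before use.
WAN_IFACES="wan1 wan2 wan3 wan4"
TRACK_IPS="1.1.1.1 8.8.8.8"

render_mwan3_config() {
  local iface ip
  for iface in $WAN_IFACES; do
    # One tracked interface section per uplink.
    printf "config interface '%s'\n" "$iface"
    printf "\toption enabled '1'\n"
    printf "\toption family 'ipv4'\n"
    for ip in $TRACK_IPS; do
      printf "\tlist track_ip '%s'\n" "$ip"
    done
    printf "\toption reliability '1'\n\n"
    # One member per uplink; equal metric and weight means equal balancing.
    printf "config member '%s_m1_w1'\n" "$iface"
    printf "\toption interface '%s'\n" "$iface"
    printf "\toption metric '1'\n"
    printf "\toption weight '1'\n\n"
  done
  printf "config policy 'balanced'\n"
  for iface in $WAN_IFACES; do
    printf "\tlist use_member '%s_m1_w1'\n" "$iface"
  done
  printf "\nconfig rule 'default_rule'\n"
  printf "\toption dest_ip '0.0.0.0/0'\n"
  printf "\toption use_policy 'balanced'\n"
}
```

Running `render_mwan3_config` prints the configuration to standard output so it can be reviewed before being copied to the OpenWrt VM. Because all members share one metric, traffic is balanced across the uplinks, and an uplink whose tracking targets stop responding is taken out of rotation.
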
### Proxmox VLAN Configuration

- [ ] Configure VLAN bridges on ML110
- [ ] Configure VLAN bridges on R630
- [ ] Test VLAN connectivity
- [ ] Verify VM network isolation

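For the VLAN bridge items, one common Proxmox pattern is a single VLAN-aware bridge that carries all tagged networks. The sketch below prints such a stanza for `/etc/network/interfaces`; the bridge name `vmbr0`, the uplink port `eno1`, and the helper `render_vlan_bridge` are assumptions for illustration, while the VLAN IDs are the ones listed in this checklist.

```shell
# Sketch: print a VLAN-aware bridge stanza for /etc/network/interfaces.
# VLAN IDs come from the checklist; bridge and port names are assumptions.
VLAN_IDS="10 20 30 40 50 60 99"

render_vlan_bridge() {
  local bridge="$1" port="$2"
  cat <<EOF
auto $bridge
iface $bridge inet manual
    bridge-ports $port
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids $VLAN_IDS
EOF
}
```

For example, `render_vlan_bridge vmbr0 eno1` prints a stanza that can be merged into the host's network configuration and applied with `ifreload -a`. Each VM NIC is then given its VLAN tag in the VM's network device settings.
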
### IP Address Configuration

- [ ] Configure IP addresses per VLAN schema
- [ ] Configure DNS settings
- [ ] Test network connectivity
- [ ] Verify routing between VLANs

## Phase 5: Storage Configuration

### Storage Spaces Direct Setup

- [ ] Verify all shelves detected
- [ ] Create Storage Spaces Direct pools
- [ ] Create volumes for VMs
- [ ] Create volumes for applications
- [ ] Configure storage exports (NFS/iSCSI)

### Proxmox Storage Mounts

- [ ] Configure NFS mounts on ML110
- [ ] Configure NFS mounts on R630
- [ ] Test storage connectivity
- [ ] Verify VM storage access

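The NFS mount items can be handled from the Proxmox command line with `pvesm`. The following is a hedged sketch: `add_nfs_storage` is a hypothetical wrapper, the storage ID, server address, and export path in the usage note are placeholders, and the `images,iso` content types are only one reasonable choice.

```shell
# Sketch: register an NFS export as Proxmox storage.
# Set RUN=echo to print the command instead of executing it (dry run).
add_nfs_storage() {
  local storage_id="$1" server="$2" export_path="$3"
  ${RUN:-} pvesm add nfs "$storage_id" \
    --server "$server" \
    --export "$export_path" \
    --content images,iso
}
```

A dry run such as `RUN=echo add_nfs_storage hci-vms 10.0.30.5 /export/vms` shows the command that would be executed on ML110 and R630. After a real run, `pvesm status` lists the new storage and whether it is active.
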
## Phase 6: Azure Arc Onboarding

### Arc Agent Installation

- [ ] Install Azure Arc agent on Router server (if Linux)
- [ ] Install Azure Arc agent on ML110
- [ ] Install Azure Arc agent on R630
- [ ] Install Azure Arc agent on Windows management VM (if applicable)

### Arc Onboarding

- [ ] Load environment variables from `.env`: `export $(cat .env | grep -v '^#' | xargs)`
- [ ] Configure Azure subscription and resource group (from `.env`)
- [ ] Onboard Router server to Azure Arc
- [ ] Onboard ML110 to Azure Arc
- [ ] Onboard R630 to Azure Arc
- [ ] Verify all resources visible in Azure Portal

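The `export $(... | xargs)` one-liner works for simple values but can mangle values that contain spaces or shell metacharacters. A possible alternative, sketched below, sources the file with automatic export enabled and then assembles the `azcmagent connect` invocation from the four Azure variables. `load_env` and `arc_connect_cmd` are hypothetical helper names, and the sketch assumes `.env` is a trusted file written in valid shell syntax, since sourcing executes it.

```shell
# Sketch: load a trusted .env file and build the Arc onboarding command.

# Export every variable assigned while sourcing the file.
load_env() {
  set -a
  . "$1"
  set +a
}

# Print the onboarding command for review; it is not executed here.
arc_connect_cmd() {
  printf 'azcmagent connect --subscription-id %s --tenant-id %s --resource-group %s --location %s\n' \
    "$AZURE_SUBSCRIPTION_ID" "$AZURE_TENANT_ID" "$AZURE_RESOURCE_GROUP" "$AZURE_LOCATION"
}
```

A typical sequence on each host would be `load_env ./.env` followed by `arc_connect_cmd` to review the command before running it. Without service principal options, `azcmagent connect` falls back to an interactive sign-in.
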
### Arc Governance

- [ ] Configure Azure Policy
- [ ] Enable Azure Monitor
- [ ] Enable Microsoft Defender for Cloud (formerly Azure Defender)
- [ ] Configure Update Management
- [ ] Test policy enforcement

## Phase 7: Cloudflare Integration

### Cloudflare Tunnel Setup

- [ ] Create Cloudflare account (if one does not already exist)
- [ ] Create Zero Trust organization
- [ ] Configure Cloudflare API token in `.env` file
- [ ] Install cloudflared on Ubuntu VM
- [ ] Authenticate cloudflared (interactive or using API token from `.env`)
- [ ] Configure Tunnel for WAC
- [ ] Configure Tunnel for Proxmox UI
- [ ] Configure Tunnel for dashboards
- [ ] Configure Tunnel for Git/CI services

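For a locally managed tunnel, the per-service items above typically end up as ingress rules in a single `cloudflared` configuration file. The sketch below generates such a file from `hostname=service` pairs. `render_tunnel_config` is a hypothetical helper, and every hostname, address, and path in the usage note is a placeholder rather than a value from this environment.

```shell
# Sketch: render a cloudflared config.yml from hostname=service pairs.
# Arguments: <tunnel-id> <credentials-file> <hostname=service>...
render_tunnel_config() {
  local tunnel_id="$1" cred_file="$2" pair
  shift 2
  printf 'tunnel: %s\n' "$tunnel_id"
  printf 'credentials-file: %s\n' "$cred_file"
  printf 'ingress:\n'
  for pair in "$@"; do
    # Split each pair at the first "=" into hostname and origin service.
    printf '  - hostname: %s\n' "${pair%%=*}"
    printf '    service: %s\n' "${pair#*=}"
  done
  # cloudflared requires a final catch-all rule without a hostname.
  printf '  - service: http_status:404\n'
}
```

For example, `render_tunnel_config "$TUNNEL_ID" /etc/cloudflared/tunnel.json wac.example.com=https://10.0.99.10:443 pve.example.com=https://10.0.99.11:8006` produces two ingress rules plus the catch-all. Origins with self-signed certificates, such as the Proxmox UI on port 8006, usually also need an `originRequest` block with `noTLSVerify: true`, and `cloudflared tunnel ingress validate` can check the resulting file.
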
### Zero Trust Policies

- [ ] Configure SSO (Azure AD/Okta)
- [ ] Configure MFA requirements
- [ ] Configure device posture checks
- [ ] Configure access policies
- [ ] Test external access

### WAF Configuration

- [ ] Configure WAF rules
- [ ] Test WAF protection
- [ ] Verify that no inbound ports are required

## Phase 8: Service VM Deployment

### Ubuntu VM Templates

- [ ] Create Ubuntu LTS template on Proxmox
- [ ] Install Azure Arc agent in template
- [ ] Configure base packages
- [ ] Create VM snapshots

### Service VM Deployment

- [ ] Deploy Cloudflare Tunnel VM (VLAN 99)
- [ ] Deploy Reverse Proxy VM (VLAN 30/99)
- [ ] Deploy Observability VM (VLAN 40)
- [ ] Deploy CI/CD VM (VLAN 50)
- [ ] Install Azure Arc agents on all VMs

### Service Configuration

- [ ] Configure Cloudflare Tunnel
- [ ] Configure reverse proxy (NGINX/Traefik)
- [ ] Configure observability stack (Prometheus/Grafana)
- [ ] Configure CI/CD (GitLab Runner/Jenkins)

## Phase 9: Verification and Testing

### Network Testing

- [ ] Test all WAN connections
- [ ] Test multi-WAN failover
- [ ] Test VLAN isolation
- [ ] Test inter-VLAN routing
- [ ] Test firewall rules

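A quick way to exercise the WAN checks is to probe a public address through each uplink in turn. The sketch below is one possible approach: `probe_wan` and `report_wans` are hypothetical helpers, the default target `1.1.1.1` is an assumption, and the interface names passed in must match the router's actual WAN devices.

```shell
# Sketch: report the reachability of each WAN uplink.

# Return 0 if the target answers pings sent out through the given interface.
probe_wan() {
  ping -c 3 -W 2 -I "$1" "${2:-1.1.1.1}" >/dev/null 2>&1
}

# Print "<iface> up" or "<iface> DOWN" for each argument.
# Returns 1 if any uplink failed its probe.
report_wans() {
  local iface status=0
  for iface in "$@"; do
    if probe_wan "$iface"; then
      echo "$iface up"
    else
      echo "$iface DOWN"
      status=1
    fi
  done
  return "$status"
}
```

For example, `report_wans eth0 eth1 eth2 eth3` gives a per-uplink summary. To test failover, unplug one modem, confirm that its interface is reported as `DOWN`, and check that clients on the LAN still reach the internet through the remaining uplinks.
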
### Storage Testing

- [ ] Test storage read/write performance
- [ ] Test storage redundancy
- [ ] Test VM storage access
- [ ] Test storage exports

### Service Testing

- [ ] Test Cloudflare Tunnel access
- [ ] Test Azure Arc connectivity
- [ ] Test observability dashboards
- [ ] Test CI/CD pipelines

### Performance Testing

- [ ] Test QAT acceleration
- [ ] Test network throughput
- [ ] Test storage I/O
- [ ] Document performance metrics

## Phase 10: Documentation and Handoff

### Documentation

- [ ] Document all IP addresses
- [ ] Verify `.env` file contains all credentials (stored securely, not in version control)
- [ ] Document cable mappings
- [ ] Document VLAN configurations
- [ ] Document storage allocations
- [ ] Create network diagrams
- [ ] Create runbooks
- [ ] Verify `.env` is in `.gitignore` and not committed to repository

### Monitoring Setup

- [ ] Configure Grafana dashboards
- [ ] Configure Prometheus alerts
- [ ] Configure Azure Monitor alerts
- [ ] Test alerting

### Security Hardening

- [ ] Review firewall rules
- [ ] Review access policies
- [ ] Create RBAC accounts for Proxmox (replace root usage)
- [ ] Create service accounts for automation
- [ ] Create operator accounts with appropriate roles
- [ ] Generate API tokens for service accounts
- [ ] Document RBAC account usage (see docs/security/proxmox-rbac.md)
- [ ] Review secret management
- [ ] Perform security scan

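The Proxmox RBAC items can be carried out with `pveum`. The sketch below shows one possible sequence for a single automation account; it is not taken from `docs/security/proxmox-rbac.md`. `create_automation_account` is a hypothetical wrapper, and the user, role, and token names in the usage note are placeholders.

```shell
# Sketch: create a Proxmox service account, grant it a role, and issue a token.
# Set RUN=echo to print the commands instead of executing them (dry run).
create_automation_account() {
  local user="$1" role="$2" token_id="$3"
  ${RUN:-} pveum user add "$user" --comment "automation service account"
  # Grant the role at the root path; a narrower path such as /vms may suit better.
  ${RUN:-} pveum acl modify / --users "$user" --roles "$role"
  # --privsep 0 lets the token inherit the user's permissions.
  ${RUN:-} pveum user token add "$user" "$token_id" --privsep 0
}
```

A dry run such as `RUN=echo create_automation_account automation@pve PVEVMAdmin ci` prints the three commands for review. The token secret is displayed only once when the token is created, so it should be stored in the secret manager immediately.
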
## Post-Installation Tasks

### Ongoing Maintenance

- [ ] Schedule regular backups
- [ ] Schedule firmware updates
- [ ] Schedule driver updates
- [ ] Schedule OS updates
- [ ] Schedule security patches

### Monitoring

- [ ] Review monitoring dashboards daily
- [ ] Review Azure Arc status
- [ ] Review Cloudflare Tunnel status
- [ ] Review storage health
- [ ] Review network performance

## Troubleshooting Reference

### Common Issues

**Issue:** NIC not detected
- Check PCIe slot connection
- Check BIOS settings
- Update driver

**Issue:** Storage shelves not detected
- Check cable connections
- Check HBA firmware
- Check shelf power

**Issue:** Azure Arc not connecting
- Check network connectivity
- Check proxy settings
- Check Azure credentials

**Issue:** Cloudflare Tunnel not working
- Check cloudflared service
- Check Tunnel configuration
- Check Zero Trust policies

## Related Documentation

- [Complete Architecture](complete-architecture.md) - Full architecture overview
- [Hardware BOM](hardware-bom.md) - Complete bill of materials
- [PCIe Allocation](pcie-allocation.md) - Slot allocation map
- [Network Topology](network-topology.md) - VLAN/IP schema
- [Driver Matrix](driver-matrix.md) - Driver versions