loc_az_hci/docs/architecture/network-topology.md
defiQUG c39465c2bd
Initial commit: loc_az_hci (smom-dbis-138 excluded via .gitignore)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-08 09:04:46 -08:00

# Network Topology
## Overview
This document describes the network architecture and topology for the Proxmox Azure Arc Hybrid Cloud Stack.
## Network Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ Internet / Azure Cloud │
└─────────────────────────────────────────────────────────────────┘
│ VPN / Internet
┌─────────────────────────────────────────────────────────────────┐
│ On-Premises Network │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Management Network (192.168.1.0/24) │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ PVE Node 1 │ │ PVE Node 2 │ │ │
│ │ │ 192.168.1.10 │ │ 192.168.1.11 │ │ │
│ │ │ vmbr0 │ │ vmbr0 │ │ │
│ │ └──────┬───────┘ └──────┬───────┘ │ │
│ │ │ │ │ │
│ │ └──────────┬───────────────────┘ │ │
│ │ │ │ │
│ │ ┌─────▼─────┐ │ │
│ │ │ Switch │ │ │
│ │ │ / Router │ │ │
│ │ └───────────┘ │ │
│ │ │ │ │
│ │ ┌───────────┼───────────┐ │ │
│ │ │ │ │ │ │
│ │ ┌──────▼───┐ ┌─────▼────┐ ┌───▼────┐ │ │
│ │ │ K3s VM │ │ Git VM │ │ Other │ │ │
│ │ │ .1.188 │ │ .1.60 │ │ VMs │ │ │
│ │ └──────────┘ └──────────┘ └────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Storage Network (Optional - 10.0.0.0/24) │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ PVE Node 1 │ │ PVE Node 2 │ │ │
│ │ │ vmbr1 │ │ vmbr1 │ │ │
│ │ │ 10.0.0.10 │ │ 10.0.0.11 │ │ │
│ │ └──────┬───────┘ └──────┬───────┘ │ │
│ │ │ │ │ │
│ │ └──────────┬───────────────────┘ │ │
│ │ │ │ │
│ │ ┌─────▼─────┐ │ │
│ │ │ NFS │ │ │
│ │ │ Server │ │ │
│ │ │ 10.0.0.100│ │ │
│ │ └───────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ Kubernetes Pod Network (10.244.0.0/16) │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Besu Pod │ │ Firefly Pod │ │ Chainlink │ │ │
│ │ │ 10.244.1.10 │ │ 10.244.1.20 │ │ 10.244.1.30 │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ │ │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ Blockscout │ │ Cacti │ │ NGINX │ │ │
│ │ │ 10.244.1.40 │ │ 10.244.1.50 │ │ 10.244.1.60 │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
## Network Segments
### 1. Management Network (192.168.1.0/24)
**Purpose**: Primary network for Proxmox nodes, VMs, and management traffic
**Components**:
- Proxmox Node 1: `192.168.1.10`
- Proxmox Node 2: `192.168.1.11`
- K3s VM: `192.168.1.188`
- Git Server (Gitea/GitLab): `192.168.1.60`
- Gateway: `192.168.1.1`
- DNS: `192.168.1.1` (or your DNS server)
**Traffic**:
- Proxmox web UI access
- SSH access to nodes and VMs
- Azure Arc agent communication
- Cluster communication (Corosync)
- VM management
**Firewall Rules**:
- Allow: SSH (22), HTTPS (443), Proxmox API (8006)
- Allow: Azure Arc agent ports (outbound)
- Allow: Cluster communication (5404-5412 UDP)
### 2. Storage Network (10.0.0.0/24) - Optional
**Purpose**: Dedicated network for storage traffic (NFS, iSCSI)
**Components**:
- Proxmox Node 1: `10.0.0.10`
- Proxmox Node 2: `10.0.0.11`
- NFS Server: `10.0.0.100`
**Traffic**:
- NFS storage access
- VM disk I/O
- Cluster storage replication
**Benefits**:
- Isolates storage traffic from management
- Reduces network congestion
- Better performance for storage operations
### 3. Kubernetes Pod Network (10.244.0.0/16)
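**Purpose**: Internal Kubernetes pod networking (managed by the Flannel CNI)
**Components**:
- Pod IPs assigned automatically from `10.244.0.0/16` (requires the custom `cluster-cidr` shown under Network Configuration; the K3s default is `10.42.0.0/16`)
- Service IPs: `10.43.0.0/16` (K3s default; `10.245.0.0/16` if the custom configuration is applied)
- Cluster DNS: `10.43.0.10` (K3s default; `10.245.0.10` with the custom configuration)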
**Traffic**:
- Inter-pod communication
- Service discovery
- Ingress traffic routing
## Network Configuration
### Proxmox Bridge Configuration
**vmbr0 (Management)**:
```bash
auto vmbr0
iface vmbr0 inet static
    address 192.168.1.10/24
    gateway 192.168.1.1
    bridge-ports eth0
    bridge-stp off
    bridge-fd 0
```
**vmbr1 (Storage - Optional)**:
```bash
auto vmbr1
iface vmbr1 inet static
    address 10.0.0.10/24
    bridge-ports eth1
    bridge-stp off
    bridge-fd 0
```
### Kubernetes Network
**K3s Default Configuration**:
- CNI: Flannel
- Pod CIDR: `10.42.0.0/16`
- Service CIDR: `10.43.0.0/16`
- Cluster DNS: `10.43.0.10`
**Custom Configuration** (if needed):
```yaml
# /etc/rancher/k3s/config.yaml
cluster-cidr: "10.244.0.0/16"
service-cidr: "10.245.0.0/16"
cluster-dns: "10.245.0.10"
```
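K3s expects the `cluster-dns` address to fall inside `service-cidr`, so it can help to check custom values before installing. The following is a minimal sketch in plain Bash; the helper names `ip_to_int` and `ip_in_cidr` are illustrative and are not part of K3s.

```bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (10#$a << 24) | (10#$b << 16) | (10#$c << 8) | 10#$d ))
}

# Return 0 if address $1 lies inside CIDR block $2 (e.g. 10.245.0.0/16).
ip_in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ( $(ip_to_int "$ip") & mask ) == ( $(ip_to_int "$net") & mask ) ))
}
```

For the custom values above, `ip_in_cidr 10.245.0.10 10.245.0.0/16 && echo "cluster-dns OK"` should succeed, while pairing the default DNS address `10.43.0.10` with the custom service CIDR should fail.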
## Port Requirements
### Proxmox Nodes
- **8006**: Proxmox web UI (HTTPS)
- **22**: SSH
- **5404-5412**: Corosync cluster communication (UDP)
- **3128**: SPICE proxy (optional)
### Azure Arc Agents
- **Outbound HTTPS (443)**: Azure Arc connectivity
- **Outbound TCP 443**: Azure Monitor, Azure Policy
### Kubernetes (K3s)
- **6443**: Kubernetes API server
- **10250**: Kubelet API
- **8472**: Flannel VXLAN (UDP)
- **51820-51821**: Flannel WireGuard (UDP)
### Application Services
- **8545**: Besu RPC (HTTP)
- **8546**: Besu RPC (WebSocket)
- **30303**: Besu P2P
- **5000**: Firefly API
- **6688**: Chainlink API
- **4000**: Blockscout
- **80/443**: NGINX Proxy
- **80**: Cacti
### Git Servers
- **3000**: Gitea web UI
- **2222**: Gitea SSH
- **8080**: GitLab web UI
- **2222**: GitLab SSH
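
To confirm that the TCP ports listed above are reachable from a given machine, a quick probe can be scripted. This is a rough sketch using Bash's built-in `/dev/tcp` redirection and `timeout` from coreutils; the function names `check_tcp` and `report_ports` are hypothetical. It only covers TCP, so the UDP ports (Corosync, Flannel) need a different tool.

```bash
# Return 0 if a TCP connection to host $1, port $2 succeeds within 2 seconds.
check_tcp() {
  timeout 2 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Print a one-line status for each "host:port" argument.
report_ports() {
  local target
  for target in "$@"; do
    if check_tcp "${target%:*}" "${target##*:}"; then
      echo "open   $target"
    else
      echo "closed $target"
    fi
  done
}
```

A typical call might be `report_ports 192.168.1.10:8006 192.168.1.10:22 192.168.1.188:6443`, using the addresses from the Management Network section.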
## Network Security
### Firewall Recommendations
**Proxmox Nodes** (using `ufw`, which must be installed separately; Proxmox VE also ships its own built-in firewall):
```bash
# Allow cluster communication
ufw allow 5404:5412/udp
# Allow Proxmox API
ufw allow 8006/tcp
# Allow SSH
ufw allow 22/tcp
```
**Kubernetes Nodes**:
```bash
# Allow Kubernetes API
ufw allow 6443/tcp
# Allow Flannel networking
ufw allow 8472/udp
ufw allow 51820:51821/udp
```
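The rules above accept connections from any source. A tighter variant restricts each port to a trusted subnet using ufw's `allow from ... to any port ... proto ...` form. The sketch below only prints the commands so they can be reviewed before being run; `print_ufw_rules` is a hypothetical helper, and the choice of `192.168.1.0/24` as the trusted source is an assumption.

```bash
# Print (not run) ufw commands that allow each "port/proto" argument
# only from the source subnet given as the first argument.
print_ufw_rules() {
  local subnet=$1 spec
  shift
  for spec in "$@"; do
    echo "ufw allow from $subnet to any port ${spec%/*} proto ${spec#*/}"
  done
}
```

For example, `print_ufw_rules 192.168.1.0/24 22/tcp 8006/tcp 5404:5412/udp` emits one rule per port, which can be piped to `sh` once checked.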
### Network Policies (Kubernetes)
Example network policy to restrict traffic:
```yaml
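apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: blockchain-network-policy
  namespace: blockchain
spec:
  podSelector: {}            # applies to every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: hc-stack     # the source namespace must carry this label
  egress:
    # Egress is limited to the blockchain namespace; DNS lookups to
    # kube-system are blocked unless allowed by an additional rule.
    - to:
        - namespaceSelector:
            matchLabels:
              name: blockchain   # the namespace must carry this label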
```
## DNS Configuration
### Internal DNS
**Hosts File** (for local resolution):
```
192.168.1.188 k3s.local
192.168.1.60 git.local gitea.local
192.168.1.10 pve-node-1.local
192.168.1.11 pve-node-2.local
```
### Service Discovery
**Kubernetes DNS**:
- Service names resolve to cluster IPs
- Format: `<service-name>.<namespace>.svc.cluster.local`
- Example: `besu.blockchain.svc.cluster.local`
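
The naming format above can be captured in a small helper when scripting against cluster services. This is a minimal sketch; `svc_fqdn` is a hypothetical function, and `cluster.local` is assumed as the cluster domain.

```bash
# Build the in-cluster DNS name for service $1 in namespace $2.
# The namespace defaults to "default" and the domain to "cluster.local".
svc_fqdn() {
  local service=$1 namespace=${2:-default} domain=${3:-cluster.local}
  echo "${service}.${namespace}.svc.${domain}"
}
```

Run from inside a pod, `getent hosts "$(svc_fqdn besu blockchain)"` should print the service's cluster IP, assuming a Service named `besu` exists in the `blockchain` namespace.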
## Load Balancing
### NGINX Ingress Controller
- **Type**: LoadBalancer or NodePort
- **Ports**: 80 (HTTP), 443 (HTTPS)
- **Backend Services**: All application services
### Proxmox Load Balancing
- Use Proxmox HA groups to control which nodes a VM runs on and fails over to (this provides availability, not traffic balancing)
- To balance traffic, run multiple VMs behind a load balancer
## Network Monitoring
### Tools
- **Cacti**: Network traffic monitoring
- **Azure Monitor**: Network metrics via Azure Arc
- **Kubernetes Metrics**: Pod and service network stats
### Key Metrics
- Bandwidth utilization
- Latency between nodes
- Packet loss
- Connection counts
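
Latency and packet loss between nodes can be sampled with `ping` when a full monitoring stack is not yet in place. The sketch below parses the summary that `ping -q` prints on Linux (iputils); the function name `parse_ping_summary` is hypothetical, and the parsing assumes that summary format.

```bash
# Read `ping -q` summary text on stdin and print "<loss%> <avg_rtt_ms>".
parse_ping_summary() {
  awk -F'[ /]+' '
    /packet loss/       { for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i }
    /^(rtt|round-trip)/ { avg = $8 }   # fields: rtt min avg max mdev = MIN AVG ...
    END                 { print loss, avg }'
}
```

For example, `ping -c 5 -q 192.168.1.11 | parse_ping_summary` run on Proxmox Node 1 reports loss and average round-trip time to Node 2.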
---
## Azure Stack HCI VLAN Schema
### Overview
The Azure Stack HCI environment uses a comprehensive VLAN-based network segmentation strategy for security, isolation, and scalability.
### VLAN Definitions
#### VLAN 10 - Core Storage (10.10.10.0/24)
**Purpose:** Storage network for shelves, NAS services, and backup
**Components:**
- Storage shelves: 10.10.10.2-10.10.10.9
- NAS services: 10.10.10.10
- Backup services: 10.10.10.20
- Router server storage interface: 10.10.10.1
**Traffic:**
- Storage I/O (NFS, SMB, iSCSI)
- Backup operations
- Storage replication
**Firewall Rules:**
- Default: Allow storage protocols
- Restrict: No internet access
- Allow: Compute nodes → Storage
#### VLAN 20 - Compute (10.10.20.0/24)
**Purpose:** Hypervisor traffic, Proxmox migrations, VM management
**Components:**
- Proxmox Node 1 (ML110): 10.10.20.10
- Proxmox Node 2 (R630): 10.10.20.20
- Router server compute interface: 10.10.20.1
- Future compute nodes: 10.10.20.30+
**Traffic:**
- Proxmox cluster communication
- VM migrations
- Hypervisor management
- Storage access (to VLAN 10)
**Firewall Rules:**
- Default: Allow cluster communication
- Allow: Proxmox API (8006)
- Allow: Corosync (5404-5412 UDP)
- Allow: Storage access (VLAN 10)
#### VLAN 30 - App Tier (10.10.30.0/24)
**Purpose:** Web/API services, internal applications
**Components:**
- Web services: 10.10.30.20-10.10.30.30
- API services: 10.10.30.40-10.10.30.50
- Reverse proxy: 10.10.30.10
- Router server app interface: 10.10.30.1
**Traffic:**
- HTTP/HTTPS traffic
- API requests
- Application-to-application communication
**Firewall Rules:**
- Default: Allow HTTP/HTTPS
- Allow: Reverse proxy → Apps
- Allow: Monitoring access (VLAN 40)
#### VLAN 40 - Observability (10.10.40.0/24)
**Purpose:** Monitoring, logging, metrics collection
**Components:**
- Prometheus: 10.10.40.10
- Grafana: 10.10.40.20
- Loki/OpenSearch: 10.10.40.30
- Router server monitoring interface: 10.10.40.1
**Traffic:**
- Metrics collection
- Log aggregation
- Dashboard access
- Alert notifications
**Firewall Rules:**
- Default: Allow monitoring protocols
- Allow: Prometheus scraping
- Allow: Grafana access (from management VLAN)
- Allow: Log collection
#### VLAN 50 - Dev/Test (10.10.50.0/24)
**Purpose:** Lab workloads, development, testing
**Components:**
- Dev VMs: 10.10.50.10-10.10.50.30
- Test VMs: 10.10.50.40-10.10.50.60
- CI/CD services: 10.10.50.70
- Router server dev interface: 10.10.50.1
**Traffic:**
- Development traffic
- Testing operations
- CI/CD pipelines
- Git operations
**Firewall Rules:**
- Default: Restrict to dev/test only
- Allow: Git access
- Allow: CI/CD operations
- Block: Production network access
#### VLAN 60 - Management (10.10.60.0/24)
**Purpose:** WAC, Azure Arc, SSH, hypervisor management
**Components:**
- Jump host: 10.10.60.10
- Windows Admin Center: 10.10.60.20
- Azure Arc agents: 10.10.60.30+
- Router server mgmt interface: 10.10.60.1
**Traffic:**
- Management protocols (SSH, RDP, WAC)
- Azure Arc agent communication
- Administrative access
- System updates
**Firewall Rules:**
- Default: Restrict access
- Allow: SSH (22) from trusted sources
- Allow: WAC (443) from trusted sources
- Allow: Azure Arc outbound (443)
- Block: Inbound from internet
#### VLAN 99 - Utility/DMZ (10.10.99.0/24)
**Purpose:** Proxies, bastions, Cloudflare tunnel hosts
**Components:**
- Cloudflare Tunnel VM: 10.10.99.10
- Reverse proxy: 10.10.99.20
- Bastion host: 10.10.99.30
- Router server DMZ interface: 10.10.99.1
**Traffic:**
- Cloudflare Tunnel outbound (443)
- Reverse proxy traffic
- External access (via Cloudflare)
- DMZ services
**Firewall Rules:**
- Default: Restrict to DMZ only
- Allow: Cloudflare Tunnel outbound (443)
- Allow: Reverse proxy → Internal services
- Block: Direct internet access (except Cloudflare)
### Physical Port Mapping (Router Server)
#### WAN Ports (i350-T4)
- **WAN1:** Spectrum modem/ONT #1 → untagged (no VLAN)
- **WAN2:** Spectrum modem/ONT #2 → untagged (no VLAN)
- **WAN3:** Spectrum modem/ONT #3 → untagged (no VLAN)
- **WAN4:** Spectrum modem/ONT #4 → untagged (no VLAN)
#### 10GbE Ports (X550-T2)
- **10GbE-1:** Reserved for future 10GbE switch or direct server link
- **10GbE-2:** Reserved for future 10GbE switch or direct server link
#### 2.5GbE LAN Ports (i225 Quad-Port)
- **LAN2.5-1:** Direct to HPE ML110 Gen9 → VLAN 20 (compute)
- **LAN2.5-2:** Direct to Dell R630 → VLAN 20 (compute)
- **LAN2.5-3:** Key service #1 → VLAN 30 (app tier)
- **LAN2.5-4:** Key service #2 → VLAN 30 (app tier)
#### 1GbE LAN Ports (i350-T8)
- **LAN1G-1:** Server/appliance #1 → Appropriate VLAN
- **LAN1G-2:** Server/appliance #2 → Appropriate VLAN
- **LAN1G-3:** Server/appliance #3 → Appropriate VLAN
- **LAN1G-4:** Server/appliance #4 → Appropriate VLAN
- **LAN1G-5:** Server/appliance #5 → Appropriate VLAN
- **LAN1G-6:** Server/appliance #6 → Appropriate VLAN
- **LAN1G-7:** Server/appliance #7 → Appropriate VLAN
- **LAN1G-8:** Server/appliance #8 → Appropriate VLAN
### IP Address Allocation Examples
```
VLAN 10 (Storage): 10.10.10.0/24
- Router: 10.10.10.1
- NAS: 10.10.10.10
- Backup: 10.10.10.20
VLAN 20 (Compute): 10.10.20.0/24
- Router: 10.10.20.1
- ML110: 10.10.20.10
- R630: 10.10.20.20
VLAN 30 (App Tier): 10.10.30.0/24
- Router: 10.10.30.1
- Reverse Proxy: 10.10.30.10
- Apps: 10.10.30.20-50
VLAN 40 (Observability): 10.10.40.0/24
- Router: 10.10.40.1
- Prometheus: 10.10.40.10
- Grafana: 10.10.40.20
- Loki: 10.10.40.30
VLAN 50 (Dev/Test): 10.10.50.0/24
- Router: 10.10.50.1
- Dev VMs: 10.10.50.10-30
- Test VMs: 10.10.50.40-60
- CI/CD: 10.10.50.70
VLAN 60 (Management): 10.10.60.0/24
- Router: 10.10.60.1
- Jump Host: 10.10.60.10
- WAC: 10.10.60.20
- Arc Agents: 10.10.60.30+
VLAN 99 (DMZ): 10.10.99.0/24
- Router: 10.10.99.1
- Cloudflare Tunnel: 10.10.99.10
- Reverse Proxy: 10.10.99.20
- Bastion: 10.10.99.30
```
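The allocation above follows a simple convention: VLAN `N` maps to `10.10.N.0/24`, with the router at `.1`. A sketch of that convention as Bash helpers (the names `vlan_subnet` and `vlan_host` are hypothetical) can keep scripts consistent with the schema:

```bash
# Print the /24 subnet for VLAN ID $1 under the 10.10.<vlan>.0/24 convention.
vlan_subnet() {
  echo "10.10.$1.0/24"
}

# Print the address of host number $2 in VLAN $1 (host 1 is the router).
vlan_host() {
  echo "10.10.$1.$2"
}
```

For instance, `vlan_subnet 20` gives the Compute subnet and `vlan_host 60 10` gives the jump host address listed for the Management VLAN.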
### Inter-VLAN Routing
**Default Policy:** Deny all inter-VLAN traffic
**Allowed Routes:**
- Management (60) → All VLANs (administrative access)
- Compute (20) → Storage (10) (storage access)
- App Tier (30) → Storage (10) (application storage)
- Observability (40) → All VLANs (monitoring access)
- DMZ (99) → App Tier (30), Management (60) (reverse proxy access)
**Firewall Rules:**
- Explicit allow rules for required traffic
- Default deny for all other inter-VLAN traffic
- Log all denied traffic for security monitoring
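
The allowed routes can be written down as a lookup that a script or test can consult before firewall rules are generated. This is only a sketch of the policy table as listed above, not the router's actual rule set; `vlan_route_allowed` is a hypothetical name, and the function assumes the source and destination VLANs differ.

```bash
# Return 0 if traffic from VLAN $1 to VLAN $2 matches an allowed route.
vlan_route_allowed() {
  case "$1:$2" in
    60:*|40:*)   return 0 ;;  # Management and Observability reach all VLANs
    20:10|30:10) return 0 ;;  # Compute and App Tier reach Storage
    99:30|99:60) return 0 ;;  # DMZ reaches App Tier and Management
    *)           return 1 ;;  # default deny
  esac
}
```

For example, `vlan_route_allowed 20 10` succeeds, while `vlan_route_allowed 50 30` fails, matching the rule that Dev/Test cannot reach production networks.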
### Multi-WAN Configuration
**WAN Interfaces:**
- 4× Spectrum 1 Gbps connections via i350-T4
- Each WAN on separate interface (WAN1-4)
**Load Balancing:**
- mwan3 for multi-WAN load balancing
- Per-ISP health checks
- Automatic failover
**Policy Routing:**
- Route specific traffic over specific WANs
- Balance traffic across all WANs
- Failover to remaining WANs if one fails
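
mwan3 performs its own interface tracking, so the following is not mwan3 configuration. It is a rough Bash illustration of the per-WAN health check idea: probe through each interface and keep only the ones that respond. The interface names `wan1` to `wan4`, the default probe target `1.1.1.1`, and the function names are all assumptions.

```bash
# Return 0 if a probe sent through interface $1 reaches target $2.
# The default target 1.1.1.1 is an assumption; use any reliable host.
wan_is_up() {
  ping -I "$1" -c 2 -W 2 -q "${2:-1.1.1.1}" >/dev/null 2>&1
}

# Print each interface (arguments 2..n) that passes the check command in $1.
healthy_wans() {
  local check=$1 iface
  shift
  for iface in "$@"; do
    if "$check" "$iface"; then
      echo "$iface"
    fi
  done
}
```

Calling `healthy_wans wan_is_up wan1 wan2 wan3 wan4` lists the links that are currently usable; passing the check as a parameter keeps the selection logic testable without live interfaces.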