# Deployment Configuration Audit Report
## Executive Summary
This document provides a comprehensive audit of the deployment configuration for the DeFi Oracle Meta Mainnet (ChainID 138), identifying misconfigurations, gaps, and recommendations.
**Audit Date**: 2026-04-13
**Chain ID**: 138
**Status**: Historical audit snapshot; revalidate against current configs before acting
---
## Critical Issues
### 1. ❌ Genesis File Missing Validators
**Issue**: `config/genesis.json` sets `extraData` to `"0x"`, so no validators are configured in the genesis block.
**Impact**:
- QBFT requires validators to be specified in the genesis `extraData` field
- Network cannot start without validators
- Blocks cannot be produced
**Location**: `config/genesis.json` line 35
**Current State**:
```json
"extraData": "0x"
```
**Required**: Validator addresses must be encoded in `extraData` using QBFT format:
```
extraData = RLP([32 bytes Vanity, List<Validators>, No Vote, Round=Int(0), 0 Seals])
```
**Fix**: Run `./scripts/generate-genesis-proper.sh 4` to regenerate genesis with validator addresses from `keys/validators/`.
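One way to confirm that a regenerated genesis actually carries validators is to decode `extraData` and count the addresses. The sketch below is a minimal, illustrative RLP decoder and is not part of the repository; the function names are hypothetical, it assumes the validator list is the second element of the top-level RLP list, and it does no bounds checking on malformed input. Pass it the top-level `extraData` string from `config/genesis.json`; after the fix it would be expected to return four addresses.

```python
def _decode_item(data: bytes):
    """Decode one RLP item from the front of data; return (item, remaining bytes)."""
    if not data:
        raise ValueError("empty RLP input")
    prefix = data[0]
    if prefix < 0x80:                       # single byte, encoded as itself
        return data[:1], data[1:]
    if prefix < 0xB8:                       # short string (0-55 bytes)
        n = prefix - 0x80
        return data[1:1 + n], data[1 + n:]
    if prefix < 0xC0:                       # long string, length-of-length follows
        ll = prefix - 0xB7
        n = int.from_bytes(data[1:1 + ll], "big")
        return data[1 + ll:1 + ll + n], data[1 + ll + n:]
    if prefix < 0xF8:                       # short list (payload 0-55 bytes)
        n = prefix - 0xC0
        payload, rest = data[1:1 + n], data[1 + n:]
    else:                                   # long list, length-of-length follows
        ll = prefix - 0xF7
        n = int.from_bytes(data[1:1 + ll], "big")
        payload, rest = data[1 + ll:1 + ll + n], data[1 + ll + n:]
    items = []
    while payload:
        item, payload = _decode_item(payload)
        items.append(item)
    return items, rest


def validators_from_extra_data(extra_data_hex: str) -> list:
    """Return validator addresses found in a genesis extraData hex string."""
    hex_body = extra_data_hex[2:] if extra_data_hex.startswith("0x") else extra_data_hex
    raw = bytes.fromhex(hex_body)
    if not raw:
        return []                           # "0x": no validators configured
    decoded, rest = _decode_item(raw)
    if rest or not isinstance(decoded, list) or len(decoded) < 2:
        raise ValueError("extraData is not the expected RLP list")
    if not isinstance(decoded[1], list):
        raise ValueError("second element is not a validator list")
    return ["0x" + addr.hex() for addr in decoded[1]]
```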
---
### 2. ❌ Terraform Node Counts Disabled
**Issue**: In `terraform/terraform.tfvars`, sentries and RPC nodes are set to 0:
```hcl
node_count = {
  system     = 1
  validators = 1
  sentries   = 0 # ❌ Disabled
  rpc        = 0 # ❌ Disabled
}
```
**Impact**:
- No RPC endpoints will be available (explains why RPCs are not live)
- No sentry nodes for P2P connectivity
- Network cannot be accessed externally
- Contracts cannot be deployed
**Fix**: Update `terraform/terraform.tfvars`:
```hcl
node_count = {
  system     = 3
  validators = 4
  sentries   = 3
  rpc        = 3
}
```
**Note**: Current configuration shows quota constraints (4 vCPUs remaining). Consider:
1. Requesting quota increase
2. Using smaller VM sizes
3. Staged deployment (deploy validators first, then sentries/RPC)
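To size the quota request, the arithmetic can be worked through as below. This is a rough sketch under two assumptions that should be checked against the real configuration: every node pool uses `Standard_D4s_v3` (4 vCPUs per node), and the "4 vCPUs remaining" figure is the quota left after the current two-node deployment. The function name is illustrative.

```python
def required_vcpus(node_count: dict, vcpus_per_node: int = 4) -> int:
    """Total vCPUs needed if every node in every pool has vcpus_per_node vCPUs."""
    return sum(node_count.values()) * vcpus_per_node


current = {"system": 1, "validators": 1, "sentries": 0, "rpc": 0}
recommended = {"system": 3, "validators": 4, "sentries": 3, "rpc": 3}
remaining_quota = 4  # assumption: quota left on top of the current deployment

extra_needed = required_vcpus(recommended) - required_vcpus(current)  # 52 - 8 = 44
shortfall = max(0, extra_needed - remaining_quota)                    # 44 - 4 = 40

print(f"recommended layout needs {required_vcpus(recommended)} vCPUs")
print(f"additional vCPUs beyond current deployment: {extra_needed}")
print(f"quota increase to request (at least): {shortfall}")
```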
---
### 3. ⚠️ Kubernetes Version Mismatch
**Issue**: `terraform/terraform.tfvars` specifies `kubernetes_version = "1.33"` which is likely invalid.
**Impact**:
- Terraform may fail during AKS cluster creation
- AKS may not support version 1.33
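**Current Supported Versions**: AKS supports a rolling window of recent Kubernetes minor versions that varies by region and shifts with each release, so the 1.28-1.30 range recorded here should be treated as a snapshot.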
**Fix**: Update to a supported version:
```hcl
kubernetes_version = "1.28" # or latest supported
```
**Verification**: Check supported versions:
```bash
az aks get-versions --location westeurope --output table
```
---
## Configuration Gaps
### 4. ⚠️ Missing Validator Addresses in Genesis
**Issue**: Genesis file doesn't include validator addresses in `extraData`.
**Required**: QBFT `extraData` holds 20-byte validator addresses, so each address must be derived from the corresponding public key in `keys/validators/` before encoding.
**Fix**: Ensure `scripts/generate-genesis-proper.sh`:
1. Reads validator public keys from `keys/validators/*/key.pub`
2. Derives each validator address and encodes the list in QBFT format
3. Updates the `extraData` field
---
### 5. ⚠️ Static Nodes Configuration
**Issue**: `config/static-nodes.json` may be empty or incomplete.
**Impact**: Nodes may not be able to peer with each other.
**Required**: Static nodes should include:
- Validator enode addresses
- Sentry enode addresses
**Fix**: Ensure static-nodes.json is generated with all node enode addresses.
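A quick structural check on `config/static-nodes.json` can catch an empty or malformed file before deployment. The sketch below assumes the file is a JSON array of enode URLs whose node ID is a 128-hex-character public key; the helper names and the expected-count parameter are illustrative, and the regular expression is deliberately loose about the host part. Calling `static_node_problems` with the file contents and the expected number of validators plus sentries returns an empty list when the file looks complete.

```python
import json
import re

# enode://<128 hex chars>@<host>:<port>, optionally followed by ?discport=<port>
ENODE_RE = re.compile(
    r"^enode://[0-9a-fA-F]{128}@[^@\s]+:\d{1,5}(\?discport=\d{1,5})?$"
)


def invalid_enodes(entries: list) -> list:
    """Return the entries that do not look like well-formed enode URLs."""
    return [e for e in entries if not isinstance(e, str) or not ENODE_RE.match(e)]


def static_node_problems(static_nodes_text: str, expected_count: int) -> list:
    """Describe problems found in the text of a static-nodes.json file."""
    entries = json.loads(static_nodes_text)
    problems = []
    if not isinstance(entries, list):
        return ["top-level JSON value is not an array"]
    if len(entries) < expected_count:
        problems.append(f"expected at least {expected_count} enodes, found {len(entries)}")
    problems.extend(f"malformed enode: {e!r}" for e in invalid_enodes(entries))
    return problems
```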
---
### 6. ⚠️ Terraform Backend Not Configured
**Issue**: `terraform/main.tf` contains a backend block, but it is commented out or empty.
**Impact**:
- Terraform state may not be stored properly
- State locking may not work
- Team collaboration issues
**Fix**: Configure Terraform backend:
```hcl
terraform {
  # Remote state in Azure Storage; the blob lease provides state locking
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "tfstate<random>"
    container_name       = "tfstate"
    key                  = "defi-oracle-mainnet.terraform.tfstate"
  }
}
```
---
### 7. ⚠️ Missing Application Gateway Configuration
**Issue**: Application Gateway configuration may be incomplete for RPC endpoints.
**Required**:
- Backend pool configuration for RPC nodes
- HTTP settings
- Listener configuration
- Routing rules
- WAF rules
**Location**: Check `terraform/modules/networking/` for Application Gateway configuration.
---
### 8. ⚠️ Missing DNS Configuration
**Issue**: DNS records for `rpc.d-bis.org` and `rpc2.d-bis.org` may not be configured.
**Impact**: RPC endpoints won't be accessible via domain names.
**Fix**: After Application Gateway deployment, configure Cloudflare DNS:
```bash
./scripts/deployment/cloudflare-dns.sh --zone-id "$CLOUDFLARE_ZONE_ID" --api-token "$CLOUDFLARE_API_TOKEN" --ip <gateway-ip>
```
---
## Consistency Checks
### ✅ Chain ID Consistency
**Status**: ✅ **CONSISTENT**
All configurations use Chain ID 138:
- `config/genesis.json`: ✅ 138
- `helm/besu-network/values.yaml`: ✅ 138
- `config/rpc/besu-config.toml`: ✅ network-id=138
- `config/validators/besu-config.toml`: ✅ network-id=138
- `config/sentries/besu-config.toml`: ✅ network-id=138
- `config/blockscout/config.json`: ✅ 138
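The consistency check above can be repeated mechanically when revalidating. The following sketch assumes the genesis file stores the chain ID at `config.chainId` and that the Besu TOML files use a `network-id=<n>` line; the function names are illustrative, and other files (Helm values, Blockscout config) would need their own small extractors feeding the same comparison.

```python
import json
import re


def chain_id_from_genesis(genesis_text: str) -> int:
    """Read config.chainId from the text of a genesis.json file."""
    return int(json.loads(genesis_text)["config"]["chainId"])


def network_id_from_besu_toml(toml_text: str) -> int:
    """Read the network-id setting from the text of a Besu TOML config."""
    match = re.search(r'^\s*network-id\s*=\s*"?(\d+)"?\s*(#.*)?$', toml_text, re.MULTILINE)
    if match is None:
        raise ValueError("network-id not found")
    return int(match.group(1))


def inconsistent_sources(ids_by_source: dict, expected: int = 138) -> list:
    """Return the names of sources whose ID differs from the expected chain ID."""
    return sorted(name for name, value in ids_by_source.items() if value != expected)
```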
---
### ✅ QBFT Configuration
**Status**: ✅ **CONSISTENT**
QBFT parameters are consistent:
- Block period: 2 seconds ✅
- Epoch length: 30,000 blocks ✅
- Request timeout: 10 seconds ✅
**Location**: `config/genesis.json` and all Besu config files.
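The QBFT values listed above can be compared against the genesis file with a few lines of code. This is a sketch that assumes the parameters live under `config.qbft` with the key names `blockperiodseconds`, `epochlength`, and `requesttimeoutseconds`; confirm those names against the actual `config/genesis.json` before relying on it. An empty result means the file matches the values recorded in this audit.

```python
import json

# Values recorded in this audit
EXPECTED_QBFT = {
    "blockperiodseconds": 2,
    "epochlength": 30000,
    "requesttimeoutseconds": 10,
}


def qbft_mismatches(genesis_text: str, expected: dict = EXPECTED_QBFT) -> dict:
    """Map each differing QBFT key to a (found, expected) pair."""
    qbft = json.loads(genesis_text).get("config", {}).get("qbft", {})
    return {
        key: (qbft.get(key), want)
        for key, want in expected.items()
        if qbft.get(key) != want
    }
```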
---
### ⚠️ Resource Configuration Inconsistencies
**Issue**: Resource requests/limits differ between Helm values and Terraform node sizes.
**Helm values-validators.yaml**:
- Requests: cpu: "4", memory: "8Gi"
- Limits: cpu: "8", memory: "16Gi"
**Helm values.yaml (base)**:
- Requests: cpu: "2", memory: "4Gi"
- Limits: cpu: "4", memory: "8Gi"
**Terraform terraform.tfvars**:
- VM Size: `Standard_D4s_v3` (4 vCPUs, 16 GiB RAM)
**Analysis**:
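- Helm values-validators.yaml requests 4 CPUs, while base values.yaml requests 2 CPUs
- Terraform uses D4s_v3 (4 vCPUs), which matches the validator request only nominally: a node's allocatable CPU is lower than its capacity once system reservations are subtracted, so a pod requesting a full 4 CPUs may fail to schedule on a 4 vCPU node, and the 8 CPU limit can never be reached on this VM size
- Base values.yaml is expected to be overridden by values-validators.yaml for validator deployments (correct)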
**Recommendation**: Ensure values-validators.yaml is used when deploying validators.
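The comparison between Helm CPU settings and the VM size can be sketched as follows. The helper names are illustrative, and the check uses raw VM capacity (4 vCPUs for `Standard_D4s_v3`, as listed above); a node's allocatable CPU is lower than its capacity, so passing this check is necessary but not sufficient for scheduling.

```python
def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "4" or "500m" to cores."""
    q = quantity.strip()
    if q.endswith("m"):
        return int(q[:-1]) / 1000
    return float(q)


def cpu_fit(request: str, limit: str, node_vcpus: int) -> dict:
    """Report whether the request and limit fit within raw node capacity."""
    return {
        "request_fits_capacity": parse_cpu(request) <= node_vcpus,
        "limit_fits_capacity": parse_cpu(limit) <= node_vcpus,
    }


# Values from values-validators.yaml and values.yaml as recorded in this audit
validators = cpu_fit(request="4", limit="8", node_vcpus=4)
base = cpu_fit(request="2", limit="4", node_vcpus=4)
print("validators:", validators)  # limit exceeds the 4 vCPU node
print("base:", base)
```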
---
### ⚠️ Storage Configuration
**Status**: ⚠️ **INCONSISTENT**
- **Helm values-validators.yaml**: 512Gi
- **Helm values-rpc.yaml**: 500Gi
- **Helm values.yaml (base)**: 256Gi
- **k8s/base/validators/statefulset.yaml**: 512Gi ✅
- **k8s/base/rpc/statefulset.yaml**: 256Gi ❌ (should be 500Gi per values-rpc.yaml)
**Fix**: Update `k8s/base/rpc/statefulset.yaml` storage size to match Helm values.
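A small comparison like the one below can flag storage drift between the Helm values and the raw manifests. It is a sketch with illustrative names: the sizes are typed in from this audit rather than read from the files, and only the binary suffixes `Ki`, `Mi`, `Gi`, and `Ti` are handled.

```python
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def storage_bytes(quantity: str) -> int:
    """Convert a quantity such as "512Gi" to bytes (binary suffixes only)."""
    q = quantity.strip()
    for suffix, factor in _UNITS.items():
        if q.endswith(suffix):
            return int(float(q[:-len(suffix)]) * factor)
    return int(q)  # plain byte count


def storage_mismatches(expected: dict, actual: dict) -> dict:
    """Map each component whose size differs to an (expected, actual) pair."""
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if name not in actual or storage_bytes(want) != storage_bytes(actual[name])
    }


helm_sizes = {"validators": "512Gi", "rpc": "500Gi"}      # Helm values files
manifest_sizes = {"validators": "512Gi", "rpc": "256Gi"}  # k8s/base statefulsets
print(storage_mismatches(helm_sizes, manifest_sizes))
```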
---
## Blockchain Technology Configuration
### Besu Configuration
#### Validators
- ✅ Consensus: QBFT
- ✅ RPC: Disabled (correct for security)
- ✅ P2P: Enabled on port 30303
- ✅ Sync Mode: FULL
- ✅ Network ID: 138
- ✅ Metrics: Enabled on port 9545
#### Sentries
- ✅ Consensus: QBFT (read-only)
- ✅ RPC: Enabled but internal only (127.0.0.1)
- ✅ P2P: Enabled on port 30303
- ✅ Sync Mode: FULL
- ✅ Network ID: 138
- ✅ Metrics: Enabled
#### RPC Nodes
- ✅ Consensus: QBFT (read-only)
- ✅ RPC: Enabled publicly (0.0.0.0)
- ✅ P2P: Disabled (correct)
- ✅ Sync Mode: SNAP (correct for RPC nodes)
- ✅ Network ID: 138
- ⚠️ CORS: Enabled with wildcard (should be restricted in production)
- ⚠️ Host Allowlist: Wildcard (should be restricted in production)
**Security Concern**: RPC nodes have `corsOrigins: ["*"]` and `hostAllowlist: ["*"]`. For production, these should be restricted to specific domains.
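A pre-deployment guard for this concern could look like the sketch below. It assumes the RPC settings have already been loaded into a dictionary keyed by `corsOrigins` and `hostAllowlist` as named above; the function name and the example domain list are illustrative, and only the literal `"*"` wildcard is detected.

```python
def wildcard_settings(rpc_values: dict, keys=("corsOrigins", "hostAllowlist")) -> list:
    """Return the setting names whose value list contains the "*" wildcard."""
    return [key for key in keys if "*" in rpc_values.get(key, [])]


# State recorded in this audit
audited = {"corsOrigins": ["*"], "hostAllowlist": ["*"]}
# One possible restricted form, using the RPC hostnames named in this report
restricted = {
    "corsOrigins": ["https://rpc.d-bis.org"],
    "hostAllowlist": ["rpc.d-bis.org", "rpc2.d-bis.org"],
}

print(wildcard_settings(audited))     # both settings flagged
print(wildcard_settings(restricted))  # nothing flagged
```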
---
### Network Architecture
**Tiered Architecture**: ✅ **CORRECTLY CONFIGURED**
1. **Validators** (Private subnets)
   - ✅ No public IPs
   - ✅ RPC disabled
   - ✅ P2P to sentries only
2. **Sentries** (Public subnets)
   - ✅ Public P2P
   - ✅ Internal RPC only
   - ✅ Peer with validators and other sentries
3. **RPC Nodes** (DMZ subnet)
   - ✅ No P2P
   - ✅ Public RPC
   - ✅ Behind Application Gateway
---
## Missing Configurations
### 1. Application Gateway Configuration
**Status**: ⚠️ **MISSING OR INCOMPLETE**
**Required**:
- Backend pool with RPC node IPs
- HTTP settings
- Listener on port 443 (HTTPS)
- Routing rules
- WAF policy
- SSL certificate configuration
**Location**: Check `terraform/modules/networking/` for Application Gateway module.
---
### 2. Monitoring Configuration
**Status**: ⚠️ **PARTIALLY CONFIGURED**
**Found**:
- ✅ Prometheus configuration referenced
- ✅ Grafana optional
- ✅ Metrics enabled on Besu nodes
**Missing**:
- ServiceMonitor CRD configuration
- Alert rules
- Alertmanager configuration
---
### 3. Key Management
**Status**: ⚠️ **NEEDS VERIFICATION**
**Found**:
- ✅ Validator keys directory structure
- ✅ Key generation scripts
- ✅ Azure Key Vault module
**Missing Verification**:
- Keys stored in Azure Key Vault
- Kubernetes secrets created from Key Vault
- Key rotation procedures
---
### 4. Backup Configuration
**Status**: ⚠️ **NOT CONFIGURED**
**Missing**:
- Backup storage account configuration
- Backup schedule
- Chaindata backup procedures
- Key backup procedures
---
## Recommendations
### Immediate Actions (Before Deployment)
1. **Fix Genesis File**
```bash
./scripts/generate-genesis-proper.sh 4
```
Verify `extraData` contains validator addresses.
2. **Update Terraform Node Counts**
```hcl
node_count = {
  system     = 3
  validators = 4
  sentries   = 3
  rpc        = 3
}
```
3. **Fix Kubernetes Version**
```hcl
kubernetes_version = "1.28" # Check latest supported
```
4. **Verify Validator Keys**
```bash
ls -la keys/validators/*/key.pub
```
Ensure 4 validator public keys exist.
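The same key check can be scripted so it fails loudly in CI. This is a minimal sketch with illustrative function names, assuming the `keys/validators/<name>/key.pub` layout shown above; it only confirms that non-empty files exist, not that they contain valid keys.

```python
from pathlib import Path


def validator_pubkeys(keys_root: str) -> list:
    """List non-empty key.pub files one directory below keys_root."""
    return sorted(
        str(path)
        for path in Path(keys_root).glob("*/key.pub")
        if path.is_file() and path.stat().st_size > 0
    )


def has_expected_validators(keys_root: str, expected: int = 4) -> bool:
    """True when exactly the expected number of public key files is present."""
    return len(validator_pubkeys(keys_root)) == expected
```

For example, `has_expected_validators("keys/validators")` should return `True` before the genesis file is regenerated for four validators.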
### Pre-Deployment Checklist
- [ ] Genesis file has validators in extraData
- [ ] Terraform node counts are correct
- [ ] Kubernetes version is supported
- [ ] Validator keys are generated
- [ ] Static nodes are configured
- [ ] Terraform backend is configured
- [ ] Application Gateway is configured
- [ ] DNS records are ready
- [ ] Monitoring is configured
- [ ] Backup procedures are defined
### Post-Deployment Verification
- [ ] Validators are producing blocks
- [ ] Sentries are peering correctly
- [ ] RPC endpoints are accessible
- [ ] Application Gateway is routing correctly
- [ ] DNS is resolving
- [ ] Monitoring is collecting metrics
- [ ] Contracts can be deployed
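Several of these checks can be run over JSON-RPC once an endpoint is reachable. The sketch below uses only the standard library and the standard `eth_blockNumber` and `net_peerCount` methods; the helper names are illustrative, and the endpoint URL is whatever the deployment exposes (for example the `rpc.d-bis.org` hostname mentioned earlier, assuming it serves HTTPS). Block production can be checked by calling `rpc_call(url, "eth_blockNumber")` twice, a few block periods apart, and passing both results to `is_producing_blocks`.

```python
import json
import urllib.request


def rpc_payload(method: str, params=None, request_id: int = 1) -> bytes:
    """Build a JSON-RPC 2.0 request body."""
    body = {"jsonrpc": "2.0", "method": method, "params": params or [], "id": request_id}
    return json.dumps(body).encode("utf-8")


def rpc_call(url: str, method: str, params=None, timeout: float = 10.0):
    """POST a JSON-RPC request and return its result, raising on an RPC error."""
    request = urllib.request.Request(
        url,
        data=rpc_payload(method, params),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    if "error" in reply:
        raise RuntimeError(f"{method} failed: {reply['error']}")
    return reply["result"]


def is_producing_blocks(first_hex: str, later_hex: str) -> bool:
    """True when the later eth_blockNumber result is higher than the first."""
    return int(later_hex, 16) > int(first_hex, 16)


def peer_count(net_peer_count_hex: str) -> int:
    """Convert a net_peerCount result (hex string) to an integer."""
    return int(net_peer_count_hex, 16)
```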
---
## Configuration Files Summary
### ✅ Correctly Configured
- Chain ID: 138 (consistent across all files)
- QBFT parameters (block period, epoch, timeout)
- Network ID: 138 (consistent)
- Besu image: hyperledger/besu:23.10.0
- Resource sizing (mostly consistent)
- Storage classes: managed-premium
- Namespace: besu-network
### ❌ Needs Fixing
- Genesis extraData (missing validators)
- Terraform node counts (sentries=0, rpc=0)
- Kubernetes version (1.33 likely invalid)
- RPC CORS/host allowlist (too permissive)
- Storage size in k8s/base/rpc/statefulset.yaml (inconsistent)
### ⚠️ Needs Verification
- Terraform backend configuration
- Application Gateway configuration
- DNS configuration
- Key Vault integration
- Monitoring setup
- Backup procedures
---
## Next Steps
1. **Fix Critical Issues** (Genesis, Node Counts, K8s Version)
2. **Regenerate Genesis** with validator addresses
3. **Update Terraform Configuration**
4. **Verify All Configurations**
5. **Deploy Infrastructure**
6. **Deploy Kubernetes Resources**
7. **Deploy Contracts**
8. **Verify End-to-End**
---
## Support
For questions or issues:
- Review configuration files
- Check deployment documentation
- Verify prerequisites
- Run validation scripts