Apply Composer changes: comprehensive API updates, migrations, middleware, and infrastructure improvements

- Add comprehensive database migrations (001-024) for schema evolution
- Enhance API schema with expanded type definitions and resolvers
- Add new middleware: audit logging, rate limiting, MFA enforcement, security, tenant auth
- Implement new services: AI optimization, billing, blockchain, compliance, marketplace
- Add adapter layer for cloud integrations (Cloudflare, Kubernetes, Proxmox, storage)
- Update Crossplane provider with enhanced VM management capabilities
- Add comprehensive test suite for API endpoints and services
- Update frontend components with improved GraphQL subscriptions and real-time updates
- Enhance security configurations and headers (CSP, CORS, etc.)
- Update documentation and configuration files
- Add new CI/CD workflows and validation scripts
- Implement design system improvements and UI enhancements
240
infrastructure/monitoring/README.md
Normal file
@@ -0,0 +1,240 @@
# Infrastructure Monitoring

Comprehensive monitoring solutions for all infrastructure components in Sankofa Phoenix.

## Overview

This directory contains custom Prometheus exporters, Grafana dashboards, and alerting rules for infrastructure monitoring.

## Components

### Exporters (`exporters/`)

Custom Prometheus exporters for:
- Proxmox VE metrics
- TP-Link Omada metrics
- Network switch/router metrics
- Infrastructure health checks

### Dashboards (`dashboards/`)

Grafana dashboards for:
- Infrastructure overview
- Proxmox cluster health
- Network performance
- Omada controller status
- Site-level monitoring

## Exporters

### Proxmox Exporter

The Proxmox exporter (`pve_exporter`) provides metrics for:
- VM status and resource usage
- Node health and performance
- Storage pool utilization
- Network interface statistics
- Cluster status

**Installation:**
```bash
pip install pve_exporter
```

**Configuration:**
```yaml
exporter:
  listen_address: 0.0.0.0:9221
proxmox:
  endpoint: https://pve1.sankofa.nexus:8006
  username: monitoring@pam
  password: ${PROXMOX_PASSWORD}
```

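The `${PROXMOX_PASSWORD}` placeholder keeps the secret out of the configuration file. Whether the exporter expands the placeholder itself is not stated here, so the sketch below assumes a small pre-processing step that substitutes environment variables before the file is handed to the exporter. The helper name `render_config` is hypothetical.

```python
import os
from string import Template


def render_config(text: str, env=None) -> str:
    """Replace ${VAR} placeholders with values from env (defaults to os.environ).

    Raises KeyError when a referenced variable is not set, so a missing
    secret is caught before the exporter starts.
    """
    values = os.environ if env is None else env
    return Template(text).substitute(values)


# Same structure as the configuration shown above.
CONFIG = """\
exporter:
  listen_address: 0.0.0.0:9221
proxmox:
  endpoint: https://pve1.sankofa.nexus:8006
  username: monitoring@pam
  password: ${PROXMOX_PASSWORD}
"""

if __name__ == "__main__":
    # Example value only; in practice the variable comes from the environment.
    print(render_config(CONFIG, {"PROXMOX_PASSWORD": "example-secret"}))
```

Failing on a missing variable is a deliberate choice here: an exporter started with a literal `${PROXMOX_PASSWORD}` as its password would only surface the problem later as an authentication error.
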
### Omada Exporter

Custom exporter for TP-Link Omada Controller metrics:
- Access point status
- Client device counts
- Network throughput
- Controller health

**See**: `exporters/omada_exporter/` for implementation

### Network Exporter

SNMP-based exporter for network devices:
- Switch port statistics
- Router interface metrics
- VLAN utilization
- Network topology changes

**See**: `exporters/network_exporter/` for implementation

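Port statistics read over SNMP are typically cumulative counters (for example `ifInOctets` and `ifHCInOctets` from IF-MIB), which wrap at 2^32 or 2^64. The sketch below is not taken from `exporters/network_exporter/`; it only illustrates, under that assumption, how a byte rate can be derived from two counter readings when doing an ad-hoc check. The function names are hypothetical.

```python
def counter_delta(previous: int, current: int, bits: int = 64) -> int:
    """Difference between two readings of a wrapping counter.

    Modular arithmetic handles a single wrap-around. A device reboot
    (counter reset to zero) looks the same as a wrap and is not detected.
    """
    return (current - previous) % (1 << bits)


def bytes_per_second(previous: int, current: int, interval_s: float, bits: int = 64) -> float:
    """Average byte rate over the polling interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(previous, current, bits) / interval_s


if __name__ == "__main__":
    # Two readings of a 32-bit counter taken 30 seconds apart, with a wrap in between.
    print(bytes_per_second(4294967290, 5, 30, bits=32))
```

When the raw counters are exported to Prometheus, the same calculation is normally left to `rate()` in a query instead of being done in the exporter.
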
## Dashboards

### Infrastructure Overview

Comprehensive dashboard showing:
- All sites status
- Resource utilization
- Health scores
- Alert summary

**Location**: `dashboards/infrastructure-overview.json`

### Proxmox Cluster

Dashboard for Proxmox clusters:
- Cluster health
- Node performance
- VM resource usage
- Storage utilization

**Location**: `dashboards/proxmox-cluster.json`

### Network Performance

Network performance dashboard:
- Bandwidth utilization
- Latency metrics
- Error rates
- Top talkers

**Location**: `dashboards/network-performance.json`

### Omada Controller

Omada-specific dashboard:
- Controller status
- Access point health
- Client statistics
- Network policies

**Location**: `dashboards/omada-controller.json`

## Installation

### Deploy Exporters

```bash
# Deploy all exporters
kubectl apply -f exporters/manifests/

# Or deploy individually
kubectl apply -f exporters/manifests/proxmox-exporter.yaml
kubectl apply -f exporters/manifests/omada-exporter.yaml
```

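### Import Dashboards

```bash
# Import all dashboards to Grafana
./scripts/import-dashboards.sh

# Or import one dashboard through the Grafana HTTP API
# (GRAFANA_URL and GRAFANA_API_TOKEN are assumed to be set; requires jq)
jq '{dashboard: (. + {id: null}), overwrite: true}' dashboards/infrastructure-overview.json \
  | curl -sS -X POST "$GRAFANA_URL/api/dashboards/db" \
      -H "Authorization: Bearer $GRAFANA_API_TOKEN" \
      -H "Content-Type: application/json" \
      --data-binary @-
```
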
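The contents of `scripts/import-dashboards.sh` are not shown in this README. As a rough sketch of what a bulk import involves, the code below collects every `*.json` file in a directory and wraps each one in the envelope that Grafana's `POST /api/dashboards/db` endpoint expects. The function names are hypothetical, and sending the request (URL, token) is deliberately left out.

```python
import json
from pathlib import Path


def build_import_payload(dashboard: dict, overwrite: bool = True) -> dict:
    """Wrap a dashboard model in the request body used by POST /api/dashboards/db."""
    body = dict(dashboard)
    # A null id tells Grafana to create the dashboard or match it by uid.
    body["id"] = None
    return {"dashboard": body, "overwrite": overwrite}


def load_payloads(directory: str) -> dict:
    """Return {file name: import payload} for every *.json file in directory."""
    payloads = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            payloads[path.name] = build_import_payload(json.load(handle))
    return payloads


if __name__ == "__main__":
    for name, payload in load_payloads("dashboards").items():
        print(name, "->", payload["dashboard"].get("title", "<untitled>"))
```

Keeping payload construction separate from the HTTP call makes it easy to check what would be sent before touching a live Grafana instance.
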
## Configuration

### Prometheus Scrape Configuration

```yaml
scrape_configs:
  - job_name: 'proxmox'
    static_configs:
      - targets:
          - 'pve-exporter.monitoring.svc.cluster.local:9221'

  - job_name: 'omada'
    static_configs:
      - targets:
          - 'omada-exporter.monitoring.svc.cluster.local:9222'

  - job_name: 'network'
    static_configs:
      - targets:
          - 'network-exporter.monitoring.svc.cluster.local:9223'
```

### Alerting Rules

Alert rules are defined in `exporters/alert-rules/`:

- `proxmox-alerts.yaml`: Proxmox cluster alerts
- `omada-alerts.yaml`: Omada controller alerts
- `network-alerts.yaml`: Network infrastructure alerts

## Metrics

### Proxmox Metrics

- `pve_node_status`: Node status (0=offline, 1=online)
- `pve_vm_status`: VM status
- `pve_storage_used_bytes`: Storage usage
- `pve_network_rx_bytes`: Network receive bytes
- `pve_network_tx_bytes`: Network transmit bytes

### Omada Metrics

- `omada_ap_status`: Access point status
- `omada_clients_total`: Total client count
- `omada_throughput_bytes`: Network throughput
- `omada_controller_status`: Controller health

### Network Metrics

- `network_port_status`: Switch port status
- `network_port_rx_bytes`: Port receive bytes
- `network_port_tx_bytes`: Port transmit bytes
- `network_vlan_utilization`: VLAN utilization

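Prometheus scrapes these metrics in its text exposition format, one sample per line. The sketch below shows how a single sample line could be rendered. The label names (`node`, `site`) and the values are illustrative assumptions; the labels the exporters actually attach are not listed in this README.

```python
from typing import Optional


def format_sample(name: str, value: float, labels: Optional[dict] = None) -> str:
    """Render one sample in the Prometheus text exposition format."""
    if not labels:
        return f"{name} {value}"
    rendered = []
    for key in sorted(labels):
        # Backslash, double quote, and newline must be escaped in label values.
        escaped = (
            str(labels[key])
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
        )
        rendered.append(f'{key}="{escaped}"')
    return name + "{" + ",".join(rendered) + "} " + str(value)


if __name__ == "__main__":
    # Hypothetical label names, for illustration only.
    print(format_sample("pve_node_status", 1, {"node": "pve1"}))
    print(format_sample("omada_clients_total", 42, {"site": "example-site"}))
```
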
## Alerts

### Critical Alerts

- Proxmox cluster node down
- Omada controller unreachable
- Network switch offline
- High resource utilization (>90%)

### Warning Alerts

- High resource utilization (>80%)
- Network latency spikes
- Access point offline
- Storage pool >80% full

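The two utilization thresholds form a simple severity ladder: above 90% is critical, above 80% is a warning. The real rules live in `exporters/alert-rules/` as Prometheus alerting rules; the sketch below only illustrates the intended boundaries, and the function name is hypothetical.

```python
def classify_utilization(percent: float, warning: float = 80.0, critical: float = 90.0) -> str:
    """Map a utilization percentage to a severity using the thresholds above.

    Both comparisons are strict, so exactly 80% is still "ok" and
    exactly 90% is still "warning".
    """
    if percent > critical:
        return "critical"
    if percent > warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    for value in (42.0, 85.5, 97.0):
        print(value, classify_utilization(value))
```

Checking the critical threshold first ensures a value above 90% is reported once, at the higher severity, and not as both a warning and a critical alert.
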
## Troubleshooting

### Exporter Issues

```bash
# Check exporter status
kubectl get pods -n monitoring -l app=proxmox-exporter

# View exporter logs
kubectl logs -n monitoring -l app=proxmox-exporter

# Test exporter endpoint
curl http://proxmox-exporter.monitoring.svc.cluster.local:9221/metrics
```

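Once the `/metrics` endpoint responds, its output can be checked programmatically. The sketch below parses exposition-format text and lists nodes whose `pve_node_status` is 0 (offline). It is a simplified parser, not a full implementation of the format: escaped characters in label values are not unescaped, and the `node` label name is an assumption.

```python
import re

# name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> list:
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def offline_nodes(text: str) -> list:
    """Nodes reporting pve_node_status == 0; the 'node' label is assumed."""
    return [
        labels.get("node", "<unknown>")
        for name, labels, value in parse_samples(text)
        if name == "pve_node_status" and value == 0
    ]


if __name__ == "__main__":
    # Made-up sample output; real text would come from the curl command above.
    example = (
        "# TYPE pve_node_status gauge\n"
        'pve_node_status{node="pve1"} 1\n'
        'pve_node_status{node="pve2"} 0\n'
    )
    print(offline_nodes(example))
```
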
### Dashboard Issues

```bash
# Verify dashboard import by listing dashboards through the Grafana HTTP API
# (GRAFANA_URL and GRAFANA_API_TOKEN are assumed to be set)
curl -sS -H "Authorization: Bearer $GRAFANA_API_TOKEN" "$GRAFANA_URL/api/search?type=dash-db"

# Check dashboard data sources
# In Grafana UI: Configuration > Data Sources
```

## Related Documentation

- [Proxmox Management](../proxmox/README.md)
- [Omada Management](../omada/README.md)
- [Network Management](../network/README.md)
- [Infrastructure Management](../README.md)