# 🔍 Service Monitoring - Besu, Cacti, Firefly, Chainlink CCIP

## Overview

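The orchestration portal includes a monitoring system for its blockchain and interoperability services: Besu, Cacti, Firefly, and Chainlink CCIP. It collects per-service metrics, stores them, derives a health status, and displays the results in a Vue dashboard.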

## Monitored Services

### 1. **Besu (Hyperledger Besu)**
- Block number and chain status
- Peer count and network connectivity
- Sync status (syncing/synced/behind)
- Gas price and network metrics
- Pending transactions
- Resource usage (CPU, memory, disk)
- Chain ID and network ID

### 2. **Cacti (Hyperledger Cacti)**
- Connector count
- Active connections
- Transaction metrics (total, pending, failed)
- Average latency
- Resource usage (CPU, memory)

### 3. **Firefly (Hyperledger FireFly)**
- Namespace count
- Active APIs
- Transaction metrics (total, pending, failed)
- Average latency
- Resource usage (CPU, memory)
- Database connections

### 4. **Chainlink CCIP (Cross-Chain Interoperability Protocol)**
- Router address and configuration
- Active chains count
- Message metrics (total, pending, failed)
- Average latency
- Token transfers
- Fee token balance
- Resource usage (CPU, memory)

## Features

### ✅ Implemented

1. **Monitoring API Endpoints**
   - `/api/monitoring/dashboard` - Overall health summary
   - `/api/monitoring/besu` - Besu metrics
   - `/api/monitoring/cacti` - Cacti metrics
   - `/api/monitoring/firefly` - Firefly metrics
   - `/api/monitoring/chainlink-ccip` - Chainlink CCIP metrics
   - `/api/monitoring/services/:type/:name/status` - Service status
   - `/api/monitoring/environments/:name/collect` - Collect all metrics

2. **Database Schema**
   - `service_monitoring` table for metrics storage
   - Indexed for fast queries
   - Supports historical data

3. **Monitoring Service**
   - `MonitoringService` class for metrics collection
   - Simulated metrics (ready for Prometheus integration)
   - Health status calculation
   - Automatic metric storage

4. **Vue Dashboard**
   - `MonitoringDashboard.vue` - Main monitoring view
   - Service-specific components:
     - `BesuMonitoring.vue`
     - `CactiMonitoring.vue`
     - `FireflyMonitoring.vue`
     - `ChainlinkCCIPMonitoring.vue`
   - `ServiceHealthCard.vue` - Health overview cards
   - `MetricCard.vue` - Individual metric display

5. **Real-time Updates**
   - WebSocket integration
   - Auto-refresh every 30 seconds
   - Live updates on metric collection

## API Usage

### Get Monitoring Dashboard
```http
GET /api/monitoring/dashboard
```

Response:
```json
{
  "besu": {
    "total": 5,
    "healthy": 4,
    "degraded": 1,
    "unhealthy": 0,
    "services": [...]
  },
  "cacti": {...},
  "firefly": {...},
  "chainlinkCcip": {...}
}
```
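
For clients consuming this endpoint, the response could be typed roughly as follows. This is a sketch under assumptions: the names `ServiceHealthSummary`, `DashboardResponse`, and `overallStatus` are hypothetical, the `cacti`, `firefly`, and `chainlinkCcip` entries are assumed to share the shape shown for `besu`, and the elements of `services` are left as `unknown` because the example elides them.

```typescript
// Hypothetical types mirroring the example dashboard response.
interface ServiceHealthSummary {
  total: number;
  healthy: number;
  degraded: number;
  unhealthy: number;
  services: unknown[]; // element shape is elided in the example response
}

interface DashboardResponse {
  besu: ServiceHealthSummary;
  cacti: ServiceHealthSummary;
  firefly: ServiceHealthSummary;
  chainlinkCcip: ServiceHealthSummary;
}

type OverallStatus = "healthy" | "degraded" | "unhealthy";

// Reduce a dashboard response to the worst status across all service types.
// Assumption: any unhealthy service outranks any degraded one.
function overallStatus(dashboard: DashboardResponse): OverallStatus {
  const groups = [
    dashboard.besu,
    dashboard.cacti,
    dashboard.firefly,
    dashboard.chainlinkCcip,
  ];
  if (groups.some((g) => g.unhealthy > 0)) return "unhealthy";
  if (groups.some((g) => g.degraded > 0)) return "degraded";
  return "healthy";
}
```

With the example values above (one degraded Besu service, none unhealthy), this helper would report `degraded` as long as the other service types are healthy.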

### Get Besu Metrics
```http
GET /api/monitoring/besu?environment=workload-azure-eastus&service=besu-validator-1
```
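
The `environment` and `service` query parameters can be assembled programmatically. The helper below is a minimal sketch; `monitoringUrl` and its parameter names are hypothetical, and it assumes both filters are optional, which the examples in this document suggest but do not state.

```typescript
// Hypothetical helper: build a monitoring endpoint URL with optional filters.
function monitoringUrl(
  baseUrl: string,
  serviceType: string,
  filters: { environment?: string; service?: string } = {}
): string {
  const params: string[] = [];
  if (filters.environment) {
    params.push(`environment=${encodeURIComponent(filters.environment)}`);
  }
  if (filters.service) {
    params.push(`service=${encodeURIComponent(filters.service)}`);
  }
  // Only append "?" when at least one filter is present.
  const query = params.length > 0 ? `?${params.join("&")}` : "";
  return `${baseUrl}/api/monitoring/${serviceType}${query}`;
}
```

Encoding each value keeps the URL valid even if an environment or service name contains characters such as spaces or `&`.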

### Collect All Metrics
```http
POST /api/monitoring/environments/workload-azure-eastus/collect
```

## Database Schema

### service_monitoring
```sql
CREATE TABLE service_monitoring (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  service_name TEXT NOT NULL,
  service_type TEXT NOT NULL,
  environment TEXT,
  metric_name TEXT NOT NULL,
  metric_value REAL NOT NULL,
  metric_unit TEXT,
  status TEXT NOT NULL,
  timestamp TEXT NOT NULL,
  metadata TEXT
);
```
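
The schema stores one row per metric sample. The sketch below shows how a single collection pass might be expanded into rows. The `ServiceMonitoringRow` type and `toRows` function are hypothetical and not taken from `MonitoringService`; the ISO 8601 timestamp format is also an assumption, since the schema only declares `timestamp` as `TEXT`.

```typescript
// Hypothetical row type mirroring the service_monitoring columns.
// `id` is omitted because the database assigns it (AUTOINCREMENT).
interface ServiceMonitoringRow {
  service_name: string;
  service_type: string;
  environment: string | null;
  metric_name: string;
  metric_value: number;
  metric_unit: string | null;
  status: string;
  timestamp: string; // ISO 8601 assumed; the schema only says TEXT
  metadata: string | null;
}

// Expand one collection pass (metric name -> value) into one row per metric.
function toRows(
  serviceName: string,
  serviceType: string,
  environment: string | null,
  status: string,
  metrics: Record<string, number>,
  now: Date = new Date()
): ServiceMonitoringRow[] {
  const timestamp = now.toISOString();
  return Object.keys(metrics).map((metricName) => ({
    service_name: serviceName,
    service_type: serviceType,
    environment,
    metric_name: metricName,
    metric_value: metrics[metricName],
    metric_unit: null, // units are not modelled in this sketch
    status,
    timestamp,
    metadata: null,
  }));
}
```

Sharing one timestamp across all rows of a pass makes it straightforward to group samples that were collected together.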

## Integration with Prometheus

The monitoring service is designed to integrate with Prometheus. Replace the simulated metrics with actual Prometheus queries:

```typescript
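// Example: replace the simulated collectBesuMetrics with Prometheus queries.
// Assumptions: this is a method on MonitoringService, and `prometheus` is a
// client object exposing `query(promql)`. Metric names depend on the Besu
// version and configuration, so confirm them against the node's metrics
// endpoint. The `instance` label usually holds the scrape target's host:port,
// so matching it against a service name assumes relabeling is configured.
async collectBesuMetrics(environment: string, serviceName: string): Promise<BesuMetrics> {
  // Current block number for this instance
  const blockNumber = await prometheus.query(
    `besu_blockchain_blockNumber{instance="${serviceName}"}`
  );

  // Number of connected peers for this instance
  const peerCount = await prometheus.query(
    `besu_network_peers{instance="${serviceName}"}`
  );

  // ... etc
}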
```
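
One way to back a `query` call like the one above is the Prometheus HTTP API, which serves instant queries at `/api/v1/query` and returns vector samples as `[timestamp, "value"]` pairs. The sketch below covers only URL construction and response parsing; `instantQueryUrl`, `firstSampleValue`, and `PromInstantResponse` are hypothetical names, and the interface models just the fields needed here rather than the full response.

```typescript
// Subset of the Prometheus instant-query response needed to read one sample.
interface PromInstantResponse {
  status: "success" | "error";
  error?: string;
  data?: {
    resultType: string;
    result: Array<{ metric: Record<string, string>; value: [number, string] }>;
  };
}

// Build the instant-query URL for a PromQL expression.
function instantQueryUrl(prometheusUrl: string, promql: string): string {
  const base = prometheusUrl.replace(/\/+$/, ""); // tolerate trailing slashes
  return `${base}/api/v1/query?query=${encodeURIComponent(promql)}`;
}

// Return the first vector sample as a number, or null if nothing matched.
function firstSampleValue(body: PromInstantResponse): number | null {
  if (body.status !== "success" || !body.data) {
    throw new Error(`Prometheus query failed: ${body.error ?? "unknown error"}`);
  }
  if (body.data.resultType !== "vector") {
    throw new Error(`Expected a vector result, got ${body.data.resultType}`);
  }
  const first = body.data.result[0];
  if (!first) return null;
  return Number(first.value[1]); // sample values arrive as strings
}
```

A caller would request `instantQueryUrl(PROMETHEUS_URL, promql)` with an HTTP client such as `fetch`, parse the JSON body, and pass it to `firstSampleValue`. Returning `null` for an empty result lets the caller distinguish "no data" from a value of zero.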

## Metrics Collected

### Besu
- `block_number` - Current block number
- `peer_count` - Number of connected peers
- `cpu_usage` - CPU utilization percentage
- `memory_usage` - Memory utilization percentage
- `disk_usage` - Disk utilization percentage

### Cacti
- `connector_count` - Number of connectors
- `active_connections` - Active connection count
- `failed_transactions` - Failed transaction count
- `average_latency` - Average transaction latency

### Firefly
- `namespace_count` - Number of namespaces
- `active_apis` - Active API count
- `failed_transactions` - Failed transaction count
- `database_connections` - Database connection count

### Chainlink CCIP
- `active_chains` - Number of active chains
- `total_messages` - Total message count
- `failed_messages` - Failed message count
- `average_latency` - Average message latency

## Health Status Calculation

### Healthy
- All critical metrics within normal ranges
- No failed operations above threshold
- Resource usage below warning levels

### Degraded
- Some metrics outside optimal range
- Increased failed operations
- Resource usage approaching limits

### Unhealthy
- Critical metrics in danger zone
- High failure rates
- Resource usage at critical levels
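
The three levels above could be computed along these lines. This is a sketch only: the threshold values (75% and 90% resource usage, 1% and 5% failure rate) are placeholders chosen for illustration, not values from `MonitoringService`, and the `calculateHealth` function and its input shape are hypothetical.

```typescript
type HealthStatus = "healthy" | "degraded" | "unhealthy";

// Placeholder thresholds; this document does not specify the real values.
interface HealthThresholds {
  resourceWarnPct: number;
  resourceCriticalPct: number;
  failureWarnRate: number;
  failureCriticalRate: number;
}

const DEFAULT_THRESHOLDS: HealthThresholds = {
  resourceWarnPct: 75,
  resourceCriticalPct: 90,
  failureWarnRate: 0.01,
  failureCriticalRate: 0.05,
};

interface HealthInput {
  resourceUsagePct: number[]; // e.g. [cpu_usage, memory_usage, disk_usage]
  failedOperations: number;
  totalOperations: number;
}

// Classify a service by its worst resource reading and its failure rate.
function calculateHealth(
  input: HealthInput,
  thresholds: HealthThresholds = DEFAULT_THRESHOLDS
): HealthStatus {
  const peakUsage =
    input.resourceUsagePct.length > 0 ? Math.max(...input.resourceUsagePct) : 0;
  const failureRate =
    input.totalOperations > 0 ? input.failedOperations / input.totalOperations : 0;

  if (
    peakUsage >= thresholds.resourceCriticalPct ||
    failureRate >= thresholds.failureCriticalRate
  ) {
    return "unhealthy";
  }
  if (
    peakUsage >= thresholds.resourceWarnPct ||
    failureRate >= thresholds.failureWarnRate
  ) {
    return "degraded";
  }
  return "healthy";
}
```

Checking the critical thresholds first ensures a service that crosses both warning and critical levels is reported as unhealthy rather than degraded.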

## Access

**URL**: Open `http://localhost:5000/monitoring` directly, or navigate to the `/monitoring` route within the Vue app.

## Future Enhancements

- [ ] Prometheus integration (replace simulated metrics)
- [ ] Grafana dashboards export
- [ ] Alert rules and thresholds
- [ ] Historical trend analysis
- [ ] Custom metric queries
- [ ] Service-specific dashboards
- [ ] Export metrics to CSV/JSON
- [ ] Metric comparison across environments
- [ ] Performance benchmarking

## Configuration

### Environment Variables
```bash
# Prometheus endpoint (when integrated)
PROMETHEUS_URL=http://prometheus:9090

# Metrics collection interval in milliseconds (30000 = 30 seconds)
METRICS_COLLECTION_INTERVAL=30000
```
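
Reading these variables with validation might look like the sketch below. `MonitoringConfig` and `loadMonitoringConfig` are hypothetical names, and the fallback defaults simply reuse the example values shown above; the portal's actual defaults may differ.

```typescript
interface MonitoringConfig {
  prometheusUrl: string;
  collectionIntervalMs: number;
}

// Read monitoring settings from an env-like map, falling back to the
// example values when a variable is unset or empty.
function loadMonitoringConfig(
  env: Record<string, string | undefined>
): MonitoringConfig {
  const rawInterval = env.METRICS_COLLECTION_INTERVAL;
  const interval =
    rawInterval === undefined || rawInterval === "" ? 30000 : Number(rawInterval);

  // Reject non-numeric or non-positive intervals instead of passing NaN on.
  if (!isFinite(interval) || interval <= 0) {
    throw new Error(
      `METRICS_COLLECTION_INTERVAL must be a positive number of milliseconds, got "${rawInterval}"`
    );
  }

  return {
    prometheusUrl: env.PROMETHEUS_URL || "http://prometheus:9090",
    collectionIntervalMs: interval,
  };
}
```

In a Node.js process this would typically be called as `loadMonitoringConfig(process.env)`. Failing fast on a malformed interval avoids a collection timer that silently never fires or fires continuously.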

## Testing

Test monitoring endpoints:
```bash
# Get dashboard
curl http://localhost:5000/api/monitoring/dashboard

# Get Besu metrics
curl "http://localhost:5000/api/monitoring/besu?environment=workload-azure-eastus"

# Collect metrics
curl -X POST http://localhost:5000/api/monitoring/environments/workload-azure-eastus/collect
```

---

**Status**: ✅ **Monitoring system implemented with simulated metrics; Prometheus integration is pending.**

**Last Updated**: 2024-11-19

**Version**: 1.0.0