WIP: HYBX OMNL and deployment documentation updates

defiQUG
2026-06-02 06:09:56 -07:00
parent f04a7cb7c8
commit d31aad7d66
33 changed files with 78 additions and 2878 deletions

@@ -0,0 +1,252 @@
# CCIP Monitoring Guide for ChainID 138
**Date**: 2025-01-27

**Network**: ChainID 138 (DeFi Oracle Meta Mainnet)

---
## Overview
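This guide describes how to monitor the CCIP infrastructure on ChainID 138 (the router and the WETH bridges). It covers the events and metrics to track, alerting rules, health checks, and incident response.

---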
## Monitoring Components
### 1. CCIP Router Monitoring
#### Events to Monitor
- `MessageSent`: Track all outgoing messages
  - Parameters: `messageId`, `destinationChainSelector`, `sender`, `receiver`, `data`, `tokenAmounts`, `feeToken`, `extraArgs`
- `MessageReceived`: Track all incoming messages
  - Parameters: `messageId`, `sourceChainSelector`, `sender`, `data`, `tokenAmounts`
#### Metrics to Track
- Message volume (sent/received per hour/day)
- Fee collection amounts
- Average message size
- Success/failure rates
- Destination chain distribution
#### Alerts
- High message failure rate (>5%)
- Unusual fee collection patterns
- Router contract errors
- Unsupported chain access attempts
### 2. Bridge Monitoring
#### Events to Monitor
**CCIPWETH9Bridge & CCIPWETH10Bridge**:
- `CrossChainTransferInitiated`: Track outgoing transfers
  - Parameters: `messageId`, `sender`, `destinationChainSelector`, `recipient`, `amount`, `nonce`
- `CrossChainTransferCompleted`: Track completed transfers
  - Parameters: `messageId`, `sourceChainSelector`, `recipient`, `amount`
- `DestinationAdded`: Track configuration changes
- `DestinationRemoved`: Track configuration changes
- `DestinationUpdated`: Track configuration changes
#### Metrics to Track
- Transfer volume (amount and count)
- Average transfer size
- Transfer success rate
- Time to completion
- Fee costs per transfer
- Destination chain usage
#### Alerts
- Failed transfers
- Stuck transfers (initiated but not completed within a configured threshold of X hours)
- Unusual transfer patterns
- Configuration changes
- Insufficient fee errors
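
One way to detect stuck transfers is to compare the `messageId` values seen in `CrossChainTransferInitiated` with those seen in `CrossChainTransferCompleted`. The sketch below is only an illustration under stated assumptions: the two input files and their formats are hypothetical (they would have to be produced by whatever collects the events), and the age threshold is a parameter because this guide leaves X unspecified.

```bash
#!/bin/bash
# find_stuck_transfers INITIATED_FILE COMPLETED_FILE NOW MAX_AGE_SECONDS
# Prints the messageId of every initiated transfer that has no completion
# record and is older than MAX_AGE_SECONDS.
# File formats are assumptions made for this sketch:
#   INITIATED_FILE: "<messageId> <unix_timestamp>" per line
#   COMPLETED_FILE: "<messageId>" per line
find_stuck_transfers() {
  local initiated="$1" completed="$2" now="$3" max_age="$4"
  awk -v now="$now" -v max_age="$max_age" '
    # First file: remember every completed messageId
    FILENAME == ARGV[1] { seen[$1] = 1; next }
    # Second file: report initiated transfers that are old and not completed
    !($1 in seen) && (now - $2) > max_age { print $1 }
  ' "$completed" "$initiated"
}

# Example (4 hours is an arbitrary illustrative threshold):
#   find_stuck_transfers initiated.txt completed.txt "$(date +%s)" 14400
```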
---
## Monitoring Setup
### Option 1: Event Logging Script
Create a script to monitor events:
```bash
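#!/bin/bash
# Monitor CCIP events on ChainID 138.
# Each query is one-shot against the latest block; run it periodically.
set -euo pipefail

RPC_URL="${RPC_URL_138:-http://localhost:8545}"
# Contract addresses have no safe default, so fail early if they are unset
ROUTER="${CCIP_CHAIN138_ROUTER:?set CCIP_CHAIN138_ROUTER}"
BRIDGE9="${CCIPWETH9_BRIDGE_CHAIN138:?set CCIPWETH9_BRIDGE_CHAIN138}"
BRIDGE10="${CCIPWETH10_BRIDGE_CHAIN138:?set CCIPWETH10_BRIDGE_CHAIN138}"

# Router: MessageSent(messageId, destinationChainSelector, sender, receiver,
# data, tokenAmounts, feeToken, extraArgs).
# Assumes receiver is `bytes` and tokenAmounts is an array of
# (address,uint256) structs; verify against the deployed router ABI.
cast logs --from-block latest \
  --address "$ROUTER" \
  --rpc-url "$RPC_URL" \
  "MessageSent(bytes32,uint64,address,bytes,bytes,(address,uint256)[],address,bytes)"

# Bridges: CrossChainTransferInitiated(messageId, sender,
# destinationChainSelector, recipient, amount, nonce)
for bridge in "$BRIDGE9" "$BRIDGE10"; do
  cast logs --from-block latest \
    --address "$bridge" \
    --rpc-url "$RPC_URL" \
    "CrossChainTransferInitiated(bytes32,address,uint64,address,uint256,uint256)"
done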
```
### Option 2: Prometheus Integration
Set up Prometheus to scrape CCIP metrics:
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'ccip-router'
    static_configs:
      - targets: ['localhost:9545']
    metrics_path: '/metrics'
```
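
A contract does not serve `/metrics` itself, so the scrape target above presumes some exporter process that turns event counts into Prometheus metrics. As a rough sketch of what such an exporter would emit, the function below renders three counters in the Prometheus text exposition format. The metric names are hypothetical (this guide does not define any), and how the counts are obtained and served over HTTP is left open.

```bash
#!/bin/bash
# render_ccip_metrics SENT RECEIVED FAILED
# Prints three counters in the Prometheus text exposition format.
# Metric names are hypothetical placeholders chosen for this sketch.
render_ccip_metrics() {
  local sent="$1" received="$2" failed="$3"
  cat <<EOF
# HELP ccip_messages_sent_total CCIP messages sent from ChainID 138.
# TYPE ccip_messages_sent_total counter
ccip_messages_sent_total $sent
# HELP ccip_messages_received_total CCIP messages received on ChainID 138.
# TYPE ccip_messages_received_total counter
ccip_messages_received_total $received
# HELP ccip_messages_failed_total CCIP messages that failed.
# TYPE ccip_messages_failed_total counter
ccip_messages_failed_total $failed
EOF
}

# Example:
#   render_ccip_metrics 120 118 2
```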
### Option 3: Grafana Dashboards
Create dashboards for:
- Message volume over time
- Transfer amounts and counts
- Fee collection
- Success/failure rates
- Destination chain distribution
---
## Alerting Rules
### Critical Alerts
1. **Router Down**: Router contract becomes unresponsive
2. **Bridge Failure**: Bridge fails to process transfers
3. **High Failure Rate**: >10% message/transfer failures
4. **Configuration Change**: Unauthorized configuration changes
### Warning Alerts
1. **High Volume**: Unusual message/transfer volume
2. **Fee Anomaly**: Unusual fee collection patterns
3. **Slow Processing**: Messages taking longer than expected
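
The failure-rate thresholds can be turned into a simple severity check. The sketch below uses the >10% critical threshold from this section and, as an assumption, treats the >5% router alert threshold from the monitoring components section as a warning tier. The function name and its inputs (failure and total counts for some window) are illustrative.

```bash
#!/bin/bash
# classify_failure_rate FAILED TOTAL
# Prints "critical" above 10% failures, "warning" above 5%, otherwise "ok".
# Treating the 5% figure as a warning tier is an assumption of this sketch.
classify_failure_rate() {
  local failed="$1" total="$2"
  if [ "$total" -eq 0 ]; then
    echo "ok"   # no traffic in the window, nothing to classify
    return 0
  fi
  # Integer-only comparison: failed/total > 10%  <=>  failed*100 > total*10
  if [ $((failed * 100)) -gt $((total * 10)) ]; then
    echo "critical"
  elif [ $((failed * 100)) -gt $((total * 5)) ]; then
    echo "warning"
  else
    echo "ok"
  fi
}

# Example:
#   classify_failure_rate 7 100
```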
---
## Logging
### Recommended Log Levels
- **INFO**: Normal operations (messages sent/received, transfers)
- **WARN**: Recoverable errors, configuration changes
- **ERROR**: Failed operations, contract errors
- **DEBUG**: Detailed operation logs (for troubleshooting)
### Log Retention
- **Event Logs**: Retain for 90 days
- **Error Logs**: Retain for 180 days
- **Audit Logs**: Retain for 1 year
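
The retention periods above could be enforced with a small helper like the following sketch. It assumes logs are stored as plain files grouped by category in a directory (a layout this guide does not specify) and treats 1 year as 365 days. It only lists expired files; deleting or archiving them is left to the operator.

```bash
#!/bin/bash
# retention_days LOG_TYPE
# Maps a log category (event, error, audit) to its retention period in days.
retention_days() {
  case "$1" in
    event) echo 90 ;;
    error) echo 180 ;;
    audit) echo 365 ;;   # 1 year, taken as 365 days
    *) echo "unknown log type: $1" >&2; return 1 ;;
  esac
}

# list_expired_logs LOG_TYPE DIR
# Lists files under DIR last modified longer ago than the retention period.
list_expired_logs() {
  local days
  days="$(retention_days "$1")" || return 1
  find "$2" -type f -mtime +"$days" -print
}

# Example (the directory is a hypothetical path):
#   list_expired_logs error /var/log/ccip/errors
```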
---
## Health Checks
### Router Health Check
```bash
# Check router is responsive
cast call "$ROUTER" "feeToken()" --rpc-url "$RPC_URL"
# Check that a destination chain is supported
# (5009297550715157269 is the CCIP chain selector for Ethereum mainnet)
cast call "$ROUTER" "supportedChains(uint64)" "5009297550715157269" --rpc-url "$RPC_URL"
```
### Bridge Health Check
```bash
# Check bridge router connection
cast call "$BRIDGE9" "ccipRouter()" --rpc-url "$RPC_URL"
# Check the destination configured for Ethereum mainnet (selector 5009297550715157269)
cast call "$BRIDGE9" "destinations(uint64)" "5009297550715157269" --rpc-url "$RPC_URL"
```
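
To run checks like these from cron or a monitoring agent, it helps to wrap each command so that it reports a uniform status and exit code. The wrapper below is a minimal sketch; `run_check` and the check names are hypothetical, and it treats any non-zero exit status from the wrapped command as a failure.

```bash
#!/bin/bash
# run_check NAME COMMAND [ARGS...]
# Runs COMMAND quietly, prints "OK NAME" or "FAIL NAME",
# and returns non-zero when the command fails.
run_check() {
  local name="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK $name"
  else
    echo "FAIL $name"
    return 1
  fi
}

# Example, reusing the commands from the health checks above:
#   run_check router-fee-token cast call "$ROUTER" "feeToken()" --rpc-url "$RPC_URL"
#   run_check bridge9-router   cast call "$BRIDGE9" "ccipRouter()" --rpc-url "$RPC_URL"
```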
---
## Performance Metrics
### Key Performance Indicators (KPIs)
1. **Message Throughput**: Messages per second
2. **Transfer Throughput**: Transfers per hour
3. **Average Latency**: Time from send to receive
4. **Success Rate**: Percentage of successful operations
5. **Fee Efficiency**: Average fee per operation
### Target Metrics
- Message success rate: >99%
- Average latency: <5 minutes
- Transfer success rate: >99.5%
- System uptime: >99.9%
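
The latency target can be checked mechanically once send and receive times are recorded. The following sketch assumes a hypothetical input file with one `<sent_unix_ts> <received_unix_ts>` pair per message; how those timestamps are collected (for example from block timestamps of the matching events) is outside its scope.

```bash
#!/bin/bash
# average_latency_seconds FILE
# FILE holds "<sent_unix_ts> <received_unix_ts>" per line (assumed format).
# Prints the mean latency in whole seconds, rounded down.
# Prints nothing if FILE has no lines.
average_latency_seconds() {
  awk '{ sum += $2 - $1; n++ } END { if (n > 0) printf "%d\n", sum / n }' "$1"
}

# latency_within_target FILE
# Succeeds when the mean latency is under the 5-minute (300 s) target.
latency_within_target() {
  local avg
  avg="$(average_latency_seconds "$1")"
  [ -n "$avg" ] && [ "$avg" -lt 300 ]
}

# Example:
#   latency_within_target latencies.txt || echo "average latency above target"
```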
---
## Incident Response
### Escalation Procedures
1. **Level 1**: Automated alerts → On-call engineer
2. **Level 2**: Critical failures → Team lead
3. **Level 3**: System-wide issues → CTO/Management
### Response Playbook
1. **Router Failure**:
   - Check contract status
   - Verify RPC connectivity
   - Review recent transactions
   - Check for configuration changes
2. **Bridge Failure**:
   - Verify router connectivity
   - Check destination configuration
   - Review transfer logs
   - Verify fee payment
3. **High Failure Rate**:
   - Analyze failure patterns
   - Check network conditions
   - Review recent changes
   - Escalate if needed
---
## Monitoring Tools
### Recommended Tools
- **Prometheus**: Metrics collection
- **Grafana**: Visualization and dashboards
- **ELK Stack**: Log aggregation
- **PagerDuty**: Alerting and on-call
- **Custom Scripts**: Event monitoring
---
## Related Documentation
- [CCIP Deployment Guide](../ccip/DEPLOYMENT_GUIDE_CHAIN138.md)
- [CCIP Review](../CCIP_CHAIN138_REVIEW.md)
- [Operations Runbooks](CCIP_RUNBOOKS.md)
---
**Last Updated**: 2025-01-27