# CCIP Monitoring Guide for ChainID 138

**Date**: 2025-01-27
**Network**: ChainID 138 (DeFi Oracle Meta Mainnet)

---

## Overview

This guide covers monitoring setup and best practices for the CCIP infrastructure on ChainID 138: which events and metrics to watch, how to collect them, and when to alert and escalate.

---

## Monitoring Components

### 1. CCIP Router Monitoring

#### Events to Monitor

- `MessageSent`: Track all outgoing messages
  - Parameters: `messageId`, `destinationChainSelector`, `sender`, `receiver`, `data`, `tokenAmounts`, `feeToken`, `extraArgs`
- `MessageReceived`: Track all incoming messages
  - Parameters: `messageId`, `sourceChainSelector`, `sender`, `data`, `tokenAmounts`

#### Metrics to Track

- Message volume (sent/received per hour/day)
- Fee collection amounts
- Average message size
- Success/failure rates
- Destination chain distribution

#### Alerts

- High message failure rate (>5%)
- Unusual fee collection patterns
- Router contract errors
- Unsupported chain access attempts

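The failure-rate alert above can be sketched as a small threshold check. This is a minimal illustration, assuming the failed and total message counts have already been collected elsewhere; `failure_rate_pct` and `should_alert` are hypothetical helper names, and the counts in the example are made up.

```bash
#!/bin/bash
# Sketch: compute a message failure rate and compare it with an alert threshold.
# failure_rate_pct and should_alert are hypothetical helpers, not part of any CCIP tooling.

# Print failed/total as a percentage with two decimals (0.00 when total is 0).
failure_rate_pct() {
  local failed="$1" total="$2"
  awk -v f="$failed" -v t="$total" \
    'BEGIN { if (t == 0) printf "0.00"; else printf "%.2f", (f / t) * 100 }'
}

# Exit 0 when the rate is strictly above the threshold percentage (default 5).
should_alert() {
  local rate="$1" threshold="${2:-5}"
  awk -v r="$rate" -v th="$threshold" 'BEGIN { exit !(r > th) }'
}

# Illustrative counts: 7 failures out of 120 messages.
rate="$(failure_rate_pct 7 120)"
if should_alert "$rate" 5; then
  echo "ALERT: message failure rate ${rate}% exceeds 5%"
fi
```
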
### 2. Bridge Monitoring

#### Events to Monitor

**CCIPWETH9Bridge & CCIPWETH10Bridge**:

- `CrossChainTransferInitiated`: Track outgoing transfers
  - Parameters: `messageId`, `sender`, `destinationChainSelector`, `recipient`, `amount`, `nonce`
- `CrossChainTransferCompleted`: Track completed transfers
  - Parameters: `messageId`, `sourceChainSelector`, `recipient`, `amount`
- `DestinationAdded`: Track configuration changes
- `DestinationRemoved`: Track configuration changes
- `DestinationUpdated`: Track configuration changes

#### Metrics to Track

- Transfer volume (amount and count)
- Average transfer size
- Transfer success rate
- Time to completion
- Fee costs per transfer
- Destination chain usage

#### Alerts

- Failed transfers
- Stuck transfers (no completion after X hours)
- Unusual transfer patterns
- Configuration changes
- Insufficient fee errors

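The stuck-transfer alert above comes down to comparing a transfer's age with a configured limit. The sketch below assumes the initiation time is available as a Unix timestamp (for example, from the block that emitted `CrossChainTransferInitiated`) and that no matching `CrossChainTransferCompleted` has been seen. `is_stuck` is a hypothetical helper, and the 4-hour limit only stands in for the unspecified X.

```bash
#!/bin/bash
# Sketch: flag a transfer as stuck when it has been pending longer than MAX_HOURS.
# is_stuck is a hypothetical helper; MAX_HOURS corresponds to the "X hours" above.

# is_stuck INITIATED_EPOCH NOW_EPOCH MAX_HOURS
# Exit 0 when the transfer's age is strictly greater than MAX_HOURS.
is_stuck() {
  local initiated="$1" now="$2" max_hours="$3"
  local age=$(( now - initiated ))
  [ "$age" -gt $(( max_hours * 3600 )) ]
}

# Illustrative values: a transfer initiated five hours ago, checked against a 4-hour limit.
now="$(date +%s)"
initiated=$(( now - 5 * 3600 ))
if is_stuck "$initiated" "$now" 4; then
  echo "ALERT: transfer pending for more than 4 hours"
fi
```
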
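---

## Monitoring Setup

### Option 1: Event Logging Script

Create a script to monitor events:

```bash
#!/bin/bash
# Monitor CCIP events
set -euo pipefail

RPC_URL="${RPC_URL_138:-http://localhost:8545}"
# Abort early if an address is missing rather than querying an empty address.
ROUTER="${CCIP_CHAIN138_ROUTER:?Set CCIP_CHAIN138_ROUTER}"
BRIDGE9="${CCIPWETH9_BRIDGE_CHAIN138:?Set CCIPWETH9_BRIDGE_CHAIN138}"
BRIDGE10="${CCIPWETH10_BRIDGE_CHAIN138:?Set CCIPWETH10_BRIDGE_CHAIN138}"

# Monitor router events.
# NOTE: "--from-block latest" only covers the most recent block, so run this
# periodically or widen the range. Verify the signature against the deployed ABI:
# it has seven types for the eight listed parameters, and "tuple[]" must be
# written as the struct's component types.
cast logs --from-block latest \
  --address "$ROUTER" \
  --rpc-url "$RPC_URL" \
  "MessageSent(bytes32,uint64,address,bytes,tuple[],address,bytes)"

# Monitor bridge events on both bridges
for BRIDGE in "$BRIDGE9" "$BRIDGE10"; do
  cast logs --from-block latest \
    --address "$BRIDGE" \
    --rpc-url "$RPC_URL" \
    "CrossChainTransferInitiated(bytes32,address,uint64,address,uint256,uint256)"
done
```
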
### Option 2: Prometheus Integration

Set up Prometheus to scrape CCIP metrics:

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'ccip-router'
    static_configs:
      - targets: ['localhost:9545']
    metrics_path: '/metrics'
```

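The scrape configuration above only helps if something serves CCIP metrics at that endpoint. As one possible approach, a script can print counters in the Prometheus text exposition format for an exporter to publish. This is a sketch under assumptions: `emit_counter` and the metric name `ccip_messages_sent_total` are hypothetical, and the value 42 is illustrative.

```bash
#!/bin/bash
# Sketch: print a counter in the Prometheus text exposition format.
# emit_counter and the metric name below are hypothetical, chosen for illustration.

# emit_counter NAME HELP VALUE [LABELS]
# LABELS is an optional pre-formatted label list such as 'destination="123"'.
emit_counter() {
  local name="$1" help="$2" value="$3" labels="${4:-}"
  printf '# HELP %s %s\n' "$name" "$help"
  printf '# TYPE %s counter\n' "$name"
  if [ -n "$labels" ]; then
    printf '%s{%s} %s\n' "$name" "$labels" "$value"
  else
    printf '%s %s\n' "$name" "$value"
  fi
}

# Illustrative sample: 42 messages sent to one destination chain selector.
emit_counter ccip_messages_sent_total "CCIP messages sent from ChainID 138" 42 \
  'destination="5009297550715157269"'
```
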
### Option 3: Grafana Dashboards

Create dashboards for:

- Message volume over time
- Transfer amounts and counts
- Fee collection
- Success/failure rates
- Destination chain distribution

---

## Alerting Rules

### Critical Alerts

1. **Router Down**: Router contract becomes unresponsive
2. **Bridge Failure**: Bridge fails to process transfers
3. **High Failure Rate**: >10% message/transfer failures
4. **Configuration Change**: Unauthorized configuration changes

### Warning Alerts

1. **High Volume**: Unusual message/transfer volume
2. **Fee Anomaly**: Unusual fee collection patterns
3. **Slow Processing**: Messages taking longer than expected

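The two tiers above can be encoded as a lookup, so that alert-handling scripts route by severity. This is a minimal sketch; `alert_severity` and the identifiers such as `router_down` are hypothetical names derived from the list, not names used by any existing tool.

```bash
#!/bin/bash
# Sketch: map an alert identifier to the severity tier listed above.
# alert_severity and the identifiers are hypothetical names for illustration.

# Print "critical" or "warning"; print "unknown" and return 1 for anything else.
alert_severity() {
  case "$1" in
    router_down|bridge_failure|high_failure_rate|config_change)
      echo critical ;;
    high_volume|fee_anomaly|slow_processing)
      echo warning ;;
    *)
      echo unknown
      return 1 ;;
  esac
}

# Example: decide how to route an incoming alert.
severity="$(alert_severity fee_anomaly)"
echo "fee_anomaly is a ${severity} alert"
```
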
---

## Logging

### Recommended Log Levels

- **INFO**: Normal operations (messages sent/received, transfers)
- **WARN**: Recoverable errors, configuration changes
- **ERROR**: Failed operations, contract errors
- **DEBUG**: Detailed operation logs (for troubleshooting)

### Log Retention

- **Event Logs**: Retain for 90 days
- **Error Logs**: Retain for 180 days
- **Audit Logs**: Retain for 1 year

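The retention periods above could be enforced with a periodic cleanup job. The sketch below assumes logs are stored as `*.log` files in one directory per log type and treats 1 year as 365 days. `retention_days` and `prune_logs` are hypothetical helpers, and the directory in the usage comment is a placeholder.

```bash
#!/bin/bash
# Sketch: delete log files that are older than the retention period for their type.
# retention_days and prune_logs are hypothetical helpers; the file layout is assumed.

# Print the retention period in days for a log type (event, error, or audit).
retention_days() {
  case "$1" in
    event) echo 90 ;;
    error) echo 180 ;;
    audit) echo 365 ;;   # assumption: 1 year is treated as 365 days
    *) return 1 ;;
  esac
}

# prune_logs TYPE DIR: remove *.log files in DIR last modified more than the
# retention period ago. Fails without deleting anything for an unknown TYPE.
prune_logs() {
  local days
  days="$(retention_days "$1")" || return 1
  find "$2" -type f -name '*.log' -mtime +"$days" -delete
}

# Example (placeholder path): prune_logs event /var/log/ccip/events
```
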
---

## Health Checks

### Router Health Check

```bash
# Check router is responsive
cast call "$ROUTER" "feeToken()" --rpc-url "$RPC_URL"

# Check supported chains (5009297550715157269 is the Ethereum mainnet CCIP chain selector)
cast call "$ROUTER" "supportedChains(uint64)" "5009297550715157269" --rpc-url "$RPC_URL"
```

### Bridge Health Check

```bash
# Check bridge router connection
cast call "$BRIDGE9" "ccipRouter()" --rpc-url "$RPC_URL"

# Check destinations
cast call "$BRIDGE9" "destinations(uint64)" "5009297550715157269" --rpc-url "$RPC_URL"
```

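To run the checks above unattended, each call can be wrapped so that it reports a uniform OK or FAIL line and a usable exit status. This is a sketch only; `run_check` is a hypothetical helper, and the commented wiring assumes `cast` is installed and `ROUTER`, `BRIDGE9`, and `RPC_URL` are set as in the earlier script.

```bash
#!/bin/bash
# Sketch: wrap any health-check command and report a uniform OK/FAIL line.
# run_check is a hypothetical helper, not part of Foundry.

# run_check NAME COMMAND [ARGS...]
# Runs COMMAND silently; prints "OK   NAME" on success, "FAIL NAME" and returns 1 otherwise.
run_check() {
  local name="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $name"
  else
    echo "FAIL $name"
    return 1
  fi
}

# Possible wiring with the calls above (needs cast and the environment variables):
# run_check router-feeToken cast call "$ROUTER" "feeToken()" --rpc-url "$RPC_URL"
# run_check bridge-router   cast call "$BRIDGE9" "ccipRouter()" --rpc-url "$RPC_URL"
```
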
---

## Performance Metrics

### Key Performance Indicators (KPIs)

1. **Message Throughput**: Messages per second
2. **Transfer Throughput**: Transfers per hour
3. **Average Latency**: Time from send to receive
4. **Success Rate**: Percentage of successful operations
5. **Fee Efficiency**: Average fee per operation

### Target Metrics

- Message success rate: >99%
- Average latency: <5 minutes
- Transfer success rate: >99.5%
- System uptime: >99.9%

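As a worked example of the latency target above, the average can be computed from per-message latencies and compared with the 5-minute (300-second) limit. This sketch assumes latencies are already available in seconds, one per line; `avg_latency` and `meets_latency_target` are hypothetical helpers and the sample values are made up.

```bash
#!/bin/bash
# Sketch: average per-message latencies and compare with the 5-minute target.
# avg_latency and meets_latency_target are hypothetical helpers.

# Read one latency in seconds per line on stdin; print the mean truncated to whole seconds.
avg_latency() {
  awk '{ sum += $1; n++ } END { if (n == 0) print 0; else printf "%d\n", sum / n }'
}

# Exit 0 when the average is strictly below the target (default 300 seconds).
meets_latency_target() {
  [ "$1" -lt "${2:-300}" ]
}

# Illustrative samples: 120 s, 240 s, and 390 s average to 250 s.
avg="$(printf '120\n240\n390\n' | avg_latency)"
if meets_latency_target "$avg"; then
  echo "Average latency ${avg}s is within the 300s target"
else
  echo "Average latency ${avg}s misses the 300s target"
fi
```
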
---

## Incident Response

### Escalation Procedures

1. **Level 1**: Automated alerts → On-call engineer
2. **Level 2**: Critical failures → Team lead
3. **Level 3**: System-wide issues → CTO/Management

### Response Playbook

1. **Router Failure**:
   - Check contract status
   - Verify RPC connectivity
   - Review recent transactions
   - Check for configuration changes

2. **Bridge Failure**:
   - Verify router connectivity
   - Check destination configuration
   - Review transfer logs
   - Verify fee payment

3. **High Failure Rate**:
   - Analyze failure patterns
   - Check network conditions
   - Review recent changes
   - Escalate if needed

---

## Monitoring Tools

### Recommended Tools

- **Prometheus**: Metrics collection
- **Grafana**: Visualization and dashboards
- **ELK Stack**: Log aggregation
- **PagerDuty**: Alerting and on-call
- **Custom Scripts**: Event monitoring

---

## Related Documentation

- [CCIP Deployment Guide](../ccip/DEPLOYMENT_GUIDE_CHAIN138.md)
- [CCIP Review](../CCIP_CHAIN138_REVIEW.md)
- [Operations Runbooks](CCIP_RUNBOOKS.md)

---

**Last Updated**: 2025-01-27