CCIP Monitoring Guide for ChainID 138
Date: 2025-01-27
Network: ChainID 138 (DeFi Oracle Meta Mainnet)
Overview
This guide provides monitoring setup and best practices for CCIP infrastructure on ChainID 138.
Monitoring Components
1. CCIP Router Monitoring
Events to Monitor
- MessageSent: Track all outgoing messages
  - Parameters: messageId, destinationChainSelector, sender, receiver, data, tokenAmounts, feeToken, extraArgs
- MessageReceived: Track all incoming messages
  - Parameters: messageId, sourceChainSelector, sender, data, tokenAmounts
Metrics to Track
- Message volume (sent/received per hour/day)
- Fee collection amounts
- Average message size
- Success/failure rates
- Destination chain distribution
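As a rough illustration of the message-volume metric, the sketch below buckets event timestamps into hourly counts. It assumes the Unix timestamps of the relevant events (for example, the block timestamps of MessageSent logs) have already been extracted, one per line; the function name `hourly_counts` is hypothetical.

```shell
# Read Unix timestamps (one per line) on stdin and print
# "<hour_start_epoch> <count>" for each hour that saw at least one event.
hourly_counts() {
  local ts
  while read -r ts; do
    [ -n "$ts" ] || continue           # skip blank lines
    echo $(( ts - ts % 3600 ))         # round down to the start of the hour
  done | sort -n | uniq -c | awk '{ print $2, $1 }'
}
```

For example, `printf '3600\n3700\n7200\n' | hourly_counts` prints one line for the hour starting at 3600 with a count of 2, and one for the hour starting at 7200 with a count of 1.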
Alerts
- High message failure rate (>5%)
- Unusual fee collection patterns
- Router contract errors
- Unsupported chain access attempts
2. Bridge Monitoring
Events to Monitor
CCIPWETH9Bridge & CCIPWETH10Bridge:
- CrossChainTransferInitiated: Track outgoing transfers
  - Parameters: messageId, sender, destinationChainSelector, recipient, amount, nonce
- CrossChainTransferCompleted: Track completed transfers
  - Parameters: messageId, sourceChainSelector, recipient, amount
- DestinationAdded: Track configuration changes
- DestinationRemoved: Track configuration changes
- DestinationUpdated: Track configuration changes
Metrics to Track
- Transfer volume (amount and count)
- Average transfer size
- Transfer success rate
- Time to completion
- Fee costs per transfer
- Destination chain usage
Alerts
- Failed transfers
- Stuck transfers (no completion after X hours)
- Unusual transfer patterns
- Configuration changes
- Insufficient fee errors
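The stuck-transfer alert could be approximated by comparing initiated transfers against completed ones. The sketch below is one possible approach, not the project's tooling: it assumes two hypothetical input files, one with `messageId timestamp` pairs taken from CrossChainTransferInitiated events and one with the messageIds seen in CrossChainTransferCompleted events. The age threshold stays a parameter because this guide does not fix a value for it.

```shell
# Usage: stuck_transfers INITIATED_FILE COMPLETED_FILE MAX_AGE_HOURS [NOW_EPOCH]
# INITIATED_FILE: lines of "<messageId> <unix_timestamp>"
# COMPLETED_FILE: lines of "<messageId>"
# Prints every messageId that is older than MAX_AGE_HOURS and not completed.
stuck_transfers() {
  local initiated="$1" completed="$2" max_hours="$3"
  local now="${4:-$(date +%s)}"
  local max_secs=$(( max_hours * 3600 ))
  awk -v now="$now" -v max="$max_secs" '
    # First file: remember completed messageIds.
    FILENAME == ARGV[1] { completed[$1] = 1; next }
    # Second file: report old transfers with no completion.
    !($1 in completed) && (now - $2) > max { print $1 }
  ' "$completed" "$initiated"
}
```

The explicit `FILENAME == ARGV[1]` test is used instead of the common `NR == FNR` idiom so that an empty completed file does not cause the initiated file to be misread as the completed list.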
Monitoring Setup
Option 1: Event Logging Script
Create a script to monitor events:
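#!/bin/bash
# Monitor CCIP events on ChainID 138
set -euo pipefail
RPC_URL="${RPC_URL_138:-http://localhost:8545}"
# Contract addresses have no safe default, so fail fast if any is unset
ROUTER="${CCIP_CHAIN138_ROUTER:?set CCIP_CHAIN138_ROUTER}"
BRIDGE9="${CCIPWETH9_BRIDGE_CHAIN138:?set CCIPWETH9_BRIDGE_CHAIN138}"
BRIDGE10="${CCIPWETH10_BRIDGE_CHAIN138:?set CCIPWETH10_BRIDGE_CHAIN138}"
# Monitor router events
# NOTE: the signature must match the router ABI exactly. The parameter list
# in this guide names eight fields, and tuple[] has to be written out as the
# component types of the token-amount struct; verify against the deployed ABI.
cast logs --from-block latest \
  --address "$ROUTER" \
  --rpc-url "$RPC_URL" \
  "MessageSent(bytes32,uint64,address,bytes,tuple[],address,bytes)"
# Monitor bridge events on both the WETH9 and WETH10 bridges
for BRIDGE in "$BRIDGE9" "$BRIDGE10"; do
  cast logs --from-block latest \
    --address "$BRIDGE" \
    --rpc-url "$RPC_URL" \
    "CrossChainTransferInitiated(bytes32,address,uint64,address,uint256,uint256)"
done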
Option 2: Prometheus Integration
Set up Prometheus to scrape CCIP metrics:
# prometheus.yml
scrape_configs:
  - job_name: 'ccip-router'
    static_configs:
      - targets: ['localhost:9545']
    metrics_path: '/metrics'
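The scrape config needs something to serve `/metrics` in the Prometheus text exposition format. A minimal sketch of rendering two counters in that format follows. The metric names, the `chain_id` label, and the `render_metrics` function are all hypothetical; how the output is served (a small exporter, or a file picked up by another collector) is left open here.

```shell
# Usage: render_metrics SENT_TOTAL FAILED_TOTAL
# Prints counters in the Prometheus text exposition format.
# Metric names below are illustrative assumptions, not an existing exporter's.
render_metrics() {
  local sent="$1" failed="$2"
  cat <<EOF
# HELP ccip_messages_sent_total Total CCIP messages sent.
# TYPE ccip_messages_sent_total counter
ccip_messages_sent_total{chain_id="138"} ${sent}
# HELP ccip_messages_failed_total Total CCIP messages that failed.
# TYPE ccip_messages_failed_total counter
ccip_messages_failed_total{chain_id="138"} ${failed}
EOF
}
```

Counters are used rather than gauges so that rates such as messages per hour can be derived in Prometheus with `rate()` instead of being precomputed in the script.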
Option 3: Grafana Dashboards
Create dashboards for:
- Message volume over time
- Transfer amounts and counts
- Fee collection
- Success/failure rates
- Destination chain distribution
Alerting Rules
Critical Alerts
- Router Down: Router contract becomes unresponsive
- Bridge Failure: Bridge fails to process transfers
- High Failure Rate: >10% message/transfer failures
- Configuration Change: Unauthorized configuration changes
Warning Alerts
- High Volume: Unusual message/transfer volume
- Fee Anomaly: Unusual fee collection patterns
- Slow Processing: Messages taking longer than expected
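The failure-rate thresholds can be expressed as a small classifier. The sketch below treats the >10% rule as critical and, as an assumption, maps the >5% threshold from the router alerts to a warning. The function name `failure_severity` is hypothetical.

```shell
# Usage: failure_severity FAILED_COUNT TOTAL_COUNT
# Prints "critical" (>10% failures), "warning" (>5%), or "ok".
failure_severity() {
  local failed="$1" total="$2"
  # No traffic means there is nothing to alert on.
  if [ "$total" -eq 0 ]; then
    echo "ok"
    return 0
  fi
  # Integer-only comparison: failed/total > 10%  <=>  failed*100 > total*10
  if [ $(( failed * 100 )) -gt $(( total * 10 )) ]; then
    echo "critical"
  elif [ $(( failed * 100 )) -gt $(( total * 5 )) ]; then
    echo "warning"
  else
    echo "ok"
  fi
}
```

Both comparisons are strict, so exactly 10% failures yields a warning and exactly 5% yields ok, matching the ">" wording of the rules.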
Logging
Recommended Log Levels
- INFO: Normal operations (messages sent/received, transfers)
- WARN: Recoverable errors, configuration changes
- ERROR: Failed operations, contract errors
- DEBUG: Detailed operation logs (for troubleshooting)
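For custom monitoring scripts, these levels could be applied with a small logging helper such as the sketch below. The `LOG_LEVEL` variable and the `log` and `level_rank` functions are illustrative names, not part of any existing tooling described in this guide.

```shell
# Minimum level to emit; override with LOG_LEVEL=DEBUG for troubleshooting.
LOG_LEVEL="${LOG_LEVEL:-INFO}"

# Map a level name to a numeric rank (unknown names are treated as INFO).
level_rank() {
  case "$1" in
    DEBUG) echo 0 ;;
    INFO)  echo 1 ;;
    WARN)  echo 2 ;;
    ERROR) echo 3 ;;
    *)     echo 1 ;;
  esac
}

# Usage: log LEVEL MESSAGE...
# Writes a UTC-timestamped line to stderr if LEVEL is at or above LOG_LEVEL.
log() {
  local level="$1"
  shift
  if [ "$(level_rank "$level")" -ge "$(level_rank "$LOG_LEVEL")" ]; then
    printf '%s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$*" >&2
  fi
}
```

With the default level, `log INFO "MessageSent observed"` is written while `log DEBUG "raw log payload"` is suppressed.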
Log Retention
- Event Logs: Retain for 90 days
- Error Logs: Retain for 180 days
- Audit Logs: Retain for 1 year
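If logs are kept as files on disk, the retention periods could be enforced with a pruning helper along these lines. This is a sketch under assumptions: the log type names (`event`, `error`, `audit`), the `*.log` file suffix, and the function names are hypothetical, and one year is taken as 365 days.

```shell
# Usage: retention_days LOG_TYPE
# Prints the retention period in days for event, error, or audit logs.
retention_days() {
  case "$1" in
    event) echo 90 ;;
    error) echo 180 ;;
    audit) echo 365 ;;
    *) echo "unknown log type: $1" >&2; return 1 ;;
  esac
}

# Usage: prune_logs LOG_TYPE DIRECTORY
# Deletes *.log files in DIRECTORY last modified more than the retention period ago.
prune_logs() {
  local type="$1" dir="$2" days
  days="$(retention_days "$type")" || return 1
  find "$dir" -type f -name '*.log' -mtime +"$days" -exec rm -f {} +
}
```

A daily cron job could then call, for example, `prune_logs event /var/log/ccip/events`, where the directory path is a placeholder.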
Health Checks
Router Health Check
# Check router is responsive
cast call "$ROUTER" "feeToken()" --rpc-url "$RPC_URL"
# Check supported chains
cast call "$ROUTER" "supportedChains(uint64)" "5009297550715157269" --rpc-url "$RPC_URL"
Bridge Health Check
# Check bridge router connection
cast call "$BRIDGE9" "ccipRouter()" --rpc-url "$RPC_URL"
# Check destinations
cast call "$BRIDGE9" "destinations(uint64)" "5009297550715157269" --rpc-url "$RPC_URL"
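To run these checks unattended, each command can be wrapped so that its exit status becomes a pass/fail line. The sketch below assumes `cast call` exits non-zero when the RPC request or the contract call fails; `run_check` is a hypothetical helper name.

```shell
# Usage: run_check NAME COMMAND [ARGS...]
# Runs COMMAND quietly, prints "OK   NAME" or "FAIL NAME",
# and returns non-zero when the command fails.
run_check() {
  local name="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK   $name"
  else
    echo "FAIL $name"
    return 1
  fi
}

# Example usage (requires ROUTER, BRIDGE9, and RPC_URL to be set):
#   status=0
#   run_check "router feeToken"   cast call "$ROUTER"  "feeToken()"   --rpc-url "$RPC_URL" || status=1
#   run_check "bridge ccipRouter" cast call "$BRIDGE9" "ccipRouter()" --rpc-url "$RPC_URL" || status=1
#   exit "$status"
```

Collecting the status instead of exiting on the first failure lets a single run report every failing check.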
Performance Metrics
Key Performance Indicators (KPIs)
- Message Throughput: Messages per second
- Transfer Throughput: Transfers per hour
- Average Latency: Time from send to receive
- Success Rate: Percentage of successful operations
- Fee Efficiency: Average fee per operation
Target Metrics
- Message success rate: >99%
- Average latency: <5 minutes
- Transfer success rate: >99.5%
- System uptime: >99.9%
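The latency target can be checked from paired send and receive timestamps. The sketch below assumes input lines of the form `send_epoch receive_epoch`, which would have to be joined from source and destination events by messageId elsewhere; both function names are hypothetical.

```shell
# Read "<send_epoch> <receive_epoch>" lines on stdin and print the
# average latency in whole seconds (0 when there is no input).
average_latency() {
  awk 'NF >= 2 { sum += $2 - $1; n++ }
       END { if (n) printf "%d\n", sum / n; else print 0 }'
}

# Succeeds when the average latency on stdin is under 5 minutes (300 s).
# Note: empty input yields an average of 0 and therefore passes.
meets_latency_target() {
  local avg
  avg="$(average_latency)"
  [ "$avg" -lt 300 ]
}
```

An average hides outliers, so a percentile-based check may be a better fit for the "Slow Processing" alert if individual slow messages matter.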
Incident Response
Escalation Procedures
- Level 1: Automated alerts → On-call engineer
- Level 2: Critical failures → Team lead
- Level 3: System-wide issues → CTO/Management
Response Playbook
- Router Failure:
  - Check contract status
  - Verify RPC connectivity
  - Review recent transactions
  - Check for configuration changes
- Bridge Failure:
  - Verify router connectivity
  - Check destination configuration
  - Review transfer logs
  - Verify fee payment
- High Failure Rate:
  - Analyze failure patterns
  - Check network conditions
  - Review recent changes
  - Escalate if needed
Monitoring Tools
Recommended Tools
- Prometheus: Metrics collection
- Grafana: Visualization and dashboards
- ELK Stack: Log aggregation
- PagerDuty: Alerting and on-call
- Custom Scripts: Event monitoring
Last Updated: 2025-01-27