# CCIP Monitoring Guide for ChainID 138

**Date**: 2025-01-27
**Network**: ChainID 138 (DeFi Oracle Meta Mainnet)

---

## Overview

This guide describes how to set up monitoring for the CCIP infrastructure on ChainID 138 and lists recommended practices for alerting, logging, and incident response.

---

## Monitoring Components

### 1. CCIP Router Monitoring

#### Events to Monitor

- `MessageSent`: Track all outgoing messages
  - Parameters: `messageId`, `destinationChainSelector`, `sender`, `receiver`, `data`, `tokenAmounts`, `feeToken`, `extraArgs`
- `MessageReceived`: Track all incoming messages
  - Parameters: `messageId`, `sourceChainSelector`, `sender`, `data`, `tokenAmounts`

#### Metrics to Track

- Message volume (sent/received per hour/day)
- Fee collection amounts
- Average message size
- Success/failure rates
- Destination chain distribution

#### Alerts

- High message failure rate (>5%)
- Unusual fee collection patterns
- Router contract errors
- Unsupported chain access attempts

### 2. Bridge Monitoring

#### Events to Monitor

**CCIPWETH9Bridge & CCIPWETH10Bridge**:

- `CrossChainTransferInitiated`: Track outgoing transfers
  - Parameters: `messageId`, `sender`, `destinationChainSelector`, `recipient`, `amount`, `nonce`
- `CrossChainTransferCompleted`: Track completed transfers
  - Parameters: `messageId`, `sourceChainSelector`, `recipient`, `amount`
- `DestinationAdded`: Track configuration changes
- `DestinationRemoved`: Track configuration changes
- `DestinationUpdated`: Track configuration changes

#### Metrics to Track

- Transfer volume (amount and count)
- Average transfer size
- Transfer success rate
- Time to completion
- Fee costs per transfer
- Destination chain usage

#### Alerts

- Failed transfers
- Stuck transfers (no completion after X hours)
- Unusual transfer patterns
- Configuration changes
- Insufficient fee errors

---

## Monitoring Setup

### Option 1: Event Logging Script

Create a script to query events. As written, each `cast logs` call is a one-shot query of the latest block, so run the script on a schedule or widen `--from-block` to cover a range:

```bash
#!/bin/bash
# Monitor CCIP events (one-shot query; schedule it or widen --from-block)
set -euo pipefail

RPC_URL="${RPC_URL_138:-http://localhost:8545}"
# Fail fast if an address is missing instead of querying an empty address
ROUTER="${CCIP_CHAIN138_ROUTER:?CCIP_CHAIN138_ROUTER is not set}"
BRIDGE9="${CCIPWETH9_BRIDGE_CHAIN138:?CCIPWETH9_BRIDGE_CHAIN138 is not set}"
BRIDGE10="${CCIPWETH10_BRIDGE_CHAIN138:?CCIPWETH10_BRIDGE_CHAIN138 is not set}"

# Monitor router events
# NOTE: the signature must match the router ABI exactly. The parameter list in
# this guide names eight fields but this signature has seven types, and
# `tuple[]` stands in for the token-amount struct array, which has to be
# written out as declared in the ABI. Verify before relying on this filter.
cast logs --from-block latest \
  --address "$ROUTER" \
  --rpc-url "$RPC_URL" \
  "MessageSent(bytes32,uint64,address,bytes,tuple[],address,bytes)"

# Monitor bridge events on both WETH bridges
for BRIDGE in "$BRIDGE9" "$BRIDGE10"; do
  cast logs --from-block latest \
    --address "$BRIDGE" \
    --rpc-url "$RPC_URL" \
    "CrossChainTransferInitiated(bytes32,address,uint64,address,uint256,uint256)"
done
```

### Option 2: Prometheus Integration

Set up Prometheus to scrape CCIP metrics. The contracts do not serve a `/metrics` endpoint themselves, so the target below assumes an exporter that turns on-chain events into metrics and listens on `localhost:9545`:

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'ccip-router'
    static_configs:
      - targets: ['localhost:9545']
    metrics_path: '/metrics'
```

### Option 3: Grafana Dashboards

Create dashboards for:

- Message volume over time
- Transfer amounts and counts
- Fee collection
- Success/failure rates
- Destination chain distribution

---

## Alerting Rules

### Critical Alerts

1. **Router Down**: Router contract becomes unresponsive
2. **Bridge Failure**: Bridge fails to process transfers
3. **High Failure Rate**: >10% message/transfer failures
4. **Configuration Change**: Unauthorized configuration changes

### Warning Alerts

1. **High Volume**: Unusual message/transfer volume
2. **Fee Anomaly**: Unusual fee collection patterns
3. **Slow Processing**: Messages taking longer than expected

---

## Logging

### Recommended Log Levels

- **INFO**: Normal operations (messages sent/received, transfers)
- **WARN**: Recoverable errors, configuration changes
- **ERROR**: Failed operations, contract errors
- **DEBUG**: Detailed operation logs (for troubleshooting)

### Log Retention

- **Event Logs**: Retain for 90 days
- **Error Logs**: Retain for 180 days
- **Audit Logs**: Retain for 1 year

---

## Health Checks

The commands below assume `ROUTER`, `BRIDGE9`, and `RPC_URL` are set as in the event logging script. Without a return type in the signature, `cast call` prints the raw ABI-encoded result.

### Router Health Check

```bash
# Check router is responsive
cast call "$ROUTER" "feeToken()" --rpc-url "$RPC_URL"

# Check supported chains (5009297550715157269 is the Ethereum mainnet chain selector)
cast call "$ROUTER" "supportedChains(uint64)" "5009297550715157269" --rpc-url "$RPC_URL"
```

### Bridge Health Check

```bash
# Check bridge router connection
cast call "$BRIDGE9" "ccipRouter()" --rpc-url "$RPC_URL"

# Check destinations (5009297550715157269 is the Ethereum mainnet chain selector)
cast call "$BRIDGE9" "destinations(uint64)" "5009297550715157269" --rpc-url "$RPC_URL"
```

---

## Performance Metrics

### Key Performance Indicators (KPIs)

1. **Message Throughput**: Messages per second
2. **Transfer Throughput**: Transfers per hour
3. **Average Latency**: Time from send to receive
4. **Success Rate**: Percentage of successful operations
5. **Fee Efficiency**: Average fee per operation

### Target Metrics

- Message success rate: >99%
- Average latency: <5 minutes
- Transfer success rate: >99.5%
- System uptime: >99.9%

---

## Incident Response

### Escalation Procedures

1. **Level 1**: Automated alerts → On-call engineer
2. **Level 2**: Critical failures → Team lead
3. **Level 3**: System-wide issues → CTO/Management

### Response Playbook

1. **Router Failure**:
   - Check contract status
   - Verify RPC connectivity
   - Review recent transactions
   - Check for configuration changes
2. **Bridge Failure**:
   - Verify router connectivity
   - Check destination configuration
   - Review transfer logs
   - Verify fee payment
3. **High Failure Rate**:
   - Analyze failure patterns
   - Check network conditions
   - Review recent changes
   - Escalate if needed

---

## Monitoring Tools

### Recommended Tools

- **Prometheus**: Metrics collection
- **Grafana**: Visualization and dashboards
- **ELK Stack**: Log aggregation
- **PagerDuty**: Alerting and on-call
- **Custom Scripts**: Event monitoring

---

## Related Documentation

- [CCIP Deployment Guide](../ccip/DEPLOYMENT_GUIDE_CHAIN138.md)
- [CCIP Review](../CCIP_CHAIN138_REVIEW.md)
- [Operations Runbooks](CCIP_RUNBOOKS.md)

---

**Last Updated**: 2025-01-27
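
---

## Appendix: Example Failure-Rate Check

The failure-rate thresholds named in this guide (>5% in the router alerts, >10% in the critical alerts) could be evaluated with a small helper like the one below. This is a minimal sketch, not part of the deployed tooling: the function name `alert_level`, the `WARN_PCT` and `CRIT_PCT` variables, and the mapping of the 5% threshold to a "warning" level are all assumptions made for illustration. The counts are expected to come from your own event tallies.

```bash
#!/usr/bin/env bash
# Classify a failure rate against the thresholds named in this guide.
# Hypothetical helper: names and the warning/critical mapping are assumptions.

# Thresholds in percent (5 and 10 are the values listed in this guide).
WARN_PCT="${WARN_PCT:-5}"
CRIT_PCT="${CRIT_PCT:-10}"

# alert_level TOTAL FAILED
# Prints "ok", "warning", or "critical" for FAILED failures out of TOTAL operations.
alert_level() {
  local total="$1" failed="$2"

  # No traffic means there is no failure rate to evaluate.
  if [ "$total" -eq 0 ]; then
    echo "ok"
    return 0
  fi

  # Compare failed/total > pct/100 as failed*100 > pct*total,
  # which keeps the arithmetic in integers.
  if [ $((failed * 100)) -gt $((CRIT_PCT * total)) ]; then
    echo "critical"
  elif [ $((failed * 100)) -gt $((WARN_PCT * total)) ]; then
    echo "warning"
  else
    echo "ok"
  fi
}

alert_level 1000 30    # 3% failure rate
alert_level 1000 70    # 7% failure rate
alert_level 1000 120   # 12% failure rate
```

The comparisons are strict, so a rate of exactly 5% or 10% does not trigger the corresponding level, matching the ">" in the thresholds. The printed level can be passed to whatever alerting channel is in use.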