# CCIP Operations Runbook

## Overview

This runbook provides operational procedures for managing CCIP (Chainlink Cross-Chain Interoperability Protocol) on the DeFi Oracle Meta Mainnet.

## Daily Operations

### Health Checks

1. **Check CCIP Router Status**
   ```bash
   kubectl get pods -n besu-network -l app=ccip-router
   kubectl logs -n besu-network -l app=ccip-router --tail=100
   ```

2. **Check Message Processing**
   ```bash
   # Check messages from the last 100 blocks. cast logs prints raw topics, so filter by
   # event signature (MESSAGE_SENT_SIG: the MessageSent signature from the sender's ABI).
   LATEST=$(cast block-number --rpc-url $RPC_URL)
   cast logs --from-block $((LATEST - 100)) --address $CCIP_SENDER "$MESSAGE_SENT_SIG" --rpc-url $RPC_URL
   ```

3. **Monitor Metrics**
   - Message send success rate
   - Message delivery latency
   - Fee consumption
   - Error rates

### LINK Balance Monitoring

1. **Check Balance**
   ```bash
   cast call $LINK_TOKEN "balanceOf(address)(uint256)" $CCIP_SENDER --rpc-url $RPC_URL
   ```

2. **Set Alert Thresholds**
   - Warning: balance < 10 LINK
   - Critical: balance < 5 LINK

3. **Refill Balance**
   ```bash
   # AMOUNT is in LINK base units (18 decimals), e.g. 10 LINK = 10000000000000000000
   cast send $LINK_TOKEN "transfer(address,uint256)" $CCIP_SENDER $AMOUNT \
     --rpc-url $RPC_URL --private-key $PRIVATE_KEY
   ```

## Weekly Operations

### Review Message Statistics

1. **Message Volume**
   - Total messages sent
   - Success rate
   - Average latency

2. **Fee Analysis**
   - Total fees spent
   - Average fee per message
   - Fee trends

3. **Error Analysis**
   - Error types
   - Error frequency
   - Root causes

### Performance Review

1. **Latency Analysis**
   - Average delivery time
   - P95/P99 latency
   - Outliers

2. **Throughput**
   - Messages per hour/day
   - Peak load times
   - Capacity planning

## Monthly Operations

### Security Review

1. **Access Control Audit**
   - Review authorized senders/receivers
   - Check for unauthorized access
   - Verify role assignments

2. **Message Validation**
   - Review message format compliance
   - Check for anomalies
   - Verify replay protection

### Cost Optimization

1. **Fee Optimization**
   - Review fee trends
   - Identify optimization opportunities
   - Implement improvements

2. **Message Optimization**
   - Reduce message size
   - Batch updates when possible
   - Optimize encoding

## Incident Response

### Message Delivery Failure

1. **Identify Issue**
   - Check message status
   - Verify target chain status
   - Check router logs

2. **Diagnose**
   - Check LINK balance
   - Verify router configuration
   - Check target chain connectivity

3. **Resolve**
   - Fix underlying issue
   - Resend message if needed
   - Update configuration if required

### High Error Rate

1. **Investigate**
   - Check error logs
   - Identify error patterns
   - Review recent changes

2. **Mitigate**
   - Pause sending if critical
   - Fix root cause
   - Resume operations

### Router Unavailable

1. **Check Status**
   - Verify router deployment
   - Check service health
   - Review logs

2. **Recovery**
   - Restart router if needed
   - Verify connectivity
   - Test message sending

## Maintenance

### Contract Upgrades

1. **Plan Upgrade**
   - Review upgrade proposal
   - Test in staging
   - Schedule maintenance window

2. **Execute Upgrade**
   - Pause operations
   - Deploy new contracts
   - Update configurations
   - Resume operations

3. **Verify**
   - Test message sending
   - Verify message receiving
   - Monitor for issues

### Configuration Changes

1. **Review Impact**
   - Assess change impact
   - Test in staging
   - Plan rollback

2. **Apply Changes**
   - Update configuration
   - Monitor closely
   - Verify functionality

## Monitoring

### Key Metrics

- `ccip_messages_sent_total`: Total messages sent
- `ccip_messages_received_total`: Total messages received
- `ccip_message_latency_seconds`: Message delivery latency
- `ccip_fees_total`: Total LINK spent on fees
- `ccip_errors_total`: Total errors

### Alerts

- High error rate (> 5%)
- Low success rate (< 95%)
- High latency (> 5 minutes)
- Low LINK balance (warning < 10 LINK, critical < 5 LINK)
- Router unavailable

## References

- [CCIP Integration Guide](../docs/CCIP_INTEGRATION.md)
- [CCIP Router Setup](../docs/CCIP_ROUTER_SETUP.md)
- [CCIP Troubleshooting](../docs/CCIP_TROUBLESHOOTING.md)
- [CCIP Incident Response](ccip-incident-response.md)
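
## Example: Threshold Checks

The alert thresholds in this runbook (LINK balance below 10 and 5 LINK, error rate above 5%) can be sketched as two small shell helpers. This is a minimal illustration, not the deployed alerting setup. The function names `classify_link_balance` and `error_rate_alert` are hypothetical, and treating the error rate as `ccip_errors_total / ccip_messages_sent_total` is an assumption about how that alert is defined.

```bash
#!/usr/bin/env bash
# Thresholds taken from the runbook: warning below 10 LINK, critical below 5 LINK.
WARN_LINK=10
CRIT_LINK=5

# classify_link_balance <balance in whole LINK, decimals allowed>
# Prints "critical", "warning", or "ok". awk does the comparison because an
# 18-decimal on-chain balance can overflow bash's 64-bit integer arithmetic.
classify_link_balance() {
  awk -v bal="$1" -v warn="$WARN_LINK" -v crit="$CRIT_LINK" 'BEGIN {
    if (bal + 0 < crit + 0)      print "critical"
    else if (bal + 0 < warn + 0) print "warning"
    else                         print "ok"
  }'
}

# error_rate_alert <errors_total> <messages_sent_total>
# Prints "alert" when errors exceed 5% of sent messages, otherwise "ok".
# Assumption: error rate = errors / messages sent over the same time window.
error_rate_alert() {
  awk -v err="$1" -v sent="$2" 'BEGIN {
    if (sent + 0 == 0) { print "ok"; exit }        # no traffic, nothing to evaluate
    print ((err * 100 > sent * 5) ? "alert" : "ok")  # integer form of err/sent > 0.05
  }'
}

# Possible wiring (requires cast and a reachable RPC endpoint, so left as comments):
#   raw=$(cast call "$LINK_TOKEN" "balanceOf(address)(uint256)" "$CCIP_SENDER" \
#         --rpc-url "$RPC_URL" | awk '{print $1}')
#   classify_link_balance "$(cast from-wei "$raw")"
```

For example, `classify_link_balance 7.5` prints `warning`, and `error_rate_alert 6 100` prints `alert`. Keeping the classification separate from the `cast` call lets the thresholds be checked without a live node.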