CCIP Monitoring Guide for ChainID 138

Date: 2025-01-27
Network: ChainID 138 (DeFi Oracle Meta Mainnet)

Overview

This guide provides monitoring setup and best practices for CCIP infrastructure on ChainID 138.

Monitoring Components

1. CCIP Router Monitoring

Events to Monitor

MessageSent: Track all outgoing messages
- Parameters: messageId, destinationChainSelector, sender, receiver, data, tokenAmounts, feeToken, extraArgs
MessageReceived: Track all incoming messages
- Parameters: messageId, sourceChainSelector, sender, data, tokenAmounts

Metrics to Track

Message volume (sent/received per hour/day)
Fee collection amounts
Average message size
Success/failure rates
Destination chain distribution

Alerts

High message failure rate (>5%)
Unusual fee collection patterns
Router contract errors
Unsupported chain access attempts
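
The failure-rate alert above reduces to arithmetic over the succeeded and failed message counts in a time window. Below is a minimal bash sketch. It assumes those counts have already been extracted from the router event logs. The function names are hypothetical, and the 5% default mirrors the threshold listed above.

```shell
# Failure-rate helpers for router messages (sketch).
# Assumes the caller already has succeeded/failed counts for one window.

# Print the failure rate as a percentage with two decimals.
# Usage: ccip_failure_rate <succeeded> <failed>
ccip_failure_rate() {
  local ok="$1" failed="$2"
  awk -v ok="$ok" -v failed="$failed" 'BEGIN {
    total = ok + failed
    if (total == 0) { printf "0.00\n"; exit }   # no traffic: report 0%
    printf "%.2f\n", failed * 100 / total
  }'
}

# Return 0 (raise the alert) when the failure rate exceeds the threshold.
# Usage: ccip_failure_alert <succeeded> <failed> [threshold_pct, default 5]
ccip_failure_alert() {
  local rate
  rate="$(ccip_failure_rate "$1" "$2")"
  awk -v r="$rate" -v t="${3:-5}" 'BEGIN { exit !(r > t) }'
}
```

For example, `ccip_failure_alert 94 6 && echo "failure rate above 5%"` prints its message, because 6 failures out of 100 messages is 6%.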

2. Bridge Monitoring

Events to Monitor

CCIPWETH9Bridge & CCIPWETH10Bridge:

CrossChainTransferInitiated: Track outgoing transfers
- Parameters: messageId, sender, destinationChainSelector, recipient, amount, nonce
CrossChainTransferCompleted: Track completed transfers
- Parameters: messageId, sourceChainSelector, recipient, amount
DestinationAdded: Track configuration changes
DestinationRemoved: Track configuration changes
DestinationUpdated: Track configuration changes

Metrics to Track

Transfer volume (amount and count)
Average transfer size
Transfer success rate
Time to completion
Fee costs per transfer
Destination chain usage

Alerts

Failed transfers
Stuck transfers (no completion after X hours)
Unusual transfer patterns
Configuration changes
Insufficient fee errors
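
To find stuck transfers, match each CrossChainTransferInitiated messageId against the CrossChainTransferCompleted messageIds. The sketch below assumes both event streams have already been exported to text files, in the hypothetical formats described in the comments. Completion events are presumably emitted by the bridge on the destination chain, so the completed list may need to be collected there. The guide leaves the age threshold ("X hours") open, so it is a parameter here.

```shell
# List messageIds that were initiated but not completed within max_age_hours.
# Usage: ccip_stuck_transfers <initiated_file> <completed_file> <max_age_hours> [now_epoch]
#   initiated_file: one "messageId initiated_epoch_seconds" pair per line (hypothetical format)
#   completed_file: one messageId per line (hypothetical format)
ccip_stuck_transfers() {
  local initiated="$1" completed="$2" max_hours="$3"
  local now="${4:-$(date +%s)}"
  awk -v now="$now" -v max_age="$((max_hours * 3600))" '
    # First file: remember every completed messageId.
    # (FILENAME is used instead of NR == FNR so an empty file is handled.)
    FILENAME == ARGV[1] { done[$1] = 1; next }
    # Second file: report initiated transfers that are old and still incomplete.
    !($1 in done) && now - $2 > max_age { print $1 }
  ' "$completed" "$initiated"
}
```

Each printed messageId is a candidate for the "Stuck transfers" alert. Transfers younger than the threshold are left alone because they may still be in flight.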

Monitoring Setup

Option 1: Event Logging Script

Create a script to monitor events:

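#!/usr/bin/env bash
# Query recent CCIP events on ChainID 138 (one-shot; re-run or schedule it)
set -euo pipefail

RPC_URL="${RPC_URL_138:-http://localhost:8545}"
# Fail fast if a contract address is not configured
ROUTER="${CCIP_CHAIN138_ROUTER:?CCIP_CHAIN138_ROUTER is not set}"
BRIDGE9="${CCIPWETH9_BRIDGE_CHAIN138:?CCIPWETH9_BRIDGE_CHAIN138 is not set}"
BRIDGE10="${CCIPWETH10_BRIDGE_CHAIN138:?CCIPWETH10_BRIDGE_CHAIN138 is not set}"
FROM_BLOCK="${FROM_BLOCK:-latest}"

# Monitor router events. A signature filter must match the deployed ABI exactly,
# with tuple types spelled out (e.g. "(address,uint256)[]", not "tuple[]"),
# so this query filters by address only and returns every router event.
cast logs --from-block "$FROM_BLOCK" \
  --address "$ROUTER" \
  --rpc-url "$RPC_URL"

# Monitor bridge events on both the WETH9 and WETH10 bridges
for BRIDGE in "$BRIDGE9" "$BRIDGE10"; do
  cast logs --from-block "$FROM_BLOCK" \
    --address "$BRIDGE" \
    --rpc-url "$RPC_URL" \
    "CrossChainTransferInitiated(bytes32,address,uint64,address,uint256,uint256)"
done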

Option 2: Prometheus Integration

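Contracts do not expose a /metrics endpoint, so Prometheus must scrape an exporter that turns CCIP events into metrics. The configuration below assumes such an exporter listens on localhost:9545: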

# prometheus.yml
scrape_configs:
  - job_name: 'ccip-router'
    static_configs:
      - targets: ['localhost:9545']
    metrics_path: '/metrics'
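
The guide does not name an exporter. One way to publish event-derived counters is to write them in the Prometheus text exposition format to a file read by node_exporter's textfile collector. In that setup, the scrape target would be the node_exporter instance rather than the port shown above. The sketch below assumes that setup. The metric names are hypothetical, not an existing exporter's schema.

```shell
# Write CCIP counters in the Prometheus text exposition format.
# Metric names are hypothetical examples.
# Usage: ccip_write_metrics <output_file.prom> <sent_total> <received_total> <failed_total>
ccip_write_metrics() {
  local out="$1" tmp
  # Temp file name does not end in .prom, so the collector ignores it.
  tmp="$(mktemp "${out}.XXXXXX")"
  {
    echo '# HELP ccip_messages_sent_total CCIP messages sent from ChainID 138.'
    echo '# TYPE ccip_messages_sent_total counter'
    echo "ccip_messages_sent_total{chain_id=\"138\"} $2"
    echo '# HELP ccip_messages_received_total CCIP messages received on ChainID 138.'
    echo '# TYPE ccip_messages_received_total counter'
    echo "ccip_messages_received_total{chain_id=\"138\"} $3"
    echo '# HELP ccip_messages_failed_total CCIP messages that failed.'
    echo '# TYPE ccip_messages_failed_total counter'
    echo "ccip_messages_failed_total{chain_id=\"138\"} $4"
  } > "$tmp"
  chmod 0644 "$tmp"      # mktemp creates 0600; let the collector read it
  mv "$tmp" "$out"       # rename into place so a half-written file is never read
}
```

Writing to a temporary file and renaming it keeps each update atomic from the scraper's point of view.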

Option 3: Grafana Dashboards

Create dashboards for:

Message volume over time
Transfer amounts and counts
Fee collection
Success/failure rates
Destination chain distribution
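
A "message volume over time" panel is, underneath, a count of events per time bucket. In a Prometheus-backed dashboard this view would normally come from a query over a counter. The sketch below is only an offline equivalent of that aggregation. It assumes event timestamps (Unix epoch seconds, for example from block timestamps) have been extracted one per line.

```shell
# Count events per UTC hour from epoch-second timestamps (one per line).
# Reads the given files, or stdin if none are given.
# Output: "<hour_start_epoch> <count>", sorted by hour.
ccip_hourly_counts() {
  awk '{ bucket = $1 - ($1 % 3600); n[bucket]++ }   # round down to the hour
       END { for (b in n) print b, n[b] }' "$@" | sort -n
}
```

Rows in this format can be plotted directly or loaded into a table-backed Grafana data source.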

Alerting Rules

Critical Alerts

Router Down: Read calls to the router contract fail or time out
Bridge Failure: Bridge fails to process transfers
High Failure Rate: >10% message/transfer failures
Configuration Change: Unauthorized configuration changes

Warning Alerts

High Volume: Unusual message/transfer volume
Fee Anomaly: Unusual fee collection patterns
Slow Processing: Messages taking longer than expected
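
The guide does not define "unusual" volume. One simple heuristic compares the latest window's count with the mean of the preceding windows. The sketch below uses that ratio test. The heuristic and the 3x ratio in the example are illustrative assumptions, not tuned values.

```shell
# Flag unusual volume by comparing the latest window with the mean of earlier ones.
# Usage: ccip_volume_anomaly <ratio> <count_1> ... <count_n>   (oldest first, latest last)
# Prints HIGH, LOW, or OK.
ccip_volume_anomaly() {
  local ratio="$1"; shift
  printf '%s\n' "$@" | awk -v r="$ratio" '
    { c[NR] = $1 }
    END {
      if (NR < 2) { print "OK"; exit }             # not enough history
      for (i = 1; i < NR; i++) sum += c[i]
      mean = sum / (NR - 1); latest = c[NR]
      if (latest > r * mean && latest > 0) print "HIGH"
      else if (latest * r < mean) print "LOW"
      else print "OK"
    }'
}
```

For example, `ccip_volume_anomaly 3 10 12 8 40` prints `HIGH`, because 40 is more than three times the earlier mean of 10.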

Logging

Recommended Log Levels

INFO: Normal operations (messages sent/received, transfers)
WARN: Recoverable errors, configuration changes
ERROR: Failed operations, contract errors
DEBUG: Detailed operation logs (for troubleshooting)
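
Monitoring scripts can apply these levels with a small leveled logger. The sketch below is one way to do it. The LOG_LEVEL variable is a hypothetical convention that sets the minimum level emitted, and DEBUG is suppressed unless it is enabled explicitly.

```shell
# Minimal leveled logger writing to stderr (requires bash 4 for associative arrays).
# LOG_LEVEL (hypothetical) sets the minimum level emitted; default is INFO.
# Usage: ccip_log <DEBUG|INFO|WARN|ERROR> <message...>
ccip_log() {
  local level="$1"; shift
  local -A rank=([DEBUG]=0 [INFO]=1 [WARN]=2 [ERROR]=3)
  local min="${LOG_LEVEL:-INFO}"
  # Skip messages below the configured minimum level.
  (( ${rank[$level]:-1} >= ${rank[$min]:-1} )) || return 0
  printf '%s %-5s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$*" >&2
}
```

For example, `ccip_log WARN "destination removed"` is emitted by default, while `ccip_log DEBUG ...` lines appear only when `LOG_LEVEL=DEBUG`.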

Log Retention

Event Logs: Retain for 90 days
Error Logs: Retain for 180 days
Audit Logs: Retain for 1 year
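
File-based logs can enforce these retention periods with `find`. The sketch below assumes a hypothetical layout with events/, errors/ and audit/ subdirectories under one base directory, and treats "1 year" as 365 days.

```shell
# Delete log files older than the retention period for their category.
# Directory layout (events/, errors/, audit/ under base_dir) is hypothetical.
# Usage: ccip_prune_logs <base_dir>
ccip_prune_logs() {
  local base="$1" entry dir days
  for entry in events:90 errors:180 audit:365; do
    dir="$base/${entry%%:*}"
    days="${entry##*:}"
    [ -d "$dir" ] || continue
    # -mtime +N matches files whose age, in whole 24-hour periods, exceeds N.
    find "$dir" -type f -mtime +"$days" -delete
  done
}
```

Running this from a daily cron job keeps each category within its window. Logs shipped to a central store such as ELK would need the equivalent policy configured there instead.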

Health Checks

Router Health Check

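# Check router is responsive (a successful eth_call shows the contract is reachable)
cast call "$ROUTER" "feeToken()" --rpc-url "$RPC_URL"

# Check supported chains (5009297550715157269 is the CCIP chain selector for Ethereum mainnet)
cast call "$ROUTER" "supportedChains(uint64)" "5009297550715157269" --rpc-url "$RPC_URL"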

Bridge Health Check

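# Check bridge router connection (output should encode the same address as $ROUTER)
cast call "$BRIDGE9" "ccipRouter()" --rpc-url "$RPC_URL"

# Check destinations (5009297550715157269 is the Ethereum mainnet selector; repeat for "$BRIDGE10")
cast call "$BRIDGE9" "destinations(uint64)" "5009297550715157269" --rpc-url "$RPC_URL"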
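
The individual checks above can be combined into one pass/fail report that a scheduler or alerting system can act on. The sketch below reuses the ROUTER, BRIDGE9 and RPC_URL variables from the monitoring script. The wrapper function names are hypothetical. A zero exit status from `cast call` only shows that the call did not revert; returned values, such as a false `supportedChains` result, are not inspected.

```shell
# Run one health check quietly and report PASS/FAIL; returns non-zero on failure.
# Usage: ccip_health_check "<name>" <command...>
ccip_health_check() {
  local name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS $name"
  else
    echo "FAIL $name"
    return 1
  fi
}

# Run the router and bridge checks from this section; non-zero if any failed.
# Expects ROUTER, BRIDGE9 and RPC_URL to be set as in the monitoring script.
ccip_run_health_checks() {
  local rc=0
  ccip_health_check "router feeToken()" \
    cast call "$ROUTER" "feeToken()" --rpc-url "$RPC_URL" || rc=1
  ccip_health_check "router supportedChains(Ethereum mainnet)" \
    cast call "$ROUTER" "supportedChains(uint64)" "5009297550715157269" --rpc-url "$RPC_URL" || rc=1
  ccip_health_check "bridge9 ccipRouter()" \
    cast call "$BRIDGE9" "ccipRouter()" --rpc-url "$RPC_URL" || rc=1
  return "$rc"
}
```

Calling `ccip_run_health_checks || echo "health check failed"` from cron or a CI job gives a single signal that can feed the "Router Down" critical alert.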

Performance Metrics

Key Performance Indicators (KPIs)

Message Throughput: Messages per second
Transfer Throughput: Transfers per hour
Average Latency: Time from send to receive
Success Rate: Percentage of successful operations
Fee Efficiency: Average fee per operation
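
Success rate and average latency can be computed from per-message records that pair a send time with a receive time. The sketch below assumes a hypothetical input format, with "-" marking a message that has not been received. It counts every unreceived message as unsuccessful, so messages that are still in flight lower the rate until they arrive.

```shell
# Compute success rate and average latency from per-message records.
# Input lines (hypothetical format): "<messageId> <sent_epoch> <received_epoch|->"
# Output: "success_rate_pct=<x> avg_latency_s=<y>"
ccip_kpis() {
  awk '
    { total++ }
    $3 != "-" { ok++; latency += $3 - $2 }   # received: accumulate send-to-receive time
    END {
      rate = total ? ok * 100 / total : 0
      avg = ok ? latency / ok : 0
      printf "success_rate_pct=%.2f avg_latency_s=%.1f\n", rate, avg
    }' "$@"
}
```

Throughput KPIs follow from the same records by counting rows per hour or per second.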

Target Metrics

Message success rate: >99%
Average latency: <5 minutes
Transfer success rate: >99.5%
System uptime: >99.9%
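
These targets can be checked mechanically against measured values. The sketch below is one way to do it, with the 5-minute latency target expressed as 300 seconds. The function name and argument order are hypothetical.

```shell
# Compare measured values with the target metrics; prints each miss.
# Returns non-zero if any target is missed.
# Usage: ccip_check_targets <msg_success_pct> <avg_latency_s> <transfer_success_pct> <uptime_pct>
ccip_check_targets() {
  awk -v ms="$1" -v lat="$2" -v ts="$3" -v up="$4" 'BEGIN {
    fail = 0
    if (!(ms > 99))    { print "MISS message success rate " ms "% (target >99%)"; fail = 1 }
    if (!(lat < 300))  { print "MISS average latency " lat "s (target <300s)"; fail = 1 }
    if (!(ts > 99.5))  { print "MISS transfer success rate " ts "% (target >99.5%)"; fail = 1 }
    if (!(up > 99.9))  { print "MISS uptime " up "% (target >99.9%)"; fail = 1 }
    exit fail
  }'
}
```

For example, `ccip_check_targets 98 120 99.8 99.95` reports only the message success rate, since the other three values meet their targets.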

Incident Response

Escalation Procedures

Level 1: Automated alerts → On-call engineer
Level 2: Critical failures → Team lead
Level 3: System-wide issues → CTO/Management

Response Playbook

Router Failure:
- Check contract status
- Verify RPC connectivity
- Review recent transactions
- Check for configuration changes
Bridge Failure:
- Verify router connectivity
- Check destination configuration
- Review transfer logs
- Verify fee payment
High Failure Rate:
- Analyze failure patterns
- Check network conditions
- Review recent changes
- Escalate if needed

Monitoring Tools

Recommended Tools

Prometheus: Metrics collection
Grafana: Visualization and dashboards
ELK Stack: Log aggregation
PagerDuty: Alerting and on-call
Custom Scripts: Event monitoring

Last Updated: 2025-01-27
