# CCIP Relay Service

Off-chain relay for forwarding Chain 138 `MessageSent` events to destination relay routers/bridges.

## Current Topology

Source (Chain 138), matching `.env.bsc` / operator deploy:

- Router: `0x42DAb7b888Dd382bD5Adcf9E038dBF1fD03b4817`
- WETH9 bridge: `0xcacfd227A040002e49e2e01626363071324f820a`

Destinations:

- BSC relay router: `0x4d9Bc6c74ba65E37c4139F0aEC9fc5Ddff28Dcc4`
- BSC relay bridge: `0x886C6A4ABC064dbf74E7caEc460b7eeC31F1b78C`
- AVAX relay router: `0x2a0023Ad5ce1Ac6072B454575996DfFb1BB11b16`
- AVAX relay bridge: `0x3f8C409C6072a2B6a4Ff17071927bA70F80c725F`

Direct first-hop support from Chain 138 is intentionally narrow today:

- Mainnet: supported with the default `.env` profile
- BSC: supported with `.env.bsc`
- Avalanche: supported with `.env.avax`
- Gnosis / Cronos / Celo / Polygon / Arbitrum / Optimism / Base: treat as `via Mainnet hub` unless a dedicated relay router + relay profile are added and proven live

Important: on 2026-04-04, a direct `138 -> Arbitrum` WETH send produced a real source `MessageSent` event but no destination delivery because the live relay worker was running a Mainnet-only destination profile. There is currently no tracked `.env.arbitrum` profile in this folder.

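
The first-hop support list above can be sketched as a small lookup that an operator script might run before sending from Chain 138. This is a minimal sketch only: `relay_profile_for` is a hypothetical helper that is not part of this repo, and it covers just the destinations named in the list (it does not know about the cW profiles).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): map a destination name to the
# env profile that supports a direct first hop from Chain 138, or report that
# the destination should be routed via the Mainnet hub.
relay_profile_for() {
  local dest
  dest="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  case "$dest" in
    mainnet)        echo ".env" ;;        # default profile
    bsc)            echo ".env.bsc" ;;
    avax|avalanche) echo ".env.avax" ;;
    gnosis|cronos|celo|polygon|arbitrum|optimism|base)
                    echo "via Mainnet hub" ;;
    *)              echo "unknown destination: $1" >&2; return 1 ;;
  esac
}
```

For example, `relay_profile_for arbitrum` prints `via Mainnet hub`, which is the case the 2026-04-04 incident would have flagged before the send.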
## Env Profiles

Use the prebuilt env files in this folder:

- `.env.mainnet-cw`: Chain 138 cW → **Ethereum mainnet** (`CW_BRIDGE_MAINNET`)
- `.env.mainnet-weth`: WETH lane to mainnet
- `.env.bsc` (template: `.env.bsc.example`)
- `.env.bsc-cw`: Chain 138 cW → **BSC** (`CWMultiTokenBridgeL2` @ `0x0909Fc58…`)
- `.env.avax-cw`: cW → Avalanche
- `.env.avax`: WETH → Avalanche
- `.env` (default/fallback)
- `.env.local`: **only** when running without a named profile, or set `RELAY_ALLOW_ENV_LOCAL=1`

**Pre-flight (required before restart):**

```bash
./scripts/verify/validate-relay-profiles.sh
./scripts/verify/diagnose-cw-mesh-ccip-relay.sh   # mainnet cW lane + balances
```

Named profiles **do not** load `.env.local` (prevents mainnet router + Avalanche RPC mismatch).

Each profile sets destination RPC, selector, relay router/bridge, and destination WETH token.

### `START_BLOCK` after catch-up

When historical `MessageSent` logs are fully relayed, set **`START_BLOCK=latest`** in `.env.bsc` (or your profile) so a cold start only scans from **~current head − 1** instead of re-queuing the whole backfill range. To replay from an old height again, set an explicit decimal block (e.g. `3012930`) and restart.

**BSC RPC:** Prefer a node that accepts short `eth_getLogs` windows (e.g. `https://bsc.publicnode.com`). Some Binance seeds return `-32005` for log queries the relay uses for destination checks.

### Fund BSC relay bridge (WETH)

From repo root (loads `smom-dbis-138/.env` and relay `.env.bsc` for addresses):

```bash
./scripts/bridge/fund-bsc-relay-bridge.sh --dry-run
./scripts/bridge/fund-bsc-relay-bridge.sh                      # full deployer WETH → bridge
# ./scripts/bridge/fund-bsc-relay-bridge.sh 1000000000000000   # 0.001 WETH in wei
```

Wrap BNB to WETH on the deployer first (`cast send "deposit()" --value ...` on BSC) if needed.

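
The funding scripts take their optional amount in wei, so `0.001` WETH is passed as `1000000000000000`. The conversion can be sketched in pure bash as below. `weth_to_wei` is a hypothetical helper, not something shipped in this repo; it assumes an 18-decimal token, does no input validation, and truncates any fractional digits beyond the 18th.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): convert a decimal WETH amount
# to wei using string padding, since 18-decimal values can overflow bash's
# 64-bit integer arithmetic.
weth_to_wei() {
  local amount="$1" whole frac combined
  whole="${amount%%.*}"                 # digits before the decimal point
  if [[ "$amount" == *.* ]]; then
    frac="${amount#*.}"                 # digits after the decimal point
  else
    frac=""
  fi
  frac="${frac}000000000000000000"      # right-pad with 18 zeros
  frac="${frac:0:18}"                   # keep exactly 18 fractional digits
  combined="${whole}${frac}"
  combined="${combined#"${combined%%[!0]*}"}"   # strip leading zeros
  printf '%s\n' "${combined:-0}"
}
```

With this sketch, `weth_to_wei 0.001` prints `1000000000000000`, matching the commented example in the funding command above.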
### Fund Mainnet relay bridge (WETH)

From repo root:

```bash
./scripts/bridge/fund-mainnet-relay-bridge.sh --dry-run
./scripts/bridge/fund-mainnet-relay-bridge.sh                      # full deployer WETH → bridge
# ./scripts/bridge/fund-mainnet-relay-bridge.sh 1000000000000000   # 0.001 WETH in wei
```

## Destination tx confirmation timeout

| Env | Default | Purpose |
|-----|---------|---------|
| `RELAY_TX_CONFIRM_TIMEOUT_MS` | `180000` (3 min) | Max wait for `tx.wait()` on mainnet relay txs. On timeout the message is retried instead of blocking the queue processor indefinitely. |

## Relay shedding (save destination gas)

When **no** 138→Mainnet (or configured destination) relay deliveries are needed, pause **destination-chain** transactions so the relayer does not spend native gas on `relayMessage` / direct `ccipReceive`:

| Variable | Meaning |
|----------|---------|
| `RELAY_SHEDDING=1` | **On**: shedding active (`true` / `yes` / `on` also work). |
| `RELAY_DELIVERY_ENABLED=0` | Same as shedding on (`false` / `no` / `off`). |
| `RELAY_SHEDDING_SOURCE_POLL_INTERVAL_MS` | Source router log poll interval while shedding (default **60000** ms, min 5000). Reduces Chain 138 RPC usage. |
| `RELAY_SHEDDING_QUEUE_POLL_MS` | Idle interval for the queue loop while shedding (default **5000** ms, min 1000). |

**Behavior:** Source `MessageSent` logs are still ingested and messages queue locally. Pending queue state is now persisted to `services/relay/data/queue-state.json` by default (override with `RELAY_QUEUE_STATE_PATH`), so a restart no longer drops queued work. For production, still plan shedding around low bridge traffic so the persisted backlog stays small and intentional.

## Skip specific message IDs

Use `RELAY_SKIP_MESSAGE_IDS` as a comma-separated list of source `MessageSent.messageId` values that the relay should intentionally ignore.
This is the safest operational way to park an already-confirmed source message when:

- destination relay inventory is below the requested release amount
- you do not want the relay to keep retrying it after service restarts
- there is no on-chain cancel / refund path on the source bridge

Example:

```bash
RELAY_SKIP_MESSAGE_IDS=0xf718c9895c0a5442349996383184d017d2fa041af7aaeb9f0c0675d3ceed756b
```

The relay checks this list during live event ingestion, historical replay, and queue processing.

For the current Mainnet WETH backlog policy, see:

- [`docs/03-deployment/MAINNET_WETH_RELAY_BACKLOG_POLICY.md`](../../../docs/03-deployment/MAINNET_WETH_RELAY_BACKLOG_POLICY.md)

### On-chain pause (`CCIPRelayRouter`)

The destination **CCIPRelayRouter** inherits OpenZeppelin **`Pausable`**: admins with `DEFAULT_ADMIN_ROLE` may call **`pause()`** / **`unpause()`**. While paused, **`relayMessage` reverts** (no delivery through the router).

**Relay service:** Before sending `relayMessage`, the worker calls **`paused()`** on the destination router (router mode only). If the router is paused, the worker **re-queues** the message and waits 15s instead of broadcasting a reverting tx. Older routers without `paused()` skip this check (call errors are logged at debug).

**Important:** If you `pause()` the router but leave the relay **process** running **without** `RELAY_SHEDDING=1`, failed txs are much less likely thanks to the check above, but off-chain activity (source polling, queue growth) still runs. Prefer **`RELAY_SHEDDING=1`** (or stop the service) whenever the router is paused for an extended period.

**Direct-delivery** mode (`DEST_DELIVERY_MODE=direct`) calls the bridge's `ccipReceive` directly and **does not** go through the router, so pausing the router alone does not stop that path; use shedding or revoke `ROUTER_ROLE` on the bridge as appropriate.

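
How shedding, the router pause check, and direct-delivery mode interact can be summarized in a small decision sketch. It is illustrative only and is not the relay's actual code: `shedding_active` and `relay_action` are hypothetical names, the flag values are matched exactly as listed in the shedding table (case-sensitive matching is an assumption), and any `DEST_DELIVERY_MODE` other than `direct` is treated as router mode.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the relay's implementation): decide what a worker
# would do with the next queued message, following the rules described above.

# Shedding is on if RELAY_SHEDDING is truthy or RELAY_DELIVERY_ENABLED is falsy.
shedding_active() {
  case "${RELAY_SHEDDING:-}" in 1|true|yes|on) return 0 ;; esac
  case "${RELAY_DELIVERY_ENABLED:-}" in 0|false|no|off) return 0 ;; esac
  return 1
}

# $1 = delivery mode ("direct" or anything else for router mode)
# $2 = result of paused() on the destination router ("true" or "false")
relay_action() {
  local mode="$1" router_paused="$2"
  if shedding_active; then
    echo "shed"       # keep ingesting and queueing, send no destination tx
  elif [[ "$mode" != "direct" && "$router_paused" == "true" ]]; then
    echo "requeue"    # router mode and paused: re-queue and wait
  else
    echo "deliver"    # direct mode ignores the router pause
  fi
}
```

Note that `relay_action direct true` yields `deliver`: a paused router does not stop the direct path, which is why shedding or revoking `ROUTER_ROLE` is the recommended control there.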
## Start Relay

```bash
cd /home/intlc/projects/proxmox/smom-dbis-138/services/relay
npm install

# BSC relay profile
./start-relay.sh bsc

# AVAX relay profile
./start-relay.sh avax

# Default profile
./start-relay.sh
```

`start-relay.sh` loads env in this order:

1. `.env.` (if profile argument provided)
2. `.env.local`
3. `.env`

If parent project `.env` defines `PRIVATE_KEY`, `${PRIVATE_KEY}` references in relay env files are expanded.

## Relay Health Endpoint

The relay now exposes a lightweight JSON status endpoint for explorer / mission-control monitoring.

- Default listen address: `0.0.0.0`
- Default port: `9860`
- Endpoints: `GET /healthz`, `GET /health`, `GET /status`
- Health payload includes `concurrency.active_relay_tasks`, `concurrency.max_concurrent`, and `queue.in_flight`

Optional env overrides:

```bash
RELAY_HEALTH_ENABLED=1
RELAY_HEALTH_HOST=0.0.0.0
RELAY_HEALTH_PORT=9860
```

### Fleet health monitor (all lanes)

From repo root:

```bash
pnpm relay:monitor-health
RELAY_MONITOR_STRICT=1 pnpm relay:monitor-health   # exit 1 on alerts
pnpm relay:check-eth                               # relayer ETH on mainnet (min 0.05)
pnpm relay:audit-env                               # START_BLOCK / shedding / concurrency audit
```

Endpoint registry: `config/ccip-relay-health-endpoints.v1.json`

### Unstick stuck messages (mainnet-cw / bsc-cw)

```bash
# Dry-run
./scripts/deployment/unstick-ccip-relay-profile.sh --profile mainnet-cw --start-block 5623000

# Execute: stop, scrub failedIds, replay, drain, reset START_BLOCK=latest
./scripts/deployment/unstick-ccip-relay-profile.sh --profile mainnet-cw --start-block 5623000 --execute
```

## Throughput and RPC optimization

| Variable | Default | Purpose |
|----------|---------|---------|
| `RELAY_MAX_CONCURRENT` | `1` | Parallel queue workers (1–12). Mainnet cW profile uses `3`; BSC cW uses `2`. |
| `DEST_RPC_URL_POOL` | (none) | Comma-separated destination RPC URLs for read calls (`processed`, inventory probes). Round-robin with failover. |
| `RELAY_DEST_SUBMIT_RPC_URL` | (none) | Dedicated RPC for **submitting** relay txs (overrides pool for broadcasts). |
| `BLINK_RPC_URL` / `MEV_BLOCKER_RPC_URL` | (none) | If set in parent `.env`, used as submit RPC when `RELAY_DEST_SUBMIT_RPC_URL` is unset. |

**Behavior:** Each concurrent worker pulls the next queue message, uses `NonceManager` for ordered destination txs, and shares the same retry / shedding rules. Read probes (`isDeliveredOnDestination`, bridge inventory) use the RPC pool; writes use the submit URL when configured.

Idle lanes (Avalanche WETH/cW, Avax→138) set `RELAY_SHEDDING=1` and slower `POLL_INTERVAL` to reduce RPC and gas spend until traffic resumes.

Example from another LAN host:

```bash
curl http://192.168.11.11:9860/healthz | jq .
```

Example explorer backend wiring:

```bash
CCIP_RELAY_HEALTH_URL=http://192.168.11.11:9860/healthz
CCIP_RELAY_HEALTH_URLS=mainnet-weth=http://192.168.11.11:9860/healthz,mainnet-cw=http://192.168.11.11:9863/healthz,bsc=http://192.168.11.11:9861/healthz,avax=http://192.168.11.11:9862/healthz
```

Recommended systemd ports when running multiple relay workers on the same host:

- Mainnet WETH (default `.env`): `9860`
- Mainnet cW (`ccip-relay-mainnet-cw.service`): `9863`
- BSC WETH: `9861`
- BSC cW (`ccip-relay-bsc-cw.service` on r630-04): `9867`
- Avalanche: `9862`

### BSC profile (`start-relay.sh bsc`)

- **Source:** Chain 138 public RPC (`RPC_URL_138` in `.env.bsc`).
- **Destination:** `BSC_RELAY_RPC_URL` in `smom-dbis-138/.env` (Infura BSC; defaults to `BSC_MAINNET_RPC` / `BSC_RPC_URL`).
- **Upstream (not used for relay txs):** `BSC_RPC_URL` / Infura, for operator `cast` and health cross-checks.
- Sync + restart on r630-01: `../../../../scripts/deployment/sync-ccip-relay-bsc-r630-01.sh`
- Verify: `../../../../scripts/verify/check-bsc-relay-rpc.sh`

## Critical Requirements

- Relayer key must hold native gas on destination chain.
- Destination relay bridge must hold enough WETH for payouts.
- Explicit profile token overrides like `DEST_WETH9_ADDRESS` win over the generic multichain token map. This keeps relay-backed destinations pointed at their bridge-managed wrapped token instead of a public native wrapped asset.
- Source bridge destination mapping must point to the correct destination relay bridge.
- Source router `feeToken()` must be a deployed ERC20 with sufficient deployer balance.

## Fast Status Checks

Check source destination mappings:

```bash
cast call 0xcacfd227A040002e49e2e01626363071324f820a "destinations(uint64)" 11344663589394136015 --rpc-url https://rpc.public-0138.defi-oracle.io
cast call 0xcacfd227A040002e49e2e01626363071324f820a "destinations(uint64)" 6433500567565415381 --rpc-url https://rpc.public-0138.defi-oracle.io
```

Check message settlement:

```bash
cast call 0x886C6A4ABC064dbf74E7caEc460b7eeC31F1b78C "processedTransfers(bytes32)(bool)" --rpc-url https://bsc.publicnode.com
cast call 0x3f8C409C6072a2B6a4Ff17071927bA70F80c725F "processedTransfers(bytes32)(bool)" --rpc-url https://avalanche-c-chain.publicnode.com
```

Check destination bridge liquidity:

```bash
cast call "balanceOf(address)(uint256)" --rpc-url
```
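
Once a bridge balance has been read, it still has to be compared with the amount a queued message wants released. Wei values can exceed bash's 64-bit integers, so the sketch below compares them as decimal strings. `wei_ge` is a hypothetical helper that is not part of this repo; it assumes both inputs are non-negative decimal integers and drops anything after the first space, in case the `cast` output carries a trailing annotation such as ` [1e15]`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): succeed when balance >= amount,
# comparing arbitrarily large non-negative decimal integers as strings.
wei_ge() {
  local balance="${1%% *}" amount="${2%% *}"      # drop any trailing annotation
  balance="${balance#"${balance%%[!0]*}"}"        # strip leading zeros
  amount="${amount#"${amount%%[!0]*}"}"
  balance="${balance:-0}"
  amount="${amount:-0}"
  if (( ${#balance} != ${#amount} )); then
    (( ${#balance} > ${#amount} ))                # more digits means larger
    return
  fi
  [[ ! "$balance" < "$amount" ]]                  # same length: lexical order
}
```

A caller could then gate a release with something like `wei_ge "$bridge_balance" "$requested_amount" || echo "inventory too low"`, where both variable names are placeholders for values obtained elsewhere. A message that fails this check is a candidate for `RELAY_SKIP_MESSAGE_IDS`.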