cross-chain-pmm-lps/docs/12-sim-scorecard.md
2026-03-02 12:14:07 -08:00

# Simulation Scorecard Contract
Simulator runs must emit a **scorecard JSON** so results are comparable and deployability can be gated. This document defines the output schema and pass/fail gates.

---
## 1. Scorecard JSON schema
Every run (hub-only, full-quote, bridge shock) should produce a scorecard with at least:
| Field | Type | Description |
|-------|------|--------------|
| `capture_mean` | number | Mean router capture ratio (fraction in [0, 1]) across chains/tokens |
| `capture_p95` | number | 95th percentile capture |
| `churn_mean` | number | Mean inventory churn per epoch |
| `churn_p95` | number | 95th percentile churn |
| `churn_max` | number | Max churn observed |
| `intervention_cost_total` | number | Total intervention cost (bridge/mint/burn) in run |
| `intervention_cost_per_1M_volume` | number | Intervention cost per 1M routed volume |
| `peak_deviation_bps` | number | Peak peg deviation in basis points |
| `reflexive_route_count` | number | Count of multi-hop routes through multiple PMM pools (reflexivity) |
| `drain_half_life_epochs` | object | Per (token, chain): epochs until PMM inventory halves under routing pressure; **too short = routing magnet** |
| `path_concentration_index` | number | HHI (Herfindahl-Hirschman index) of path shares, in [0, 1]; **high = you dominate execution**; lower = flow diversified (safer) |
| `arb_volume_total` | number | Total volume traded by arb step (PR#2) |
| `arb_profit_total` | number | Arb profit from **execution** (actual PMM output vs oracle), not from the mid-price (PR#2 refinement) |
| `peak_deviation_bps_pre_arb` | number | Max pool \|δ\| before arb step (diagnostic: is arb doing the work?) |
| `peak_deviation_bps_post_arb` | number | Max pool \|δ\| after arb (current primary gate) |
| `peak_deviation_bps_post_bot` | number | Max pool \|δ\| after bot (inventory rebalancing effect) |
| `intervention_cost_inject_total` | number | Bot inject (bridge-in) cost only |
| `intervention_cost_withdraw_total` | number | Bot withdraw cost only |
| `intervention_cost_by_chain` | object | Per chain: `{ inject, withdraw }` — which chains are liquidity sinks |
| `scenario` | string | e.g. `hub_only_11`, `full_quote_1_56_137`, `bridge_shock_137_56` |
| `runId` | string | Optional run identifier |
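
The distribution and concentration fields in the table above can be derived from per-epoch simulator output. The sketch below is only an illustration and is not the simulator's code. All helper names are hypothetical. Percentiles use the nearest-rank method, which is an assumption because the sim may interpolate. δ is assumed to be `pool_price / peg - 1`.

```python
import math
from typing import Optional


def percentile(values: list[float], q: float) -> float:
    """Nearest-rank percentile (assumed method), q in (0, 100]."""
    if not values:
        raise ValueError("percentile of an empty series")
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def path_concentration_index(volume_by_path: dict[str, float]) -> float:
    """HHI of path shares: sum of squared shares; 1.0 = all flow on one path."""
    total = sum(volume_by_path.values())
    if total <= 0:
        return 0.0
    return sum((v / total) ** 2 for v in volume_by_path.values())


def drain_half_life_epochs(inventory: list[float]) -> Optional[int]:
    """First epoch at which inventory is <= half its starting level; None if never."""
    if not inventory:
        return None
    half = inventory[0] / 2
    for epoch, level in enumerate(inventory):
        if level <= half:
            return epoch
    return None


def deviation_bps(pool_price: float, peg: float) -> float:
    """|δ| in basis points, assuming δ = pool_price / peg - 1."""
    return abs(pool_price / peg - 1) * 10_000


churn = [0.3, 0.5, 0.4, 0.9, 0.6]
print(percentile(churn, 95))  # 0.9
print(path_concentration_index({"hub": 60.0, "direct": 30.0, "multi": 10.0}))  # approximately 0.46
print(drain_half_life_epochs([100.0, 80.0, 62.0, 49.0, 40.0]))  # 3
```
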

**Example (minimal):**
```json
{
"scenario": "hub_only",
"capture_mean": 0.18,
"capture_p95": 0.28,
"churn_mean": 0.45,
"churn_p95": 0.82,
"churn_max": 1.1,
"intervention_cost_total": 1200,
"intervention_cost_per_1M_volume": 8.5,
"peak_deviation_bps": 18,
"reflexive_route_count": 0
}
```
A machine-readable schema lives in `config/scorecard-schema.json` for validation.

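
You can sanity-check a run's output in code before gating it. The sketch below is hypothetical and does not read `config/scorecard-schema.json`. A full validator would load that file, for example with the third-party `jsonschema` package's `validate` function. The sketch assumes two things:

- The fields in the minimal example above are the required ones.
- Capture and concentration values must lie in [0, 1].

```python
import json
from numbers import Real

# Assumption: the fields from the minimal example are required.
REQUIRED_NUMBERS = (
    "capture_mean", "capture_p95", "churn_mean", "churn_p95", "churn_max",
    "intervention_cost_total", "intervention_cost_per_1M_volume",
    "peak_deviation_bps", "reflexive_route_count",
)
# Assumption: these fractions must lie in [0, 1] when present.
UNIT_INTERVAL = ("capture_mean", "capture_p95", "path_concentration_index")


def is_number(value: object) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def validate_scorecard(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the scorecard passed."""
    card = json.loads(raw)
    if not isinstance(card, dict):
        return ["top level must be a JSON object"]
    problems = []
    if not isinstance(card.get("scenario"), str):
        problems.append("scenario: missing or not a string")
    for field in REQUIRED_NUMBERS:
        if not is_number(card.get(field)):
            problems.append(f"{field}: missing or not a number")
    for field in UNIT_INTERVAL:
        value = card.get(field)
        if is_number(value) and not 0 <= value <= 1:
            problems.append(f"{field}: outside [0, 1]")
    return problems


example = (
    '{"scenario": "hub_only", "capture_mean": 0.18, "capture_p95": 0.28, '
    '"churn_mean": 0.45, "churn_p95": 0.82, "churn_max": 1.1, '
    '"intervention_cost_total": 1200, "intervention_cost_per_1M_volume": 8.5, '
    '"peak_deviation_bps": 18, "reflexive_route_count": 0}'
)
print(validate_scorecard(example))  # []
```
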
---
## 2. Pass/fail gates (“deployable” scenarios)
From [10-behavioral-stability-analysis.md](10-behavioral-stability-analysis.md):
| Gate | Condition | Rationale |
|------|-----------|-----------|
| **Churn (normal)** | `churn_mean` in [0.3, 0.8] | Healthy baseline |
| **Churn (stress)** | `churn_max` < 1.5 | Avoid constant bot intervention |
| **Capture (baseline)** | `capture_mean` in [0.10, 0.30] | Peg stabilizer, not global venue |
| **Intervention** | `intervention_cost_per_1M_volume` stable (no explosive jump vs baseline) | Cost should scale roughly linearly with volume |
| **Full-quote vs hub** | If full-quote: `churn_mean` increase vs hub < 50% | Don't deploy full-quote if churn jumps >50% |
| **Peak deviation** | `peak_deviation_bps` below the circuit-breaker threshold (e.g. 200 bps for USD) | Stay inside band |
| **Drain half-life** | `drain_half_life_epochs` not collapsing under full-quote vs hub | Routing magnet check |
| **Path concentration** | `path_concentration_index` not spiking during bridge shock | Diversified routing |
| **Reflexivity** | `reflexive_route_count` low relative to total routes | Avoid feedback loops |

**Sanity checks (PR#2):** Arb volume should rise when k is tight; bot interventions should rise when inventory targets are low.

**Pass:** All gates are satisfied for the scenario.

**Fail:** Any gate is violated. Do not treat the scenario as deployable without a parameter change or topology reduction.

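
The numeric gates can be applied mechanically. The sketch below is illustrative and its function names are hypothetical. It covers only the gates with explicit thresholds: churn, capture, peak deviation, and full-quote vs hub. The 200 bps USD circuit-break is taken as a default parameter. The qualitative gates (intervention stability, drain half-life, path concentration, reflexivity) are left to human review.

```python
from typing import Optional


def evaluate_gates(card: dict, hub_baseline: Optional[dict] = None,
                   circuit_break_bps: float = 200.0) -> dict[str, bool]:
    """Apply the numeric gates from section 2; maps gate name -> passed."""
    results = {
        "churn_normal": 0.3 <= card["churn_mean"] <= 0.8,
        "churn_stress": card["churn_max"] < 1.5,
        "capture_baseline": 0.10 <= card["capture_mean"] <= 0.30,
        "peak_deviation": card["peak_deviation_bps"] < circuit_break_bps,
    }
    if hub_baseline is not None:
        base = hub_baseline["churn_mean"]
        if base <= 0:
            raise ValueError("hub baseline churn_mean must be positive")
        # Full-quote vs hub: relative churn_mean increase must stay below 50%.
        results["full_quote_vs_hub"] = (card["churn_mean"] - base) / base < 0.5
    return results


def is_deployable(results: dict[str, bool]) -> bool:
    """Pass only if every evaluated gate passed; any failure blocks deployment."""
    return all(results.values())


hub = {"churn_mean": 0.45, "churn_max": 1.1, "capture_mean": 0.18,
       "peak_deviation_bps": 18}
full_quote = dict(hub, churn_mean=0.72)  # illustrative value, not a result
print(is_deployable(evaluate_gates(hub)))  # True
print(evaluate_gates(full_quote, hub_baseline=hub))
```
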
---
## 3. Phase 0 comparison (three scenarios)
Run and compare:
1. **Hub-only** across all 11 chains
2. **Full-quote** only on chains 1, 56, 137 (Ethereum, BSC, Polygon)
3. **Bridge shock** (e.g. 137 → 56)
Compare deltas:
- **churn +%** (full-quote vs hub)
- **intervention cost +%** (full-quote vs hub)
- **peak deviation** under shock

If churn jumps by more than 50% with full-quote, apply a clear “don't deploy full-quote” rule.

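
The comparison can be reported as percent deltas. This is a minimal sketch with hypothetical names. It assumes the intervention-cost delta is measured on `intervention_cost_per_1M_volume`, so that volume differences between scenarios cancel out; the text above does not specify which cost field to use. The input values are made up for illustration.

```python
def pct_change(new: float, base: float) -> float:
    """Percent change of `new` relative to `base`."""
    if base == 0:
        raise ValueError("baseline is zero; percent change is undefined")
    return (new - base) / base * 100


def phase0_report(hub: dict, full_quote: dict, shock: dict) -> dict:
    """Summarize the three Phase 0 runs as deltas plus the full-quote verdict."""
    churn_pct = pct_change(full_quote["churn_mean"], hub["churn_mean"])
    # Assumption: compare volume-normalized intervention cost.
    cost_pct = pct_change(full_quote["intervention_cost_per_1M_volume"],
                          hub["intervention_cost_per_1M_volume"])
    return {
        "churn_pct": churn_pct,
        "intervention_cost_pct": cost_pct,
        "shock_peak_deviation_bps": shock["peak_deviation_bps"],
        # Rule from this section: churn up by more than 50% => don't deploy full-quote.
        "deploy_full_quote": churn_pct <= 50,
    }


report = phase0_report(
    hub={"churn_mean": 0.45, "intervention_cost_per_1M_volume": 8.5},
    full_quote={"churn_mean": 0.60, "intervention_cost_per_1M_volume": 11.0},
    shock={"peak_deviation_bps": 42},
)
print(report)
```
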
---
## 4. Phase 0: Runnable scenarios and knob guidance
**Exact scenario JSONs to run** (in `config/scenarios/`):
| Scenario file | Description | Expected pass |
|---------------|-------------|----------------|
| `hub_only_11.json` | Hub topology, all 11 chains, 720 epochs | churn_mean in [0.3, 0.8], capture_mean in [0.10, 0.30], churn_max < 1.5 |
| `full_quote_1_56_137.json` | Full-quote on Ethereum, BSC, Polygon; 720 epochs | Same gates; churn_mean increase vs hub_only_11 < 50% |
| `bridge_shock_137_56.json` | Hub on 1/56/137; 5% migration from 137 to 56 over 24 epochs | peak_deviation_bps < 200; damped re-center (not resonant). **Note:** Shock is a **stress injection** (paired local sell/buy), not cross-chain router equilibrium; see §5. |

**One command = one scorecard = pass/fail:** Run the sim with a scenario JSON, validate the output against `config/scorecard-schema.json`, then apply the gates from section 2.

**If fail, what knob to turn first:**

| Symptom | First knob | Then |
|---------|------------|------|
| Capture too high | Increase feeBps | Then increase k |
| Churn too high | Reduce pool count (hub model only) | Then increase k |
| Intervention cost explodes | Increase latency penalty rho or widen bands | Add caps (maxTradeSizeUnits, maxDailyNotional) |
| Drain half-life too short | Increase k or lower depth | Consider setting publicRoutingEnabled to false on defense pools |
| Path concentration too high | Widen topology or increase fee on dominant pools | Reduce single-pool magnetism |
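
The playbook can be encoded as a lookup so that a failed run prints its suggested next steps. This is a sketch with hypothetical symptom keys. Only capture and churn are detected automatically, because section 2 gives numeric thresholds only for those two symptoms. The reviewer of the run has to flag the other symptoms by hand.

```python
# Symptom -> (first knob, then), transcribed from the table above.
KNOB_PLAYBOOK = {
    "capture_too_high": ("increase feeBps", "increase k"),
    "churn_too_high": ("reduce pool count (hub model only)", "increase k"),
    "intervention_cost_explodes": (
        "increase latency penalty rho or widen bands",
        "add caps (maxTradeSizeUnits, maxDailyNotional)"),
    "drain_half_life_too_short": (
        "increase k or lower depth",
        "consider setting publicRoutingEnabled to false on defense pools"),
    "path_concentration_too_high": (
        "widen topology or increase fee on dominant pools",
        "reduce single-pool magnetism"),
}


def detect_symptoms(card: dict) -> list[str]:
    """Flag only the symptoms that have numeric thresholds in the gate table."""
    symptoms = []
    if card["capture_mean"] > 0.30:
        symptoms.append("capture_too_high")
    if card["churn_mean"] > 0.8 or card["churn_max"] >= 1.5:
        symptoms.append("churn_too_high")
    return symptoms


def next_steps(symptoms: list[str]) -> list[str]:
    """Turn symptom keys into ordered advice: first knob, then fallback."""
    return [f"{s}: {KNOB_PLAYBOOK[s][0]}, then {KNOB_PLAYBOOK[s][1]}"
            for s in symptoms]


card = {"capture_mean": 0.41, "churn_mean": 0.5, "churn_max": 1.2}
for line in next_steps(detect_symptoms(card)):
    print(line)  # capture_too_high: increase feeBps, then increase k
```
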
---
## 5. Bridge shock modeling (Phase 0)
The **bridge shock** scenario (`bridge_shock_137_56.json`) is implemented as a **stress injection**, not as cross-chain path enumeration:
- Each epoch during the shock window, the sim adds **paired local trades**: sell cW→hub on the “from” chain (137), buy cW with hub on the “to” chain (56), at a magnitude that sums to the configured migration over the window.
- This tests **corridor defense under forced migration** (can arb + bot keep deviation and intervention in check?), which is what matters operationally for Phase 0.
- It does **not** model a router endogenously choosing to bridge because it's cheaper; that requires cross-chain path selection (PR#3). When you add cross-chain routing, you can validate whether the same stress emerges from router equilibrium.

Be explicit when interpreting results: shock metrics answer “given forced migration, does the system damp?”, not “does routing naturally push flow across chains?”

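
The paired-trade injection can be pictured as a per-epoch schedule. This is a minimal sketch and not the sim's implementation. It assumes the migration is split evenly across the window. It also assumes the migration is expressed as a fraction of a hypothetical `base_units` amount of cW on the source chain. The real sim may use a different profile or base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedTrade:
    epoch: int
    from_chain: int  # leg 1: sell cW -> hub on this chain
    to_chain: int    # leg 2: buy cW with hub on this chain
    amount: float    # cW units on each leg (assumed equal)


def shock_schedule(base_units: float, migration_frac: float, start_epoch: int,
                   window_epochs: int, from_chain: int,
                   to_chain: int) -> list[PairedTrade]:
    """Spread the total migration evenly over the shock window (assumed profile)."""
    if window_epochs <= 0:
        raise ValueError("window_epochs must be positive")
    per_epoch = base_units * migration_frac / window_epochs
    return [PairedTrade(start_epoch + i, from_chain, to_chain, per_epoch)
            for i in range(window_epochs)]


# bridge_shock_137_56: 5% migration from 137 to 56 over 24 epochs.
# base_units and start_epoch are arbitrary illustrative values.
trades = shock_schedule(1_000_000, 0.05, start_epoch=100, window_epochs=24,
                        from_chain=137, to_chain=56)
print(len(trades), round(sum(t.amount for t in trades), 6))
```
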
---
## 6. Confirming EUR defaults
Run the **hub-only baseline** with (a) USD-only tokens and (b) USD + EUR tokens. Compare churn_mean, churn_max, peak_deviation_bps and intervention_cost_per_1M_volume. If EUR tokens meaningfully worsen these metrics, increase `eurDefaults.k` (e.g. to 0.25), widen the EUR bands in `peg-bands.json`, and/or add routing caps (maxTradeSizeUnits) for EUR pools.
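
The EUR comparison can be scripted as a relative-change check. This is a sketch with hypothetical names. It treats any increase in the four metrics as a worsening, which is an assumption for `churn_mean` because its gate is a band rather than a ceiling. It also uses an assumed 10% relative threshold for “meaningfully”, since the text above gives no number. The USD + EUR values are made up for illustration.

```python
METRICS = ("churn_mean", "churn_max", "peak_deviation_bps",
           "intervention_cost_per_1M_volume")


def eur_regressions(usd_only: dict, usd_eur: dict,
                    tolerance: float = 0.10) -> dict[str, float]:
    """Metrics whose relative increase from USD-only to USD+EUR exceeds tolerance.

    Metrics with a zero USD-only baseline are skipped (relative change undefined).
    """
    worse = {}
    for metric in METRICS:
        base, new = usd_only[metric], usd_eur[metric]
        if base > 0 and (new - base) / base > tolerance:
            worse[metric] = (new - base) / base
    return worse


usd_only = {"churn_mean": 0.45, "churn_max": 1.1, "peak_deviation_bps": 18,
            "intervention_cost_per_1M_volume": 8.5}
usd_eur = dict(usd_only, churn_max=1.4, peak_deviation_bps=19)
flagged = eur_regressions(usd_only, usd_eur)
if flagged:
    print("consider raising eurDefaults.k / widening EUR bands:", sorted(flagged))
```
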