Simulation Scorecard Contract
Simulator runs must emit a scorecard JSON so results are comparable and deployability can be gated. This document defines the output schema and pass/fail gates.
1. Scorecard JSON schema
Every run (hub-only, full-quote, bridge shock) should produce a scorecard with at least:
| Field | Type | Description |
|---|---|---|
| capture_mean | number | Mean router capture ratio (fraction 0–1) across chains/tokens |
| capture_p95 | number | 95th percentile capture |
| churn_mean | number | Mean inventory churn per epoch |
| churn_p95 | number | 95th percentile churn |
| churn_max | number | Max churn observed |
| intervention_cost_total | number | Total intervention cost (bridge/mint/burn) in run |
| intervention_cost_per_1M_volume | number | Intervention cost per 1M routed volume |
| peak_deviation_bps | number | Peak peg deviation in basis points |
| reflexive_route_count | number | Count of multi-hop routes through multiple PMM pools (reflexivity) |
| drain_half_life_epochs | object | Per (token, chain): epochs until PMM inventory halves under routing pressure; too short = routing magnet |
| path_concentration_index | number | HHI on path shares (0–1); high = you dominate execution; lower = flow diversified (safer) |
| arb_volume_total | number | Total volume traded by arb step (PR#2) |
| arb_profit_total | number | Arb profit from execution (actual PMM output vs oracle), not mid (PR#2 refinement) |
| peak_deviation_bps_pre_arb | number | Max pool \|δ\| before arb step (diagnostic: is arb doing the work?) |
| peak_deviation_bps_post_arb | number | Max pool \|δ\| after arb (current primary gate) |
| peak_deviation_bps_post_bot | number | Max pool \|δ\| after bot (inventory rebalancing effect) |
| intervention_cost_inject_total | number | Bot inject (bridge-in) cost only |
| intervention_cost_withdraw_total | number | Bot withdraw cost only |
| intervention_cost_by_chain | object | Per chain: { inject, withdraw }; shows which chains are liquidity sinks |
| scenario | string | e.g. hub_only_11, full_quote_1_56_137, bridge_shock_137_56 |
| runId | string | Optional run identifier |
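For reference, the HHI behind path_concentration_index is the sum of squared shares. The sketch below assumes "path shares" means each path's fraction of routed volume; the function name and the input shape are illustrative, not taken from the simulator.

```python
def path_concentration_index(path_volumes):
    """Herfindahl-Hirschman index over per-path volume shares.

    Assumption: path_volumes is a list of routed volumes, one per path.
    The result is 1.0 when a single path carries everything and 1/n when
    n paths carry equal volume.
    """
    total = sum(path_volumes)
    if total <= 0:
        # No routed volume: report zero concentration rather than divide by zero.
        return 0.0
    return sum((volume / total) ** 2 for volume in path_volumes)
```

With four equally used paths the index is 0.25; as flow concentrates onto one path it approaches 1.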
Example (minimal):
{
  "scenario": "hub_only",
  "capture_mean": 0.18,
  "capture_p95": 0.28,
  "churn_mean": 0.45,
  "churn_p95": 0.82,
  "churn_max": 1.1,
  "intervention_cost_total": 1200,
  "intervention_cost_per_1M_volume": 8.5,
  "peak_deviation_bps": 18,
  "reflexive_route_count": 0
}
A machine-readable schema lives in config/scorecard-schema.json for validation.
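The contents of config/scorecard-schema.json are not reproduced here, so the following is only a minimal, standard-library sketch of a shape check. It assumes the ten fields of the minimal example are the required ones; the authoritative list is whatever the schema file declares, and `check_scorecard` is a hypothetical helper name.

```python
from numbers import Real

# Assumption: the fields of the minimal example are treated as required.
REQUIRED_FIELDS = {
    "scenario": str,
    "capture_mean": Real,
    "capture_p95": Real,
    "churn_mean": Real,
    "churn_p95": Real,
    "churn_max": Real,
    "intervention_cost_total": Real,
    "intervention_cost_per_1M_volume": Real,
    "peak_deviation_bps": Real,
    "reflexive_route_count": Real,
}


def check_scorecard(scorecard: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic shape is fine."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in scorecard:
            problems.append(f"missing field: {name}")
            continue
        value = scorecard[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: {type(value).__name__}")
    # capture_mean is documented as a fraction between 0 and 1.
    capture = scorecard.get("capture_mean")
    if isinstance(capture, Real) and not isinstance(capture, bool):
        if not 0 <= capture <= 1:
            problems.append(f"capture_mean outside [0, 1]: {capture}")
    return problems
```

A full validator would load the schema file and apply it with a JSON Schema library; this sketch only catches missing fields, wrong types, and an out-of-range capture ratio.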
2. Pass/fail gates (“deployable” scenarios)
From 10-behavioral-stability-analysis.md:
| Gate | Condition | Rationale |
|---|---|---|
| Churn (normal) | churn_mean in [0.3, 0.8] | Healthy baseline |
| Churn (stress) | churn_max < 1.5 | Avoid constant bot intervention |
| Capture (baseline) | capture_mean in [0.10, 0.30] | Peg stabilizer, not global venue |
| Intervention | intervention_cost_per_1M_volume stable (no explosive jump vs baseline) | Linear in volume |
| Full-quote vs hub | If full-quote: churn_mean increase vs hub < 50% | Don’t deploy full-quote if churn jumps >50% |
| Peak deviation | peak_deviation_bps below circuit-break (e.g. 200 bps USD) | Stay inside band |
| Drain half-life | drain_half_life_epochs not collapsing under full-quote vs hub | Routing magnet check |
| Path concentration | path_concentration_index not spiking during bridge shock | Diversified routing |
| Reflexivity | reflexive_route_count low relative to total routes | Avoid feedback loops |
Sanity checks (PR#2): Arb volume should rise when k is tight; bot interventions should rise when inventory targets are low.
Pass: All gates satisfied for the scenario.
Fail: Any gate violated; do not treat scenario as deployable without parameter change or topology reduction.
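The gates with explicit numeric limits can be checked mechanically. The sketch below covers only those four; the comparative gates (intervention stability, full-quote vs hub, drain half-life, path concentration, reflexivity) need a baseline run or a threshold this document does not fix, so they are left out. Function names are illustrative, and the 200 bps default simply mirrors the "e.g." in the table.

```python
def evaluate_gates(scorecard, circuit_break_bps=200.0):
    """Return {gate name: passed} for the gates that have numeric limits."""
    return {
        "churn_normal": 0.3 <= scorecard["churn_mean"] <= 0.8,
        "churn_stress": scorecard["churn_max"] < 1.5,
        "capture_baseline": 0.10 <= scorecard["capture_mean"] <= 0.30,
        # circuit_break_bps is an assumed default, not a confirmed setting.
        "peak_deviation": scorecard["peak_deviation_bps"] < circuit_break_bps,
    }


def is_deployable(gate_results):
    """Pass only if every evaluated gate is satisfied."""
    return all(gate_results.values())
```

Applied to the minimal example in section 1, all four gates pass; raising churn_max to 1.6 would fail the stress gate and therefore the scenario.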
3. Phase 0 comparison (three scenarios)
Run and compare:
- Hub-only across all 11 chains
- Full-quote only on 1, 56, 137
- Bridge shock (e.g. 137 → 56)
Compare deltas:
- churn +% (full-quote vs hub)
- intervention cost +% (full-quote vs hub)
- peak deviation under shock
If churn jumps >50% with full-quote → clear “don’t deploy full-quote” rule.
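A minimal sketch of the comparison, assuming the three scorecards are already loaded as dicts. It uses intervention_cost_per_1M_volume for the cost delta (an assumption; the total could be compared instead) and the strict `< 50%` reading from the gate table for the deploy rule. The helper names are hypothetical.

```python
def pct_change(new, base):
    """Relative change of new versus base, in percent."""
    return (new - base) / base * 100.0


def compare_phase0(hub, full_quote, shock):
    """Summarise the Phase 0 deltas from three scorecard dicts."""
    report = {
        "churn_mean_pct": pct_change(
            full_quote["churn_mean"], hub["churn_mean"]
        ),
        "intervention_cost_pct": pct_change(
            full_quote["intervention_cost_per_1M_volume"],
            hub["intervention_cost_per_1M_volume"],
        ),
        "peak_deviation_bps_under_shock": shock["peak_deviation_bps"],
    }
    # Full-quote is only a candidate if churn rises by less than 50% vs hub.
    report["deploy_full_quote"] = report["churn_mean_pct"] < 50.0
    return report
```

For instance, a hub churn_mean of 0.45 against a full-quote churn_mean of 0.72 is a 60% increase, so the rule rejects full-quote.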
4. Phase 0: Runnable scenarios and knob guidance
Exact scenario JSONs to run (in config/scenarios/):
| Scenario file | Description | Expected pass |
|---|---|---|
| hub_only_11.json | Hub topology, all 11 chains, 720 epochs | churn_mean in [0.3, 0.8], capture_mean in [0.10, 0.30], churn_max < 1.5 |
| full_quote_1_56_137.json | Full-quote on Ethereum, BSC, Polygon; 720 epochs | Same gates; churn_mean increase vs hub_only_11 < 50% |
| bridge_shock_137_56.json | Hub on 1/56/137; 5% migration 137 to 56 over 24 epochs | peak_deviation_bps < 200; damped re-center (not resonant). Note: Shock is a stress injection (paired local sell/buy), not cross-chain router equilibrium; see §5. |
One command = one scorecard = pass/fail: run the sim with a scenario JSON, validate the output against config/scorecard-schema.json, then apply the gates from section 2.
If fail, what knob to turn first:
| Symptom | First knob | Then |
|---|---|---|
| Capture too high | Increase feeBps | Then increase k |
| Churn too high | Reduce pool count (hub model only) | Then increase k |
| Intervention cost explodes | Increase latency penalty rho or widen bands | Add caps (maxTradeSizeUnits, maxDailyNotional) |
| Drain half-life too short | Increase k or lower depth | Consider publicRoutingEnabled false on defense pools |
| Path concentration too high | Widen topology or increase fee on dominant pools | Reduce single-pool magnetism |
5. Bridge shock modeling (Phase 0)
The bridge shock scenario (bridge_shock_137_56.json) is implemented as a stress injection, not as cross-chain path enumeration:
- Each epoch during the shock window, the sim adds paired local trades: sell cW→hub on the “from” chain (137), buy cW with hub on the “to” chain (56), at a magnitude that sums to the configured migration over the window.
- This tests corridor defense under forced migration (can arb + bot keep deviation and intervention in check?), which is what matters operationally for Phase 0.
- It does not model a router endogenously choosing to bridge because it’s cheaper; that requires cross-chain path selection (PR#3). When you add cross-chain routing, you can validate whether the same stress emerges from router equilibrium.
Be explicit when interpreting results: shock metrics answer “given forced migration, does the system damp?” not “does routing naturally push flow across chains?”
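The paired-trade injection described above can be sketched as follows. This assumes the configured migration is split evenly across the window and is expressed in cW units; the `LocalTrade` type, the side labels, and the function name are hypothetical and not the simulator's own.

```python
from dataclasses import dataclass


@dataclass
class LocalTrade:
    epoch: int
    chain_id: int
    side: str      # "sell_cw_for_hub" or "buy_cw_with_hub"
    amount: float  # cW units


def shock_trades(total_migration, start_epoch, window_epochs, from_chain, to_chain):
    """Build paired local trades that sum to total_migration over the window.

    Assumption: a uniform split per epoch; the real sim may shape it differently.
    """
    per_epoch = total_migration / window_epochs
    trades = []
    for epoch in range(start_epoch, start_epoch + window_epochs):
        # Forced outflow: sell cW for the hub asset on the "from" chain.
        trades.append(LocalTrade(epoch, from_chain, "sell_cw_for_hub", per_epoch))
        # Matching inflow: buy cW with the hub asset on the "to" chain.
        trades.append(LocalTrade(epoch, to_chain, "buy_cw_with_hub", per_epoch))
    return trades
```

Calling `shock_trades(50_000, 100, 24, 137, 56)` yields 48 trades, one sell on 137 and one buy on 56 per epoch; no routing decision is involved, which is the limitation noted above.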
6. Confirming EUR defaults
Run hub-only baseline with (a) USD-only tokens, (b) USD + EUR tokens. Compare: churn_mean, churn_max, peak_deviation_bps, intervention_cost_per_1M_volume. If EUR tokens meaningfully worsen these: increase eurDefaults.k (e.g. 0.25), widen bands for EUR in peg-bands.json, and/or add routing caps (maxTradeSizeUnits) for EUR pools.