proxmox/docs/00-meta/NEXT_STEPS_OPERATOR.md

Next Steps — Operator Runbook

Last Updated: 2026-02-20
Purpose: Single runbook of copy-paste commands for all remaining operator/LAN/creds steps. Use after automated steps are done.

References:

  • REMAINING_WORK_DETAILED_STEPS.md, WAVE2_WAVE3_OPERATOR_CHECKLIST.md, INFRA_DEPLOYMENT_LOCKED_AND_LOADED.md
  • Single fixes checklist (required + optional): FIXES_PREPARED.md
  • Full fixes (validators, block/tx, Sentries, RPCs, network, optional): FULL_FIXES_PREPARED.md
  • All next steps (consolidated): NEXT_STEPS_ALL.md
  • Dev/Codespaces (76.53.10.40): DEV_CODESPACES_NEXT_STEPS_CHECKLIST.md
  • Dev/Codespaces completion evidence: DEV_CODESPACES_COMPLETION_20260207.md


Completed in this session (2026-02-20)

| Item | Result |
|---|---|
| Completable tasks | run-completable-tasks-from-anywhere.sh — config validation OK, on-chain 45/45, run-all-validation --skip-genesis OK, reconcile-env --print. |
| Doc consolidation | NEXT_STEPS_INDEX, DOCUMENTATION_CONSOLIDATION_PLAN; Batch 4+5 → 00-meta-pruned; root cleanup → archive/root-cleanup-20260220; ARCHIVE_CANDIDATES "Last reviewed" set. |

Completed in previous session (2026-02-19)

| Item | Result |
|---|---|
| Completable tasks | run-completable-tasks-from-anywhere.sh — config, 46 on-chain, validation passed. |
| Operator script | run-all-operator-tasks-from-lan.sh — W0-1 skipped (off-LAN); Blockscout verify attempted (Blockscout unreachable). |
| RPC 2101 verify | verify-rpc-2101-approve-and-sync.sh: Chain 138, 19 peers, 5 validators, blocks advancing. |
| 502 script | address-all-remaining-502s.sh — backends 10130/10150/10151 OK; Besu 2101 restarted (finish from LAN for NPMplus). |
| Optional Phase 9 | Smart accounts kit (informational) — ran; next: deploy EntryPoint/AccountFactory/Paymaster. |
| E2E verification | verify-end-to-end-routing.sh with E2E_ACCEPT_502_INTERNAL=1 — run (report in verification-evidence). |

Still from LAN: NPMplus backup, Blockscout verification, full 502/NPMplus proxy update. See COMPLETION_STATUS_20260215.


Completed in previous session (2026-02-06)

| Item | Result |
|---|---|
| Validation | run-all-validation.sh --skip-genesis — passed |
| W1-1 dry-run | setup-ssh-key-auth.sh --dry-run — steps printed |
| W1-2 dry-run | firewall-proxmox-8006.sh --dry-run — UFW commands printed (ADMIN_CIDR=192.168.11.0/24) |
| NPMplus backup | backup-npmplus.sh — ran successfully (local + on host); backup pulled to backups/npmplus/backup-20260206_171756.tar.gz |
| Bridge dry-run | run-send-cross-chain.sh 0.01 --dry-run — simulated (real run when PRIVATE_KEY/LINK ready) |
| .env NPM | NPM_URL/NPM_HOST set to 192.168.11.167:81 (use .167 if .166 refuses) |
| Copy to host | Scripts copied to root@192.168.11.11:/tmp/proxmox-scripts-run (wave0, backup, secure-validator-keys, create-missing-containers, schedule cron scripts, daily-weekly-checks) |
| Wave 0 on host | Ran on r630-01: W0-1 (19 NPMplus proxy hosts updated), W0-3 (backup); backup also on host at .../backups/npmplus/backup-20260206_171756.tar.gz |
| Backup pulled | Host backup copied to local backups/npmplus/backup-20260206_171756.tar.gz |
| Validator keys | secure-validator-keys.sh --dry-run run on host — 1000-1002 would be secured; 1003-1004 not running, skipped. Use --apply on host when ready. |
| Cron scripts on host | schedule-npmplus-backup-cron.sh and schedule-daily-weekly-cron.sh (and daily-weekly-checks.sh) copied; use --show then --install from /tmp/proxmox-scripts-run if you want cron there (note: /tmp may be cleared on reboot; for permanent cron, clone repo to a persistent path on the host). |
| Cron installed on host | NPMplus backup cron (03:00) and daily/weekly cron (08:00 daily, Sun 09:00 weekly) installed on root@192.168.11.11. Logs: /tmp/proxmox-scripts-run/logs/npmplus-backup.log, daily-weekly-checks.log. |
| Validator keys applied | secure-validator-keys.sh run on host (no --dry-run): VMIDs 1000, 1001, 1002 secured (chmod 600/700, chown besu); 1003, 1004 not running, skipped. |

Wave 0 — Gates

W0-2: sendCrossChain (real)

When: PRIVATE_KEY and LINK (or fee token) approved in .env; you are ready to broadcast.

cd /path/to/proxmox
# Optional: dry-run first
bash scripts/bridge/run-send-cross-chain.sh 0.01 --dry-run
# Real (no --dry-run)
bash scripts/bridge/run-send-cross-chain.sh 0.01
# Or with recipient:
bash scripts/bridge/run-send-cross-chain.sh 0.01 0xYourRecipientAddress

Bridge contract (reference): 0xcacfd227A040002e49e2e01626363071324f820a. Ensure CCIPWETH9_BRIDGE_CHAIN138 and RPC_URL_138/CHAIN138_RPC in .env.
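
Before a real send, it can help to confirm that the variables named above are actually present in .env. The following is a minimal sketch, not part of the repo: the helper names env_value and check_bridge_env are hypothetical, and it assumes .env uses plain KEY=value lines (optionally quoted or prefixed with export). It only checks for presence and never prints a secret.

```shell
# Pre-flight sketch (hypothetical helpers, not repo scripts).
# Assumes .env contains simple KEY=value lines.

# env_value FILE KEY: print the value of KEY (last assignment wins), or nothing.
env_value() {
  local file="$1" key="$2" line dq='"' sq="'"
  line="$(grep -E "^(export[[:space:]]+)?${key}=" "$file" 2>/dev/null | tail -n 1)"
  line="${line#*=}"                              # drop everything up to the first "="
  line="${line%"$dq"}"; line="${line#"$dq"}"     # strip surrounding double quotes
  line="${line%"$sq"}"; line="${line#"$sq"}"     # strip surrounding single quotes
  printf '%s' "$line"
}

# check_bridge_env [FILE]: return non-zero if a required variable is missing or empty.
check_bridge_env() {
  local file="${1:-.env}" missing=0 key
  for key in PRIVATE_KEY CCIPWETH9_BRIDGE_CHAIN138; do
    if [ -z "$(env_value "$file" "$key")" ]; then
      echo "missing: $key" >&2
      missing=1
    fi
  done
  # Either RPC variable is accepted (RPC_URL_138 or CHAIN138_RPC).
  if [ -z "$(env_value "$file" RPC_URL_138)" ] && [ -z "$(env_value "$file" CHAIN138_RPC)" ]; then
    echo "missing: RPC_URL_138 or CHAIN138_RPC" >&2
    missing=1
  fi
  return "$missing"
}

# Usage:
#   check_bridge_env .env && bash scripts/bridge/run-send-cross-chain.sh 0.01
```

Chaining the check with && keeps the broadcast from starting when a variable is missing. It does not verify LINK or fee-token approval, which still needs to be confirmed separately.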

W0-3: NPMplus backup (re-run anytime)

Backup already ran once; re-run when NPMplus is up and you want a fresh backup:

cd /path/to/proxmox
bash scripts/verify/backup-npmplus.sh

From a host without NPM API access, use: bash scripts/run-via-proxmox-ssh.sh wave0 --host 192.168.11.11 (r630-01) to run W0-1 + W0-3 on the host.
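
After a backup run, a quick sanity check on the newest archive can catch an empty or truncated file. This is a sketch under assumptions: it relies on the backups/npmplus/backup-YYYYMMDD_HHMMSS.tar.gz naming seen earlier in this runbook, and the helper names newest_backup and verify_backup are hypothetical.

```shell
# Sketch: locate and sanity-check the newest NPMplus backup archive.
# Helper names are hypothetical; the file naming follows the example
# backups/npmplus/backup-20260206_171756.tar.gz recorded in this runbook.

# newest_backup [DIR]: print the backup-*.tar.gz with the latest timestamp in its name.
newest_backup() {
  local dir="${1:-backups/npmplus}" f newest=""
  for f in "$dir"/backup-*.tar.gz; do
    [ -e "$f" ] || continue   # the glob matched nothing
    newest="$f"               # globs expand in sorted order, so the last match is the latest
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# verify_backup FILE: succeed only if FILE is non-empty and tar can list its contents.
verify_backup() {
  local file="$1"
  [ -s "$file" ] && tar -tzf "$file" >/dev/null 2>&1
}

# Usage:
#   f="$(newest_backup backups/npmplus)" && verify_backup "$f" && echo "backup OK: $f"
```

Listing the archive only proves it is a readable gzip tarball; it does not prove the contents can be restored into NPMplus.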


Crontab (install on jump host or Proxmox node)

cd /path/to/proxmox
# Show lines
bash scripts/maintenance/schedule-npmplus-backup-cron.sh --show
bash scripts/maintenance/schedule-daily-weekly-cron.sh --show
# Install
bash scripts/maintenance/schedule-npmplus-backup-cron.sh --install
bash scripts/maintenance/schedule-daily-weekly-cron.sh --install
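
For orientation, the schedule recorded in this runbook (NPMplus backup at 03:00, daily checks at 08:00, weekly checks Sunday 09:00) corresponds to crontab entries roughly like the ones printed below. This is an illustration only: the function print_cron_lines is hypothetical, and the location of daily-weekly-checks.sh and its daily/weekly arguments are assumptions. Use --show on the real scripts to see what they actually install.

```shell
# Sketch: print crontab lines matching the schedule noted in this runbook.
# ASSUMPTIONS: the path scripts/maintenance/daily-weekly-checks.sh and its
# "daily"/"weekly" arguments are guesses; the --show output of the real
# schedule-*.sh scripts is authoritative.
print_cron_lines() {
  local repo="${1:?usage: print_cron_lines REPO_DIR}"
  # minute hour day-of-month month day-of-week command
  printf '0 3 * * * cd %s && bash scripts/verify/backup-npmplus.sh >> logs/npmplus-backup.log 2>&1\n' "$repo"
  printf '0 8 * * * cd %s && bash scripts/maintenance/daily-weekly-checks.sh daily >> logs/daily-weekly-checks.log 2>&1\n' "$repo"
  printf '0 9 * * 0 cd %s && bash scripts/maintenance/daily-weekly-checks.sh weekly >> logs/daily-weekly-checks.log 2>&1\n' "$repo"
}

# Usage (review only; nothing is installed):
#   print_cron_lines /path/to/proxmox
```

Because the entries embed the repo path, a crontab installed from /tmp/proxmox-scripts-run stops working if /tmp is cleared; re-run --install from a persistent clone in that case.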

Wave 1 — Security (run on each Proxmox host or via SSH)

W1-1: SSH key-based auth (disable password)

Pre-requisite: Deploy SSH keys to all hosts (ssh-copy-id root@<host>); test login; have break-glass access.

cd /path/to/proxmox
# On each Proxmox host (or: ssh root@192.168.11.11 'cd /path/to/proxmox && bash scripts/security/setup-ssh-key-auth.sh --apply')
bash scripts/security/setup-ssh-key-auth.sh --apply
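
Since --apply disables password login, it is worth confirming key-only login to every host first. A minimal sketch, assuming root login over SSH as used elsewhere in this runbook; the function names can_login_with_key and check_hosts are hypothetical.

```shell
# Sketch: confirm key-based SSH login works BEFORE disabling passwords.
# Function names are hypothetical; the ssh options are standard OpenSSH.

# can_login_with_key HOST [USER]: succeed only if a non-interactive, key-only login works.
# BatchMode=yes prevents any password prompt, so a missing key fails fast.
can_login_with_key() {
  local host="$1" user="${2:-root}"
  ssh -o BatchMode=yes -o PasswordAuthentication=no -o ConnectTimeout=5 \
    "${user}@${host}" true
}

# check_hosts HOST...: report each host; return non-zero if any host fails.
check_hosts() {
  local host failed=0
  for host in "$@"; do
    if can_login_with_key "$host"; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   check_hosts 192.168.11.10 192.168.11.11 && echo "safe to run --apply"
```

Keep an existing root session open while applying the change so there is a way back if a host still rejects the key.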

W1-2: Firewall — restrict Proxmox API port 8006

Pre-requisite: Run on host where UFW is used (or apply equivalent iptables). Default CIDR: 192.168.11.0/24.

cd /path/to/proxmox
# Dry-run (already done)
bash scripts/security/firewall-proxmox-8006.sh --dry-run
# Apply (allow only ADMIN_CIDR)
bash scripts/security/firewall-proxmox-8006.sh --apply
# Or with custom CIDR:
bash scripts/security/firewall-proxmox-8006.sh --apply 192.168.11.0/24

Then verify that https://<proxmox-ip>:8006 is reachable only from allowed IPs (test once from inside ADMIN_CIDR and once from outside it).
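
The reachability check can be scripted with a plain TCP probe, run once from a host inside ADMIN_CIDR (expect open) and once from a host outside it (expect closed). This is a sketch, not a repo script: port_open and expect_port are hypothetical names, and it needs bash plus the coreutils timeout command.

```shell
# Sketch: TCP reachability probe for the Proxmox API port.
# Hypothetical helpers; requires bash (for /dev/tcp) and coreutils `timeout`.

# port_open HOST PORT: succeed if a TCP connection opens within 3 seconds.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# expect_port HOST PORT open|closed: compare the observed state with the expectation.
expect_port() {
  local host="$1" port="$2" want="$3" got="closed"
  if port_open "$host" "$port"; then
    got="open"
  fi
  echo "$host:$port is $got (expected $want)"
  [ "$got" = "$want" ]
}

# Usage:
#   expect_port 192.168.11.11 8006 open     # from a host inside ADMIN_CIDR
#   expect_port 192.168.11.11 8006 closed   # from a host outside ADMIN_CIDR
```

A firewall that silently drops packets shows up as "closed" only after the 3-second timeout, so the outside test is slower than the inside one.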

W1-19: Secure validator keys (on Proxmox host as root)

cd /path/to/proxmox
bash scripts/secure-validator-keys.sh --dry-run   # review
bash scripts/secure-validator-keys.sh            # apply (chmod 600, chown besu)
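
The permission change described here (chmod 600/700, chown besu) amounts to the following. This is a sketch of the general technique, not the contents of secure-validator-keys.sh; the function name secure_key_tree is hypothetical and the key directory is left as a placeholder because this runbook does not state where the keys live.

```shell
# Sketch of the permission tightening (not the actual repo script).
# secure_key_tree is a hypothetical helper; pass the real key directory yourself.

# secure_key_tree DIR [OWNER]: directories become 700, regular files 600.
# Ownership is changed only when OWNER is given, because chown needs root.
secure_key_tree() {
  local dir="${1:?usage: secure_key_tree DIR [OWNER]}" owner="${2:-}"
  [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
  find "$dir" -type d -exec chmod 700 {} +
  find "$dir" -type f -exec chmod 600 {} +
  if [ -n "$owner" ]; then
    chown -R "$owner:$owner" "$dir"
  fi
}

# Usage (as root, inside the validator container; the path is a placeholder):
#   secure_key_tree /path/to/validator/keys besu
```

Mode 700 on directories keeps them traversable by the owner only, while 600 on files removes group and world read access to the key material.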


VMIDs 2506, 2507, 2508 — Destroyed 2026-02-08

Containers 2506, 2507, 2508 were removed and destroyed on all Proxmox hosts. Script: scripts/destroy-vmids-2506-2508.sh. Besu RPC range is 2500-2505 only. See MISSING_CONTAINERS_LIST.md.


Dev/Codespaces (76.53.10.40) — Full completion

Single ordered checklist: 04-configuration/DEV_CODESPACES_NEXT_STEPS_CHECKLIST.md — Phases 1-7 (fourth NPMplus, dev VM, UDM port forward, Cloudflare tunnel, NPMplus proxy hosts, projects/dotenv, verification).

Key commands (after fourth NPMplus and dev VM exist):

| Step | Command |
|---|---|
| Create fourth NPMplus LXC (10236 @ 192.168.11.170) | bash scripts/npmplus/create-npmplus-fourth-container.sh |
| Create dev VM (5700 @ 192.168.11.59) | bash scripts/create-dev-vm-5700.sh |
| Setup dev VM users + Gitea | ssh root@192.168.11.11 "pct exec 5700 -- bash -s" < scripts/setup-dev-vm-users-and-gitea.sh |
| Tunnel + DNS (set CLOUDFLARE_TUNNEL_ID_DEV_CODESPACES in .env first) | bash scripts/cloudflare/configure-dev-codespaces-tunnel-and-dns.sh |
| Fourth NPMplus proxy hosts | NPM_URL=https://192.168.11.170:81 NPM_PASSWORD='...' bash scripts/nginx-proxy-manager/update-npmplus-fourth-proxy-hosts.sh |

UDM Pro: add port forward 76.53.10.40 → 192.168.11.170 (80/81/443), optional 22 → 192.168.11.59. See UDM_PRO_DEV_CODESPACES_PORT_FORWARD.md.


Wave 2 & Wave 3 — Full checklist

Use the ordered checklist: WAVE2_WAVE3_OPERATOR_CHECKLIST.md.

Summary:

| Wave | Tasks |
|---|---|
| W2-1 | Monitoring stack (Prometheus, Grafana, Loki, Alertmanager) |
| W2-2 | Grafana via Cloudflare Access; alerts |
| W2-3 | VLAN enablement (UDM Pro, Proxmox bridge) |
| W2-4 | Phase 3 CCIP: Ops/Admin (5400-5401); NAT; scripts |
| W2-5 | Phase 4 sovereign tenant VLANs |
| W2-6 | 2506-2508: destroyed 2026-02-08 (RPC 2500-2505 only) |
| W2-7 | DBIS services (10100-10151) |
| W2-8 | NPMplus HA (optional) |
| W3-1 | CCIP Fleet (commit/execute/RMN nodes) |
| W3-2 | Phase 4 tenant isolation enforcement |

Explorer SSL (manual)

If explorer.d-bis.org shows "Your connection isn't private":

  1. Open NPMplus: https://192.168.11.167:81 (credentials: NPM_EMAIL, NPM_PASSWORD from .env).
  2. SSL Certificates → Add Let's Encrypt for explorer.d-bis.org (DNS Challenge + Cloudflare credential if needed).
  3. Proxy Hosts → explorer.d-bis.org → SSL tab → assign cert, Force SSL, Save.

See EXPLORER_TROUBLESHOOTING.md.
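
To confirm the result of the steps above from a shell, the served certificate can be inspected with openssl. A minimal sketch: cert_summary is a hypothetical helper name, and the openssl subcommands and flags used are standard.

```shell
# Sketch: show who issued the certificate a host serves and when it expires.
# cert_summary is a hypothetical helper; it needs the openssl CLI and network access.

# cert_summary HOST [PORT]: print subject, issuer and notAfter of the served certificate.
cert_summary() {
  local host="$1" port="${2:-443}"
  # -servername sends SNI so the proxy returns the certificate for this host name.
  openssl s_client -connect "${host}:${port}" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -enddate
}

# Usage:
#   cert_summary explorer.d-bis.org
```

If the issuer is not Let's Encrypt or the notAfter date is in the past, the proxy host is likely still serving a default or expired certificate and the SSL tab assignment should be rechecked.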


E2E 502s (when public domains return 502)

From LAN (SSH to Proxmox + reach NPMplus):

| Goal | Command |
|---|---|
| Fix all 502 backends + NPMplus proxy + RPC diagnostics | ./scripts/maintenance/address-all-remaining-502s.sh |
| Also Besu config fix + E2E at end | ./scripts/maintenance/address-all-remaining-502s.sh --run-besu-fix --e2e |
| Re-run E2E only | ./scripts/verify/verify-end-to-end-routing.sh |

Runbook: 502_DEEP_DIVE_ROOT_CAUSES_AND_FIXES.md.
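
For a quick spot check of which public domains currently return 502, before or after running the scripts above, a small curl loop is enough. This is a sketch and not a replacement for verify-end-to-end-routing.sh; http_status and report_502s are hypothetical names, and the domain list is whatever you pass in.

```shell
# Sketch: spot-check HTTP status codes for a list of URLs.
# Hypothetical helpers; the curl options used are standard.

# http_status URL: print the HTTP status code (curl prints 000 if no response arrived).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1" || true
}

# report_502s URL...: print "CODE URL" per line; return non-zero if any URL answers 502.
report_502s() {
  local url code bad=0
  for url in "$@"; do
    code="$(http_status "$url")"
    echo "$code $url"
    if [ "$code" = "502" ]; then
      bad=1
    fi
  done
  return "$bad"
}

# Usage:
#   report_502s https://explorer.d-bis.org || echo "at least one domain returns 502"
```

A 502 here means the proxy answered but its backend did not, which is the case address-all-remaining-502s.sh targets; a 000 points to DNS, tunnel, or network problems instead.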


Remaining (operator only)

  • W0-2 — sendCrossChain real (when PRIVATE_KEY/LINK ready).
  • W1-1 / W1-2 — SSH key auth and firewall 8006 --apply on each Proxmox host (after keys deployed / CIDR decided).
  • Cron: Installed on root@192.168.11.11 (NPMplus 03:00; daily 08:00; weekly Sun 09:00). Re-install if you move repo to a permanent path.
  • Validator keys: Applied on host for 1000-1002; 1003-1004 skipped (not running). Re-run when 1003/1004 are up if needed.
  • 2506-2508 — Destroyed 2026-02-08; no action.
  • Wave 2 / 3 — Monitoring, VLAN, CCIP, NPMplus HA, Phase 4 per WAVE2_WAVE3_OPERATOR_CHECKLIST.
  • Explorer SSL — Let's Encrypt for explorer.d-bis.org in NPMplus UI (see above). One-time (and after NPMplus restore if certs lost).
  • Explorer VM 5000 thin pool — If thin1-r630-02 is >85% or full, migrate VMID 5000 to thin5 per BLOCKSCOUT_FIX_RUNBOOK.md § "Fix: Migrate VM 5000 to thin5". Weekly cron now checks thin pool (138a); act when it warns or fails.
  • NPMplus cert 134 (cross-all.defi-oracle.io) — If verification reports "cert files missing" for cert ID 134: in NPMplus at https://192.168.11.167:81 → SSL Certificates → find cross-all.defi-oracle.io → re-save or request Let's Encrypt again to restore cert files on disk.
  • Dev/Codespaces (76.53.10.40) — Complete all phases in DEV_CODESPACES_NEXT_STEPS_CHECKLIST.md: fourth NPMplus (10236), dev VM (5700), UDM port forward, Cloudflare tunnel, NPMplus fourth proxy hosts, Let's Encrypt, rsync/dotenv, verification.
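
For the thin pool item in the list above, the 85% threshold can also be checked by hand on the Proxmox node. A minimal sketch, assuming the pool is an LVM thin pool visible to lvs; the helper names over_threshold and check_thin_pools are hypothetical, and the actual LV names on r630-02 may differ from the storage names used in this runbook.

```shell
# Sketch: flag LVM thin pools whose data usage exceeds a limit (default 85%).
# Hypothetical helpers; `lvs` and its data_percent field are standard LVM.

# over_threshold PERCENT [LIMIT]: succeed if PERCENT (may be fractional) is above LIMIT.
over_threshold() {
  awk -v p="$1" -v l="${2:-85}" 'BEGIN { exit !(p + 0 > l + 0) }'
}

# check_thin_pools [LIMIT]: read "name percent" pairs on stdin; warn for pools over LIMIT.
# Lines without a percent value (non-thin volumes) are treated as 0 and ignored.
check_thin_pools() {
  local limit="${1:-85}" name pct rc=0
  while read -r name pct; do
    [ -n "$name" ] || continue
    if over_threshold "$pct" "$limit"; then
      echo "WARN $name at ${pct}% (limit ${limit}%)"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on the Proxmox node (as root):
#   lvs --noheadings -o lv_name,data_percent | check_thin_pools 85
```

A warning from this check is the cue to follow the migration steps in BLOCKSCOUT_FIX_RUNBOOK.md rather than waiting for the weekly cron report.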

After running "complete all next steps"

  1. Automated (workspace): bash scripts/run-all-next-steps.sh — report in docs/04-configuration/verification-evidence/NEXT_STEPS_RUN_*.md.
  2. Validators + tx-pool: bash scripts/fix-all-validators-and-txpool.sh (requires SSH to .10, .11).
  3. Flush stuck tx (if any): bash scripts/flush-stuck-tx-rpc-and-validators.sh --full (clears RPC 2101 + validators 1000-1004).
  4. Verify from LAN: From a host on 192.168.11.x run bash scripts/monitoring/monitor-blockchain-health.sh and bash scripts/skip-stuck-transactions.sh. See NEXT_STEPS_COMPLETION_RUN_20260208.md § Verify from LAN.
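
As a lightweight complement to the LAN verification in step 4, block progress can be checked directly over JSON-RPC. This is a sketch, not one of the repo's monitoring scripts: rpc_call, hex_to_dec and blocks_advancing are hypothetical names, the RPC URL is whatever RPC_URL_138 points at, and it assumes the node exposes the standard eth_blockNumber method.

```shell
# Sketch: confirm the chain is producing blocks via standard Ethereum JSON-RPC.
# Hypothetical helpers; needs curl and bash.

# rpc_call URL METHOD: send a parameterless JSON-RPC 2.0 request, print the "result" string.
rpc_call() {
  local url="$1" method="$2"
  curl -s --max-time 10 -H 'Content-Type: application/json' \
    -d "{\"jsonrpc\":\"2.0\",\"method\":\"${method}\",\"params\":[],\"id\":1}" "$url" \
    | sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# hex_to_dec 0x...: convert a hex quantity to decimal (bash printf understands the 0x prefix).
hex_to_dec() {
  printf '%d\n' "$1"
}

# blocks_advancing URL [WAIT_SECONDS]: succeed if the block number grows during the wait.
blocks_advancing() {
  local url="$1" wait="${2:-10}" first second
  first="$(hex_to_dec "$(rpc_call "$url" eth_blockNumber)")"
  sleep "$wait"
  second="$(hex_to_dec "$(rpc_call "$url" eth_blockNumber)")"
  echo "block $first -> $second"
  [ "$second" -gt "$first" ]
}

# Usage (RPC_URL_138 as set in .env):
#   blocks_advancing "$RPC_URL_138" 10 || echo "chain not advancing"
```

If the block number stays flat across the wait, validators or the tx-pool are the likely cause, which is what steps 2 and 3 address.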

Quick command index

| Goal | Command |
|---|---|
| Run all automated next steps | bash scripts/run-all-next-steps.sh (validation, E2E, explorer check, dry-runs; report in verification-evidence/NEXT_STEPS_RUN_*.md) |
| W0-2 real | bash scripts/bridge/run-send-cross-chain.sh 0.01 |
| W0-3 backup | bash scripts/verify/backup-npmplus.sh |
| W0 from LAN | bash scripts/run-wave0-from-lan.sh |
| W1-1 apply | bash scripts/security/setup-ssh-key-auth.sh --apply (on each host) |
| W1-2 apply | bash scripts/security/firewall-proxmox-8006.sh --apply |
| NPMplus cron | bash scripts/maintenance/schedule-npmplus-backup-cron.sh --install |
| Daily/weekly cron | bash scripts/maintenance/schedule-daily-weekly-cron.sh --install |
| Validator keys | On Proxmox: bash scripts/secure-validator-keys.sh (after --dry-run) |
| Wave 0 via SSH | bash scripts/run-via-proxmox-ssh.sh wave0 --host 192.168.11.11 |
| Request cert (via SSH) | bash scripts/run-via-proxmox-ssh.sh request-cert --host 192.168.11.11 |
| Fourth NPMplus container | bash scripts/npmplus/create-npmplus-fourth-container.sh |
| Dev VM create | bash scripts/create-dev-vm-5700.sh |
| Dev/Codespaces tunnel+DNS | bash scripts/cloudflare/configure-dev-codespaces-tunnel-and-dns.sh (set CLOUDFLARE_TUNNEL_ID_DEV_CODESPACES in .env) |
| Fourth NPMplus proxy hosts | NPM_URL=https://192.168.11.170:81 NPM_PASSWORD='...' bash scripts/nginx-proxy-manager/update-npmplus-fourth-proxy-hosts.sh |
| Address all 502s (LAN) | ./scripts/maintenance/address-all-remaining-502s.sh (use --run-besu-fix --e2e for full flow) |
| E2E routing (after NPMplus/DNS change) | bash scripts/verify/verify-end-to-end-routing.sh |
| Explorer E2E from LAN (after frontend/Blockscout deploy) | bash explorer-monorepo/scripts/e2e-test-explorer.sh |
| Blockscout migrations (version/config change) | On r630-02: bash scripts/fix-blockscout-ssl-and-migrations.sh — see BLOCKSCOUT_FIX_RUNBOOK.md |
| When decommissioning RPC used by explorer | Update Blockscout RPC URL on VM 5000; restart Blockscout — see OPERATIONAL_RUNBOOKS.md § "When decommissioning or changing RPC nodes" |