proxmox/docs/00-meta/OPERATOR_HANDOFF_2026_04_24.md
defiQUG 60b8fb9ddc
ops: schedule gitea tls expiry monitoring
2026-04-24 18:25:28 -07:00

Operator Handoff — 2026-04-24

Purpose: concise handoff for the Chain 138 Besu hardening, repo cleanup, and Git/Gitea recovery work completed on 2026-04-24.

What changed

  • Chain 138 block production and validator peer health were restored and hardened.
  • Strict future-txpool handling is now part of the standard incident path.
  • Duplicate legacy Besu RPC CTs were first retired, then destroyed after the canonical fleet was verified healthy.
  • Besu inventory was reconciled across all 5 Proxmox nodes, including r630-03 and r630-04.
  • VMIDs 1509 and 1510 were promoted into the canonical Besu inventory and the checked-in allowlists/templates.
  • A cluster-wide Besu inventory audit was added so host-placement ambiguity is caught mechanically.
  • Surgical repo cleanup was completed, nested repos were cleaned and pushed, and the parent repo was reconciled across Gitea and GitHub.
  • gitea.d-bis.org TLS was repaired after an expired certificate blocked HTTPS pushes.

Current live status

As of the final 2026-04-24 checks:

  • bash scripts/monitoring/monitor-blockchain-health.sh
    • block production active
    • all 5 validators active
    • RPC peer count healthy
    • global txpool empty
    • overall status HEALTHY
  • bash scripts/verify/check-cluster-besu-inventory.sh --json
    • all 5 Proxmox nodes online
    • missing_canonical_vmids = []
    • unexpected_besu_resources = []
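
The two empty lists above are what make the inventory audit usable as a mechanical gate. A minimal sketch of such a gate follows, assuming the --json output is a single JSON object in which missing_canonical_vmids and unexpected_besu_resources are plain arrays (the full output shape is not shown in this handoff). The helper name inventory_is_clean is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the audit JSON on stdin and succeed only when
# both drift lists are empty. Assumes both keys hold plain JSON arrays.
inventory_is_clean() {
  local compact
  # Strip all whitespace so pretty-printed and compact JSON compare the same.
  compact=$(tr -d '[:space:]')
  case "$compact" in
    *'"missing_canonical_vmids":[]'*) ;;
    *) echo "drift: missing canonical VMIDs" >&2; return 1 ;;
  esac
  case "$compact" in
    *'"unexpected_besu_resources":[]'*) ;;
    *) echo "drift: unexpected Besu resources" >&2; return 1 ;;
  esac
}

# Usage (from the repo root):
# bash scripts/verify/check-cluster-besu-inventory.sh --json | inventory_is_clean
```

String matching keeps the sketch dependency-free. Where jq is installed, `jq -e '.missing_canonical_vmids == [] and .unexpected_besu_resources == []'` is a more robust equivalent under the same assumption about key names.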

Canonical Chain 138 incident sequence

Use this exact sequence when block production stalls, pending hashes keep reappearing, or future-nonce residue survives a normal txpool clear:

bash scripts/fix-all-validators-and-txpool.sh
bash scripts/maintenance/apply-chain138-strict-future-tx-pool.sh
bash scripts/clear-all-transaction-pools.sh
bash scripts/monitoring/monitor-blockchain-health.sh
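
Because the order matters, the sequence above can be wrapped so that a failing step stops the run instead of letting later steps act on a half-fixed state. This is a sketch only: it assumes each script signals failure with a non-zero exit code, which this handoff does not confirm, and run_in_order is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run each script in order and stop at the first failure.
# Assumes every script signals failure with a non-zero exit code.
run_in_order() {
  local step
  for step in "$@"; do
    echo ">>> $step"
    if ! bash "$step"; then
      echo "FAILED at: $step (later steps skipped)" >&2
      return 1
    fi
  done
  echo "sequence complete"
}

# Usage (from the repo root), with the four steps listed above:
# run_in_order \
#   scripts/fix-all-validators-and-txpool.sh \
#   scripts/maintenance/apply-chain138-strict-future-tx-pool.sh \
#   scripts/clear-all-transaction-pools.sh \
#   scripts/monitoring/monitor-blockchain-health.sh
```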

Gitea TLS follow-up

The immediate HTTPS push blocker was an expired certificate on gitea.d-bis.org. The certificate was renewed and reattached through NPMplus #4, and the endpoint now verifies cleanly again.

Root cause of the short warning window: the live NPMplus certbot renewal config for npm-7 included required_profile = shortlived, which forced a 7-day Let's Encrypt certificate instead of the normal 90-day issuance.

That live config was corrected on 2026-04-24 and gitea.d-bis.org was reissued successfully. The current live certificate now expires on 2026-07-24.
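
Since the short lifetime came from a single line in one renewal config, the same condition can be searched for across every renewal file. A minimal sketch, assuming the certbot renewal configs are *.conf files in one directory. The path inside the NPMplus container is not given in this handoff, so the directory is passed as an argument, and find_profile_pins is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list renewal configs that pin a certificate profile.
# $1 is the directory holding certbot's per-certificate renewal *.conf files.
find_profile_pins() {
  local dir=$1
  # -H prints the file name, -n the line number; the pattern tolerates
  # optional spaces around "=".
  grep -HnE '^[[:space:]]*required_profile[[:space:]]*=' "$dir"/*.conf
}

# Usage: exit status 0 means at least one config still pins a profile.
# find_profile_pins /path/to/certbot/renewal
```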

Use these commands to catch an approaching expiry before it becomes an outage:

bash scripts/verify/check-gitea-certificate-expiry.sh
WARN_DAYS=30 bash scripts/verify/check-gitea-certificate-expiry.sh
bash scripts/maintenance/schedule-gitea-cert-check-cron.sh --install
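
For orientation, an expiry check of this kind can be built from openssl and date alone. The sketch below is not the contents of check-gitea-certificate-expiry.sh. It assumes GNU date (for the -d flag) and port 443, the function names are hypothetical, and the 14-day default in the usage line is an arbitrary placeholder, since the real script's default threshold is not stated here.

```shell
#!/usr/bin/env bash
# Sketch only: not the contents of scripts/verify/check-gitea-certificate-expiry.sh.

# Print seconds until the served certificate's notAfter (negative if past).
cert_seconds_left() {
  local host=$1 not_after end_epoch
  not_after=$(openssl s_client -connect "${host}:443" -servername "$host" \
      </dev/null 2>/dev/null | openssl x509 -noout -enddate | cut -d= -f2)
  [ -n "$not_after" ] || return 2                      # could not fetch or parse the cert
  end_epoch=$(date -d "$not_after" +%s) || return 2    # GNU date syntax
  echo $(( end_epoch - $(date +%s) ))
}

# Map remaining seconds and a warning threshold in days to a status word.
classify_expiry() {
  local secs_left=$1 warn_days=$2
  if [ "$secs_left" -le 0 ]; then
    echo "EXPIRED"
  elif [ "$secs_left" -le $(( warn_days * 86400 )) ]; then
    echo "WARN"
  else
    echo "OK"
  fi
}

# Usage; the 14-day default is an arbitrary placeholder:
# secs=$(cert_seconds_left gitea.d-bis.org) && classify_expiry "$secs" "${WARN_DAYS:-14}"
```

Keeping the classification separate from the network fetch lets the threshold logic be exercised without a live endpoint.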

Checkpoint commits

Key parent-repo commits in the final reconciliation chain:

  • a4738c1 merge of gitea/master into cleaned local master
  • c23fdf4 explorer submodule alignment to a remote-backed commit
  • 7e2d9c5 the-order hook fix pointer update
  • a1eacd3 duplicate Besu CT destruction + cluster inventory audit
  • 780648a thirdweb sentries added to checked-in allowlists/templates
  • 219247b Besu verifier gaps and monitor noise cleanup

Key nested-repo commits:

  • cross-chain-pmm-lps 1cf845c
  • explorer-monorepo (no new commit needed: the remote already contained the equivalent live deploy workflow)
  • smom-dbis-138 f3d2961
  • the-order 702a836

Operator follow-up

  • Run bash scripts/verify/check-cluster-besu-inventory.sh --json after major topology or host-placement changes.
  • Run bash scripts/verify/check-gitea-certificate-expiry.sh periodically, or wire it into a cron/monitoring path.
  • Push parent-repo submodule pointer updates only after the corresponding child-repo pushes have succeeded, so no local-only hashes leak into the parent history.
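
Git can enforce the submodule ordering rule itself. With --recurse-submodules=check, git push aborts when a submodule commit referenced by the revisions being pushed is not present on any remote of that submodule. A minimal sketch follows; safe_parent_push is a hypothetical wrapper name, and whether this repo already sets the option is not stated in this handoff.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: refuse to push the parent repo while any referenced
# submodule commit exists only locally.
safe_parent_push() {
  git push --recurse-submodules=check "$@"
}

# Usage: safe_parent_push <remote> <branch>

# Or make the check the default for plain `git push` in this clone:
# git config push.recurseSubmodules check
```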