diff --git a/docs/00-meta/OPERATOR_HANDOFF_2026_04_24.md b/docs/00-meta/OPERATOR_HANDOFF_2026_04_24.md
new file mode 100644
index 00000000..e335fa7f
--- /dev/null
+++ b/docs/00-meta/OPERATOR_HANDOFF_2026_04_24.md
@@ -0,0 +1,77 @@
+# Operator Handoff — 2026-04-24
+
+Purpose: concise handoff for the Chain 138 Besu hardening, repo cleanup, and Git/Gitea recovery work completed on 2026-04-24.
+
+## What changed
+
+- Chain 138 block production and validator peer health were restored and hardened.
+- Strict future-txpool handling is now part of the standard incident path.
+- Duplicate legacy Besu RPC CTs were first retired, then destroyed after the canonical fleet was verified healthy.
+- Besu inventory was reconciled across all 5 Proxmox nodes, including `r630-03` and `r630-04`.
+- `1509` and `1510` were promoted into the canonical Besu inventory and checked-in allowlists/templates.
+- A cluster-wide Besu inventory audit was added so host-placement ambiguity is caught mechanically.
+- Surgical repo cleanup was completed, nested repos were cleaned and pushed, and the parent repo was reconciled across Gitea and GitHub.
+- `gitea.d-bis.org` TLS was repaired after an expired certificate blocked HTTPS pushes.
+
+## Current live status
+
+As of the final 2026-04-24 checks:
+
+- `bash scripts/monitoring/monitor-blockchain-health.sh`
+  - block production active
+  - all 5 validators active
+  - RPC peer count healthy
+  - global txpool empty
+  - overall status `HEALTHY`
+- `bash scripts/verify/check-cluster-besu-inventory.sh --json`
+  - all 5 Proxmox nodes online
+  - `missing_canonical_vmids = []`
+  - `unexpected_besu_resources = []`
+
+## Canonical Chain 138 incident sequence
+
+Use this exact sequence when block production stalls, pending hashes keep reappearing, or future-nonce residue survives a normal txpool clear:
+
+```bash
+bash scripts/fix-all-validators-and-txpool.sh
+bash scripts/maintenance/apply-chain138-strict-future-tx-pool.sh
+bash scripts/clear-all-transaction-pools.sh
+bash scripts/monitoring/monitor-blockchain-health.sh
+```
+
+## Gitea TLS follow-up
+
+The immediate HTTPS push blocker was an expired certificate on `gitea.d-bis.org`. The certificate was renewed and reattached through NPMplus #4, and the endpoint now verifies cleanly again.
+
+Important: the live replacement certificate observed during the final checks expires on `2026-05-01`, so this needs near-term follow-through rather than “set and forget.”
+
+Use this to check expiry before it becomes an outage:
+
+```bash
+bash scripts/verify/check-gitea-certificate-expiry.sh
+WARN_DAYS=30 bash scripts/verify/check-gitea-certificate-expiry.sh
+```
+
+## Checkpoint commits
+
+Key parent-repo commits in the final reconciliation chain:
+
+- `a4738c1` merge of `gitea/master` into cleaned local `master`
+- `c23fdf4` explorer submodule alignment to a remote-backed commit
+- `7e2d9c5` the-order hook fix pointer update
+- `a1eacd3` duplicate Besu CT destruction + cluster inventory audit
+- `780648a` thirdweb sentries added to checked-in allowlists/templates
+- `219247b` Besu verifier gaps and monitor noise cleanup
+
+Key nested-repo commits:
+
+- `cross-chain-pmm-lps` `1cf845c`
+- `explorer-monorepo` remote already contained the equivalent live deploy workflow
+- `smom-dbis-138` `f3d2961`
+- `the-order` `702a836`
+
+## Recommended operator habits
+
+- Run `bash scripts/verify/check-cluster-besu-inventory.sh --json` after major topology or host-placement changes.
+- Run `bash scripts/verify/check-gitea-certificate-expiry.sh` periodically or wire it into a cron/monitoring path.
+- Keep parent-repo submodule pointer pushes behind successful child-repo pushes so no local-only hashes leak into the parent history.
diff --git a/docs/00-meta/OPERATOR_READY_CHECKLIST.md b/docs/00-meta/OPERATOR_READY_CHECKLIST.md
index 3c76bcff..d359dd34 100644
--- a/docs/00-meta/OPERATOR_READY_CHECKLIST.md
+++ b/docs/00-meta/OPERATOR_READY_CHECKLIST.md
@@ -17,6 +17,8 @@
 
 **Chain 138 txpool incident standard path:** `bash scripts/fix-all-validators-and-txpool.sh` then `bash scripts/maintenance/apply-chain138-strict-future-tx-pool.sh` then `bash scripts/clear-all-transaction-pools.sh` then `bash scripts/monitoring/monitor-blockchain-health.sh`. Use this sequence when block production stalls, pending hashes keep reappearing, or future-nonce residue survives a normal txpool clear.
 
+**Gitea HTTPS push safeguard:** `bash scripts/verify/check-gitea-certificate-expiry.sh` (optional: `WARN_DAYS=30 bash scripts/verify/check-gitea-certificate-expiry.sh`). Use this when Git over HTTPS starts failing, or run it proactively before major push/deploy windows.
+
 ---
 
 ## Completed in this session (2026-03-26)
diff --git a/docs/MASTER_INDEX.md b/docs/MASTER_INDEX.md
index 40ba23e1..666f5912 100644
--- a/docs/MASTER_INDEX.md
+++ b/docs/MASTER_INDEX.md
@@ -20,7 +20,9 @@
 | **Your personal checklist** | [00-meta/NEXT_STEPS_FOR_YOU.md](00-meta/NEXT_STEPS_FOR_YOU.md) |
 | **Operator runbook (LAN/creds)** | [00-meta/NEXT_STEPS_OPERATOR.md](00-meta/NEXT_STEPS_OPERATOR.md) |
 | **Operator copy-paste commands** | [00-meta/OPERATOR_READY_CHECKLIST.md](00-meta/OPERATOR_READY_CHECKLIST.md) — exact commands for Blockscout, NPMplus, CCIP, 502 fix, backup, deploy |
+| **2026-04-24 operator handoff** | [00-meta/OPERATOR_HANDOFF_2026_04_24.md](00-meta/OPERATOR_HANDOFF_2026_04_24.md) — Besu hardening, duplicate RPC retirement, Gitea TLS repair, and remote reconciliation |
 | **Chain 138 txpool incident recovery** | `bash scripts/fix-all-validators-and-txpool.sh` → `bash scripts/maintenance/apply-chain138-strict-future-tx-pool.sh` → `bash scripts/clear-all-transaction-pools.sh` → `bash scripts/monitoring/monitor-blockchain-health.sh` |
+| **Gitea TLS expiry check** | `bash scripts/verify/check-gitea-certificate-expiry.sh` — warns before `gitea.d-bis.org` cert expiry blocks HTTPS pushes |
 | **TsunamiSwap DEX plan** | [00-meta/AAVE_CHAIN138_AND_MARIONETTE_TSUNAMISWAP_PLAN.md](00-meta/AAVE_CHAIN138_AND_MARIONETTE_TSUNAMISWAP_PLAN.md) — canonical TsunamiSwap VM `5010` plan, current DEX link, and publish checklist |
 | **Required / optional / recommended (full plan)** | [00-meta/COMPLETE_REQUIRED_OPTIONAL_RECOMMENDED_INDEX.md](00-meta/COMPLETE_REQUIRED_OPTIONAL_RECOMMENDED_INDEX.md) |
 | **Single task list** | [00-meta/TODOS_CONSOLIDATED.md](00-meta/TODOS_CONSOLIDATED.md) |
diff --git a/scripts/verify/README.md b/scripts/verify/README.md
index 52e92abc..cf62bb2c 100644
--- a/scripts/verify/README.md
+++ b/scripts/verify/README.md
@@ -39,6 +39,7 @@ One-line install (Debian/Ubuntu): `sudo apt install -y sshpass rsync dnsutils ip
 - `check-deployer-balance-blockscout-vs-rpc.sh` - Compare deployer native balance from Blockscout API vs RPC (to verify index matches current chain); see [EXPLORER_AND_BLOCKSCAN_REFERENCE](../../docs/11-references/EXPLORER_AND_BLOCKSCAN_REFERENCE.md)
 - `check-dependencies.sh` - Verify required tools (bash, curl, jq, openssl, ssh)
 - `check-cluster-besu-inventory.sh` - Cluster-wide Besu inventory audit using `pvesh /cluster/resources` via a Proxmox cluster node so host placement on `r630-03` / `r630-04` is not missed. Prints VMID, type, node, status, name, IP, canonical-vs-extra classification, and any missing canonical VMIDs. Use `--json` for machine-readable output.
+- `check-gitea-certificate-expiry.sh` - Read-only TLS expiry check for `gitea.d-bis.org` (or another host passed as arg). Exits `0` when outside the warning window, `1` when within `WARN_DAYS` (default `14`), and `2` on expiry or probe failure.
 - `check-pnpm-workspace-lockfile.sh` - Ensures every path in `pnpm-workspace.yaml` has an `importer` in `pnpm-lock.yaml` (run `pnpm install` at root if it fails; avoids broken `pnpm outdated -r`)
 - `export-cloudflare-dns-records.sh` - Export Cloudflare DNS records
 - `export-npmplus-config.sh` - Export NPMplus proxy hosts and certificates via API
diff --git a/scripts/verify/check-gitea-certificate-expiry.sh b/scripts/verify/check-gitea-certificate-expiry.sh
new file mode 100755
index 00000000..75f46f25
--- /dev/null
+++ b/scripts/verify/check-gitea-certificate-expiry.sh
@@ -0,0 +1,81 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+HOST="${1:-gitea.d-bis.org}"
+PORT="${PORT:-443}"
+WARN_DAYS="${WARN_DAYS:-14}"
+TIMEOUT_SECS="${TIMEOUT_SECS:-15}"
+
+if ! [[ "$WARN_DAYS" =~ ^[0-9]+$ ]]; then
+  echo "ERROR: WARN_DAYS must be an integer, got: $WARN_DAYS" >&2
+  exit 2
+fi
+
+if ! command -v openssl >/dev/null 2>&1; then
+  echo "ERROR: openssl is required" >&2
+  exit 2
+fi
+
+if ! command -v python3 >/dev/null 2>&1; then
+  echo "ERROR: python3 is required" >&2
+  exit 2
+fi
+
+echo "Checking TLS certificate expiry for ${HOST}:${PORT}"
+
+cert_text="$(
+  timeout "$TIMEOUT_SECS" openssl s_client -servername "$HOST" -connect "${HOST}:${PORT}" </dev/null 2>/dev/null \
+    | openssl x509 -noout -issuer -subject -dates || true
+)"
+
+if [[ -z "$cert_text" ]]; then
+  echo "ERROR: could not read certificate from ${HOST}:${PORT}" >&2
+  exit 2
+fi
+
+not_after="$(printf '%s\n' "$cert_text" | sed -n 's/^notAfter=//p')"
+issuer="$(printf '%s\n' "$cert_text" | sed -n 's/^issuer=//p')"
+subject="$(printf '%s\n' "$cert_text" | sed -n 's/^subject=//p')"
+
+if [[ -z "$not_after" ]]; then
+  echo "ERROR: certificate did not include notAfter" >&2
+  exit 2
+fi
+
+days_left="$(
+  python3 - "$not_after" <<'PY'
+import sys
+from datetime import datetime, timezone
+
+not_after = sys.argv[1].strip()
+expiry = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z").replace(tzinfo=timezone.utc)
+now = datetime.now(timezone.utc)
+delta = expiry - now
+print(delta.total_seconds() / 86400)
+PY
+)"
+
+days_left_int="$(python3 - "$days_left" <<'PY'
+import math
+import sys
+
+print(math.floor(float(sys.argv[1])))
+PY
+)"
+
+echo "Issuer: ${issuer}"
+echo "Subject: ${subject}"
+echo "Expiry: ${not_after}"
+echo "Days left: ${days_left_int}"
+
+if (( days_left_int < 0 )); then
+  echo "CRITICAL: certificate for ${HOST} already expired" >&2
+  exit 2
+fi
+
+if (( days_left_int < WARN_DAYS )); then
+  echo "WARNING: certificate for ${HOST} expires in fewer than ${WARN_DAYS} days" >&2
+  exit 1
+fi
+
+echo "OK: certificate expiry is outside the ${WARN_DAYS}-day warning window"
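Review note: the exit-code contract that the README entry describes for `check-gitea-certificate-expiry.sh` (`0` outside the warning window, `1` inside it, `2` on expiry) can be sketched in isolation as below. This is a minimal illustration, not part of the change set. The function names `days_until` and `exit_code` are hypothetical, and the sample time of day is an assumption, since the handoff only records the expiry date `2026-05-01`.

```python
import math
from datetime import datetime, timezone


def days_until(not_after: str, now: datetime) -> int:
    """Whole days (floored) between `now` and an openssl-style notAfter string."""
    # openssl prints dates such as "May  1 12:00:00 2026 GMT"; strptime
    # accepts the padded day because whitespace in the format matches any run.
    expiry = datetime.strptime(
        not_after.strip(), "%b %d %H:%M:%S %Y %Z"
    ).replace(tzinfo=timezone.utc)
    return math.floor((expiry - now).total_seconds() / 86400)


def exit_code(days_left: int, warn_days: int = 14) -> int:
    """Map days left to the documented status: 0 ok, 1 warning, 2 expired."""
    if days_left < 0:
        return 2
    if days_left < warn_days:
        return 1
    return 0


if __name__ == "__main__":
    # Hypothetical clock reading on the handoff date; the hour is assumed.
    now = datetime(2026, 4, 24, 12, 0, 0, tzinfo=timezone.utc)
    left = days_until("May  1 12:00:00 2026 GMT", now)
    print(left, exit_code(left), exit_code(left, warn_days=5))
```

Flooring rather than rounding means a certificate that expired even a few minutes ago yields a negative day count, so it is reported as expired rather than as "0 days left".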