ops: add gitea tls handoff and expiry check
Some checks failed
Deploy to Phoenix / validate (push) Failing after 29s
Deploy to Phoenix / deploy (push) Has been skipped
Deploy to Phoenix / deploy-atomic-swap-dapp (push) Has been skipped
Deploy to Phoenix / cloudflare (push) Has been skipped

defiQUG
2026-04-24 16:30:18 -07:00
parent a4738c1376
commit 01c36a5489
5 changed files with 163 additions and 0 deletions


@@ -0,0 +1,77 @@
# Operator Handoff — 2026-04-24
Purpose: concise handoff for the Chain 138 Besu hardening, repo cleanup, and Git/Gitea recovery work completed on 2026-04-24.
## What changed
- Chain 138 block production and validator peer health were restored and hardened.
- Strict future-txpool handling is now part of the standard incident path.
- Duplicate legacy Besu RPC CTs were first retired, then destroyed after the canonical fleet was verified healthy.
- Besu inventory was reconciled across all 5 Proxmox nodes, including `r630-03` and `r630-04`.
- `1509` and `1510` were promoted into the canonical Besu inventory and checked-in allowlists/templates.
- A cluster-wide Besu inventory audit was added so host-placement ambiguity is caught mechanically.
- Surgical repo cleanup was completed, nested repos were cleaned and pushed, and the parent repo was reconciled across Gitea and GitHub.
- `gitea.d-bis.org` TLS was repaired after an expired certificate blocked HTTPS pushes.
## Current live status
As of the final 2026-04-24 checks:
- `bash scripts/monitoring/monitor-blockchain-health.sh`
  - block production active
  - all 5 validators active
  - RPC peer count healthy
  - global txpool empty
  - overall status `HEALTHY`
- `bash scripts/verify/check-cluster-besu-inventory.sh --json`
  - all 5 Proxmox nodes online
  - `missing_canonical_vmids = []`
  - `unexpected_besu_resources = []`
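The inventory result can be turned into a pass/fail gate. The sketch below is illustrative and not part of this commit: `besu_inventory_is_clean` is a hypothetical helper, and it assumes the `--json` output exposes `missing_canonical_vmids` and `unexpected_besu_resources` as top-level arrays, which the handoff implies but does not spell out. It shells out to `python3` in the same way the certificate check in this commit does.

```bash
#!/usr/bin/env bash
# Sketch: succeed only when the audit JSON read from stdin reports no drift.
# Assumption: both keys are top-level arrays in the --json output.
besu_inventory_is_clean() {
  python3 -c '
import json
import sys

report = json.load(sys.stdin)
# Index directly so that a missing key fails the gate instead of passing it.
missing = report["missing_canonical_vmids"]
unexpected = report["unexpected_besu_resources"]
if missing or unexpected:
    print("missing:", missing, "unexpected:", unexpected, file=sys.stderr)
    sys.exit(1)
'
}
```

A possible invocation is `bash scripts/verify/check-cluster-besu-inventory.sh --json | besu_inventory_is_clean`, which returns non-zero if either list is non-empty or the JSON cannot be parsed.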
## Canonical Chain 138 incident sequence
Use this exact sequence when block production stalls, pending hashes keep reappearing, or future-nonce residue survives a normal txpool clear:
```bash
bash scripts/fix-all-validators-and-txpool.sh
bash scripts/maintenance/apply-chain138-strict-future-tx-pool.sh
bash scripts/clear-all-transaction-pools.sh
bash scripts/monitoring/monitor-blockchain-health.sh
```
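The sequence above can also be run through a single fail-fast wrapper. The sketch below is illustrative and not part of this commit: `run_sequence` and `chain138_incident_sequence` are hypothetical names, and it assumes that each script signals failure with a non-zero exit status and that stopping at the first failure is the desired behavior.

```bash
#!/usr/bin/env bash
# Sketch: run each script in order and stop at the first one that fails.
run_sequence() {
  local step
  for step in "$@"; do
    echo "==> ${step}"
    if ! bash "$step"; then
      echo "FAILED: ${step}" >&2
      return 1
    fi
  done
  echo "sequence complete"
}

# The script paths come from the handoff; only the wrapper is new.
chain138_incident_sequence() {
  run_sequence \
    scripts/fix-all-validators-and-txpool.sh \
    scripts/maintenance/apply-chain138-strict-future-tx-pool.sh \
    scripts/clear-all-transaction-pools.sh \
    scripts/monitoring/monitor-blockchain-health.sh
}
```

Calling `chain138_incident_sequence` from the repo root runs the four steps in the documented order; the final health monitor only runs if the three repair steps succeeded.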
## Gitea TLS follow-up
The immediate HTTPS push blocker was an expired certificate on `gitea.d-bis.org`. The certificate was renewed and reattached through NPMplus #4, and the endpoint now verifies cleanly again.
Important: the live replacement certificate observed during the final checks expires on `2026-05-01`, so this needs near-term follow-through rather than “set and forget.”
Use this to check expiry before it becomes an outage:
```bash
bash scripts/verify/check-gitea-certificate-expiry.sh
WARN_DAYS=30 bash scripts/verify/check-gitea-certificate-expiry.sh
```
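For cron or monitoring use, the check's exit status can be reduced to a one-line summary. The sketch below is illustrative and not part of this commit: `cert_status_label`, `check_gitea_cert`, and the `CHECK_SCRIPT` override are hypothetical, and the mapping assumes the exit codes documented for the script in this same commit (`0` outside the warning window, `1` inside it, `2` on expiry or probe failure).

```bash
#!/usr/bin/env bash
# Sketch: translate the checker's documented exit codes into a short status.
# CHECK_SCRIPT is a hypothetical override for pointing at another path.
CHECK_SCRIPT="${CHECK_SCRIPT:-scripts/verify/check-gitea-certificate-expiry.sh}"

cert_status_label() {
  case "$1" in
    0) echo "ok" ;;
    1) echo "warning" ;;
    2) echo "critical" ;;
    *) echo "unknown" ;;
  esac
}

check_gitea_cert() {
  local rc=0
  # Discard the checker's own output; only its exit status is used here.
  bash "$CHECK_SCRIPT" "$@" >/dev/null 2>&1 || rc=$?
  echo "gitea-cert: $(cert_status_label "$rc") (exit ${rc})"
  return "$rc"
}
```

Because `check_gitea_cert` returns the original exit status, a scheduler or alerting hook can still branch on `1` versus `2`.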
## Checkpoint commits
Key parent-repo commits in the final reconciliation chain:
- `a4738c1` merge of `gitea/master` into cleaned local `master`
- `c23fdf4` explorer submodule alignment to a remote-backed commit
- `7e2d9c5` the-order hook fix pointer update
- `a1eacd3` duplicate Besu CT destruction + cluster inventory audit
- `780648a` thirdweb sentries added to checked-in allowlists/templates
- `219247b` Besu verifier gaps and monitor noise cleanup
Key nested-repo commits:
- `cross-chain-pmm-lps` `1cf845c`
- `explorer-monorepo` remote already contained the equivalent live deploy workflow
- `smom-dbis-138` `f3d2961`
- `the-order` `702a836`
## Recommended operator habits
- Run `bash scripts/verify/check-cluster-besu-inventory.sh --json` after major topology or host-placement changes.
- Run `bash scripts/verify/check-gitea-certificate-expiry.sh` periodically or wire it into a cron/monitoring path.
- Keep parent-repo submodule pointer pushes behind successful child-repo pushes so no local-only hashes leak into the parent history.
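
One way to enforce the submodule habit mechanically is Git's `--recurse-submodules=check` push mode, which aborts the push when a submodule commit referenced by the pushed revisions is not available on any remote of that submodule. The sketch below is illustrative and not part of this commit: `push_parent_checked` is a hypothetical helper, and the default remote and branch names are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: push the parent repo only if every referenced submodule commit is
# already on a submodule remote. Remote and branch defaults are placeholders.
push_parent_checked() {
  local remote="${1:-origin}" branch="${2:-master}"
  git push --recurse-submodules=check "$remote" "$branch"
}
```

For example, `push_parent_checked gitea master` would refuse to publish a parent commit whose submodule pointers still reference local-only hashes.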


@@ -17,6 +17,8 @@
**Chain 138 txpool incident standard path:** `bash scripts/fix-all-validators-and-txpool.sh` then `bash scripts/maintenance/apply-chain138-strict-future-tx-pool.sh` then `bash scripts/clear-all-transaction-pools.sh` then `bash scripts/monitoring/monitor-blockchain-health.sh`. Use this sequence when block production stalls, pending hashes keep reappearing, or future-nonce residue survives a normal txpool clear.
**Gitea HTTPS push safeguard:** `bash scripts/verify/check-gitea-certificate-expiry.sh` (optional: `WARN_DAYS=30 bash scripts/verify/check-gitea-certificate-expiry.sh`). Use this when Git over HTTPS starts failing, or run it proactively before major push/deploy windows.
---
## Completed in this session (2026-03-26)


@@ -20,7 +20,9 @@
| **Your personal checklist** | [00-meta/NEXT_STEPS_FOR_YOU.md](00-meta/NEXT_STEPS_FOR_YOU.md) |
| **Operator runbook (LAN/creds)** | [00-meta/NEXT_STEPS_OPERATOR.md](00-meta/NEXT_STEPS_OPERATOR.md) |
| **Operator copy-paste commands** | [00-meta/OPERATOR_READY_CHECKLIST.md](00-meta/OPERATOR_READY_CHECKLIST.md) — exact commands for Blockscout, NPMplus, CCIP, 502 fix, backup, deploy |
| **2026-04-24 operator handoff** | [00-meta/OPERATOR_HANDOFF_2026_04_24.md](00-meta/OPERATOR_HANDOFF_2026_04_24.md) — Besu hardening, duplicate RPC retirement, Gitea TLS repair, and remote reconciliation |
| **Chain 138 txpool incident recovery** | `bash scripts/fix-all-validators-and-txpool.sh` → `bash scripts/maintenance/apply-chain138-strict-future-tx-pool.sh` → `bash scripts/clear-all-transaction-pools.sh` → `bash scripts/monitoring/monitor-blockchain-health.sh` |
| **Gitea TLS expiry check** | `bash scripts/verify/check-gitea-certificate-expiry.sh` — warns before `gitea.d-bis.org` cert expiry blocks HTTPS pushes |
| **TsunamiSwap DEX plan** | [00-meta/AAVE_CHAIN138_AND_MARIONETTE_TSUNAMISWAP_PLAN.md](00-meta/AAVE_CHAIN138_AND_MARIONETTE_TSUNAMISWAP_PLAN.md) — canonical TsunamiSwap VM `5010` plan, current DEX link, and publish checklist |
| **Required / optional / recommended (full plan)** | [00-meta/COMPLETE_REQUIRED_OPTIONAL_RECOMMENDED_INDEX.md](00-meta/COMPLETE_REQUIRED_OPTIONAL_RECOMMENDED_INDEX.md) |
| **Single task list** | [00-meta/TODOS_CONSOLIDATED.md](00-meta/TODOS_CONSOLIDATED.md) |


@@ -39,6 +39,7 @@ One-line install (Debian/Ubuntu): `sudo apt install -y sshpass rsync dnsutils ip
- `check-deployer-balance-blockscout-vs-rpc.sh` - Compare deployer native balance from Blockscout API vs RPC (to verify index matches current chain); see [EXPLORER_AND_BLOCKSCAN_REFERENCE](../../docs/11-references/EXPLORER_AND_BLOCKSCAN_REFERENCE.md)
- `check-dependencies.sh` - Verify required tools (bash, curl, jq, openssl, ssh)
- `check-cluster-besu-inventory.sh` - Cluster-wide Besu inventory audit using `pvesh /cluster/resources` via a Proxmox cluster node so host placement on `r630-03` / `r630-04` is not missed. Prints VMID, type, node, status, name, IP, canonical-vs-extra classification, and any missing canonical VMIDs. Use `--json` for machine-readable output.
- `check-gitea-certificate-expiry.sh` - Read-only TLS expiry check for `gitea.d-bis.org` (or another host passed as arg). Exits `0` when outside the warning window, `1` when within `WARN_DAYS` (default `14`), and `2` on expiry, probe failure, an invalid `WARN_DAYS`, or a missing `openssl`/`python3`.
- `check-pnpm-workspace-lockfile.sh` - Ensures every path in `pnpm-workspace.yaml` has an `importer` in `pnpm-lock.yaml` (run `pnpm install` at root if it fails; avoids broken `pnpm outdated -r`)
- `export-cloudflare-dns-records.sh` - Export Cloudflare DNS records
- `export-npmplus-config.sh` - Export NPMplus proxy hosts and certificates via API


@@ -0,0 +1,81 @@
#!/usr/bin/env bash
set -euo pipefail
HOST="${1:-gitea.d-bis.org}"
PORT="${PORT:-443}"
WARN_DAYS="${WARN_DAYS:-14}"
TIMEOUT_SECS="${TIMEOUT_SECS:-15}"
if ! [[ "$WARN_DAYS" =~ ^[0-9]+$ ]]; then
  echo "ERROR: WARN_DAYS must be an integer, got: $WARN_DAYS" >&2
  exit 2
fi
if ! command -v openssl >/dev/null 2>&1; then
  echo "ERROR: openssl is required" >&2
  exit 2
fi
if ! command -v python3 >/dev/null 2>&1; then
  echo "ERROR: python3 is required" >&2
  exit 2
fi
echo "Checking TLS certificate expiry for ${HOST}:${PORT}"
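# `|| true` keeps `set -e` and `pipefail` from aborting on a failed probe,
# so the empty-output check below can report the error and exit 2.
cert_text="$(
  timeout "$TIMEOUT_SECS" openssl s_client -servername "$HOST" -connect "${HOST}:${PORT}" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer -subject -dates || true
)"
if [[ -z "$cert_text" ]]; then
  echo "ERROR: could not read certificate from ${HOST}:${PORT}" >&2
  exit 2
fi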
not_after="$(printf '%s\n' "$cert_text" | sed -n 's/^notAfter=//p')"
issuer="$(printf '%s\n' "$cert_text" | sed -n 's/^issuer=//p')"
subject="$(printf '%s\n' "$cert_text" | sed -n 's/^subject=//p')"
if [[ -z "$not_after" ]]; then
  echo "ERROR: certificate did not include notAfter" >&2
  exit 2
fi
# Compute whole days until expiry in a single Python call. A parse failure
# exits 2 instead of falling through `set -e` with Python's own status.
days_left_int="$(
python3 - "$not_after" <<'PY'
import math
import sys
from datetime import datetime, timezone
not_after = sys.argv[1].strip()
# openssl prints notAfter in GMT, so the parsed value is treated as UTC.
expiry = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z").replace(tzinfo=timezone.utc)
now = datetime.now(timezone.utc)
print(math.floor((expiry - now).total_seconds() / 86400))
PY
)" || {
  echo "ERROR: could not parse certificate expiry: ${not_after}" >&2
  exit 2
}
echo "Issuer: ${issuer}"
echo "Subject: ${subject}"
echo "Expiry: ${not_after}"
echo "Days left: ${days_left_int}"
if (( days_left_int < 0 )); then
  echo "CRITICAL: certificate for ${HOST} already expired" >&2
  exit 2
fi
if (( days_left_int < WARN_DAYS )); then
  echo "WARNING: certificate for ${HOST} expires in fewer than ${WARN_DAYS} days" >&2
  exit 1
fi
echo "OK: certificate expiry is outside the ${WARN_DAYS}-day warning window"