ops: add gitea tls handoff and expiry check
77
docs/00-meta/OPERATOR_HANDOFF_2026_04_24.md
Normal file
@@ -0,0 +1,77 @@
# Operator Handoff — 2026-04-24

Purpose: concise handoff for the Chain 138 Besu hardening, repo cleanup, and Git/Gitea recovery work completed on 2026-04-24.

## What changed

- Chain 138 block production and validator peer health were restored and hardened.
- Strict future-txpool handling is now part of the standard incident path.
- Duplicate legacy Besu RPC CTs were first retired, then destroyed after the canonical fleet was verified healthy.
- Besu inventory was reconciled across all 5 Proxmox nodes, including `r630-03` and `r630-04`.
- VMIDs `1509` and `1510` were promoted into the canonical Besu inventory and the checked-in allowlists/templates.
- A cluster-wide Besu inventory audit was added so host-placement ambiguity is caught mechanically.
- Surgical repo cleanup was completed, nested repos were cleaned and pushed, and the parent repo was reconciled across Gitea and GitHub.
- `gitea.d-bis.org` TLS was repaired after an expired certificate blocked HTTPS pushes.

## Current live status

As of the final 2026-04-24 checks:

- `bash scripts/monitoring/monitor-blockchain-health.sh`
  - block production active
  - all 5 validators active
  - RPC peer count healthy
  - global txpool empty
  - overall status `HEALTHY`
- `bash scripts/verify/check-cluster-besu-inventory.sh --json`
  - all 5 Proxmox nodes online
  - `missing_canonical_vmids = []`
  - `unexpected_besu_resources = []`

## Canonical Chain 138 incident sequence

Use this exact sequence when block production stalls, pending hashes keep reappearing, or future-nonce residue survives a normal txpool clear:

```bash
bash scripts/fix-all-validators-and-txpool.sh
bash scripts/maintenance/apply-chain138-strict-future-tx-pool.sh
bash scripts/clear-all-transaction-pools.sh
bash scripts/monitoring/monitor-blockchain-health.sh
```

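The four steps above can also be chained in a small wrapper so that a failing step stops the run before the next one starts. This is only a sketch: the function name `run_incident_sequence` is hypothetical and not part of this commit, the script paths are the ones listed above, and halting on the first non-zero exit is an assumption, since this handoff does not say how the individual scripts report failure.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of this commit): run scripts in order and
# stop at the first one that exits non-zero.
run_incident_sequence() {
  local step
  for step in "$@"; do
    echo "==> bash ${step}"
    if ! bash "$step"; then
      echo "FAILED: ${step}; stopping before the remaining steps" >&2
      return 1
    fi
  done
  echo "Sequence finished"
}

# Usage from the repository root (uncomment to run):
# run_incident_sequence \
#   scripts/fix-all-validators-and-txpool.sh \
#   scripts/maintenance/apply-chain138-strict-future-tx-pool.sh \
#   scripts/clear-all-transaction-pools.sh \
#   scripts/monitoring/monitor-blockchain-health.sh
```

Keeping the health monitor as the last argument means it only runs once the three remediation steps have all returned successfully.
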
## Gitea TLS follow-up

The immediate HTTPS push blocker was an expired certificate on `gitea.d-bis.org`. The certificate was renewed and reattached through NPMplus #4, and the endpoint now verifies cleanly again.

Important: the replacement certificate observed live during the final checks expires on `2026-05-01`, one week after this handoff, so renewal needs near-term follow-through rather than "set and forget."

Use this to check expiry before it becomes an outage:

```bash
bash scripts/verify/check-gitea-certificate-expiry.sh
WARN_DAYS=30 bash scripts/verify/check-gitea-certificate-expiry.sh
```

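To make the expiry window concrete, the date arithmetic behind the warning can be sketched as follows. The helper `days_between` is hypothetical and assumes GNU `date` (for `-d`). It compares calendar dates at midnight UTC, whereas the real check compares the certificate's exact `notAfter` timestamp against the current time, so the live result can differ by a day.

```bash
#!/usr/bin/env bash
# Hypothetical helper: whole days from one calendar date to another (UTC).
# Assumes GNU date, which supports -d for parsing date strings.
days_between() {
  local from_epoch to_epoch
  from_epoch="$(date -u -d "$1" +%s)"
  to_epoch="$(date -u -d "$2" +%s)"
  echo $(( (to_epoch - from_epoch) / 86400 ))
}

remaining="$(days_between 2026-04-24 2026-05-01)"
echo "Days from handoff to expiry: ${remaining}"

# Compare against the default window (14) and the wider one shown above (30).
for warn_days in 14 30; do
  if (( remaining < warn_days )); then
    echo "WARN_DAYS=${warn_days}: inside the warning window"
  else
    echo "WARN_DAYS=${warn_days}: outside the warning window"
  fi
done
```

With seven days between the handoff date and the expiry date, both the default 14-day window and a 30-day window are already exceeded, so a warning from the check is expected until the certificate is replaced.
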
## Checkpoint commits

Key parent-repo commits in the final reconciliation chain:

- `a4738c1` merge of `gitea/master` into cleaned local `master`
- `c23fdf4` explorer submodule alignment to a remote-backed commit
- `7e2d9c5` the-order hook fix pointer update
- `a1eacd3` duplicate Besu CT destruction + cluster inventory audit
- `780648a` thirdweb sentries added to checked-in allowlists/templates
- `219247b` Besu verifier gaps and monitor noise cleanup

Key nested-repo commits:

- `cross-chain-pmm-lps` `1cf845c`
- `explorer-monorepo` remote already contained the equivalent live deploy workflow
- `smom-dbis-138` `f3d2961`
- `the-order` `702a836`

## Recommended operator habits

- Run `bash scripts/verify/check-cluster-besu-inventory.sh --json` after major topology or host-placement changes.
- Run `bash scripts/verify/check-gitea-certificate-expiry.sh` periodically or wire it into a cron/monitoring path.
- Keep parent-repo submodule pointer pushes behind successful child-repo pushes so no local-only hashes leak into the parent history.
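
For the cron or monitoring wiring mentioned above, one possible shape is a thin wrapper that turns the check's exit code into a severity label. The exit-code meanings (`0`, `1`, `2`) come from the script description in this commit; the function names `severity_for_exit` and `run_cert_check`, and the single labelled log line, are assumptions for illustration only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map the documented exit codes to a severity label.
severity_for_exit() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "CRITICAL" ;;
    *) echo "UNKNOWN" ;;
  esac
}

# Hypothetical wrapper: run any check command, print one labelled line,
# and pass the original exit code through to the caller.
run_cert_check() {
  local rc=0
  "$@" >/dev/null 2>&1 || rc=$?
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) gitea-cert $(severity_for_exit "$rc") (exit ${rc})"
  return "$rc"
}

# Example call from the repository root (uncomment to run):
# run_cert_check bash scripts/verify/check-gitea-certificate-expiry.sh
```

Because the wrapper returns the original exit code, a scheduler or monitoring agent that already alerts on non-zero exits keeps working unchanged.
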
@@ -17,6 +17,8 @@

**Chain 138 txpool incident standard path:** `bash scripts/fix-all-validators-and-txpool.sh` then `bash scripts/maintenance/apply-chain138-strict-future-tx-pool.sh` then `bash scripts/clear-all-transaction-pools.sh` then `bash scripts/monitoring/monitor-blockchain-health.sh`. Use this sequence when block production stalls, pending hashes keep reappearing, or future-nonce residue survives a normal txpool clear.

**Gitea HTTPS push safeguard:** `bash scripts/verify/check-gitea-certificate-expiry.sh` (optional: `WARN_DAYS=30 bash scripts/verify/check-gitea-certificate-expiry.sh`). Use this when Git over HTTPS starts failing, or run it proactively before major push/deploy windows.

---

## Completed in this session (2026-03-26)

@@ -20,7 +20,9 @@
| **Your personal checklist** | [00-meta/NEXT_STEPS_FOR_YOU.md](00-meta/NEXT_STEPS_FOR_YOU.md) |
| **Operator runbook (LAN/creds)** | [00-meta/NEXT_STEPS_OPERATOR.md](00-meta/NEXT_STEPS_OPERATOR.md) |
| **Operator copy-paste commands** | [00-meta/OPERATOR_READY_CHECKLIST.md](00-meta/OPERATOR_READY_CHECKLIST.md) — exact commands for Blockscout, NPMplus, CCIP, 502 fix, backup, deploy |
| **2026-04-24 operator handoff** | [00-meta/OPERATOR_HANDOFF_2026_04_24.md](00-meta/OPERATOR_HANDOFF_2026_04_24.md) — Besu hardening, duplicate RPC retirement, Gitea TLS repair, and remote reconciliation |
| **Chain 138 txpool incident recovery** | `bash scripts/fix-all-validators-and-txpool.sh` → `bash scripts/maintenance/apply-chain138-strict-future-tx-pool.sh` → `bash scripts/clear-all-transaction-pools.sh` → `bash scripts/monitoring/monitor-blockchain-health.sh` |
| **Gitea TLS expiry check** | `bash scripts/verify/check-gitea-certificate-expiry.sh` — warns before `gitea.d-bis.org` cert expiry blocks HTTPS pushes |
| **TsunamiSwap DEX plan** | [00-meta/AAVE_CHAIN138_AND_MARIONETTE_TSUNAMISWAP_PLAN.md](00-meta/AAVE_CHAIN138_AND_MARIONETTE_TSUNAMISWAP_PLAN.md) — canonical TsunamiSwap VM `5010` plan, current DEX link, and publish checklist |
| **Required / optional / recommended (full plan)** | [00-meta/COMPLETE_REQUIRED_OPTIONAL_RECOMMENDED_INDEX.md](00-meta/COMPLETE_REQUIRED_OPTIONAL_RECOMMENDED_INDEX.md) |
| **Single task list** | [00-meta/TODOS_CONSOLIDATED.md](00-meta/TODOS_CONSOLIDATED.md) |

@@ -39,6 +39,7 @@ One-line install (Debian/Ubuntu): `sudo apt install -y sshpass rsync dnsutils ip
- `check-deployer-balance-blockscout-vs-rpc.sh` - Compare deployer native balance from Blockscout API vs RPC (to verify index matches current chain); see [EXPLORER_AND_BLOCKSCAN_REFERENCE](../../docs/11-references/EXPLORER_AND_BLOCKSCAN_REFERENCE.md)
- `check-dependencies.sh` - Verify required tools (bash, curl, jq, openssl, ssh)
- `check-cluster-besu-inventory.sh` - Cluster-wide Besu inventory audit using `pvesh /cluster/resources` via a Proxmox cluster node so host placement on `r630-03` / `r630-04` is not missed. Prints VMID, type, node, status, name, IP, canonical-vs-extra classification, and any missing canonical VMIDs. Use `--json` for machine-readable output.
- `check-gitea-certificate-expiry.sh` - Read-only TLS expiry check for `gitea.d-bis.org` (or another host passed as arg). Exits `0` when outside the warning window, `1` when within `WARN_DAYS` (default `14`), and `2` on expiry or probe failure.
- `check-pnpm-workspace-lockfile.sh` - Ensures every path in `pnpm-workspace.yaml` has an `importer` in `pnpm-lock.yaml` (run `pnpm install` at root if it fails; avoids broken `pnpm outdated -r`)
- `export-cloudflare-dns-records.sh` - Export Cloudflare DNS records
- `export-npmplus-config.sh` - Export NPMplus proxy hosts and certificates via API

81
scripts/verify/check-gitea-certificate-expiry.sh
Executable file
@@ -0,0 +1,81 @@
#!/usr/bin/env bash
set -euo pipefail

HOST="${1:-gitea.d-bis.org}"
PORT="${PORT:-443}"
WARN_DAYS="${WARN_DAYS:-14}"
TIMEOUT_SECS="${TIMEOUT_SECS:-15}"

if ! [[ "$WARN_DAYS" =~ ^[0-9]+$ ]]; then
  echo "ERROR: WARN_DAYS must be an integer, got: $WARN_DAYS" >&2
  exit 2
fi

if ! command -v openssl >/dev/null 2>&1; then
  echo "ERROR: openssl is required" >&2
  exit 2
fi

if ! command -v python3 >/dev/null 2>&1; then
  echo "ERROR: python3 is required" >&2
  exit 2
fi

echo "Checking TLS certificate expiry for ${HOST}:${PORT}"

# Fetch the served certificate. `|| true` stops `set -e` with `pipefail` from aborting
# on a failed probe, so the empty-output check below can report it with exit code 2.
cert_text="$(
  timeout "$TIMEOUT_SECS" openssl s_client -servername "$HOST" -connect "${HOST}:${PORT}" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer -subject -dates
)" || true

if [[ -z "$cert_text" ]]; then
  echo "ERROR: could not read certificate from ${HOST}:${PORT}" >&2
  exit 2
fi

not_after="$(printf '%s\n' "$cert_text" | sed -n 's/^notAfter=//p')"
issuer="$(printf '%s\n' "$cert_text" | sed -n 's/^issuer=//p')"
subject="$(printf '%s\n' "$cert_text" | sed -n 's/^subject=//p')"

if [[ -z "$not_after" ]]; then
  echo "ERROR: certificate did not include notAfter" >&2
  exit 2
fi

days_left_int="$(
  python3 - "$not_after" <<'PY'
import math
import sys
from datetime import datetime, timezone

not_after = sys.argv[1].strip()
expiry = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z").replace(tzinfo=timezone.utc)
now = datetime.now(timezone.utc)
delta = expiry - now
# Floor so that a certificate already past notAfter yields a negative day count.
print(math.floor(delta.total_seconds() / 86400))
PY
)" || {
  echo "ERROR: could not parse notAfter: ${not_after}" >&2
  exit 2
}

echo "Issuer: ${issuer}"
echo "Subject: ${subject}"
echo "Expiry: ${not_after}"
echo "Days left: ${days_left_int}"

# Exit codes: 2 = expired (or any error above), 1 = inside the warning window, 0 = OK.
if (( days_left_int < 0 )); then
  echo "CRITICAL: certificate for ${HOST} already expired" >&2
  exit 2
fi

if (( days_left_int < WARN_DAYS )); then
  echo "WARNING: certificate for ${HOST} expires in fewer than ${WARN_DAYS} days" >&2
  exit 1
fi

echo "OK: certificate expiry is outside the ${WARN_DAYS}-day warning window"
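
The decision logic at the end of the script can be exercised without a network probe. The sketch below re-implements only the flooring and the threshold comparison in plain bash. `floor_days` and `classify_days_left` are hypothetical names, and the integer arithmetic is a stand-in for the script's Python `math.floor` call, not the script's own code.

```bash
#!/usr/bin/env bash
# Hypothetical helper: floor(seconds / 86400). Bash division truncates toward
# zero, so negative values need an adjustment to round down.
floor_days() {
  local secs="$1"
  if (( secs < 0 )); then
    echo $(( (secs - 86399) / 86400 ))
  else
    echo $(( secs / 86400 ))
  fi
}

# Hypothetical helper: return the exit code the check would use for a given
# number of seconds until notAfter and a warning window in days.
classify_days_left() {
  local days warn_days="$2"
  days="$(floor_days "$1")"
  if (( days < 0 )); then
    return 2   # already expired
  elif (( days < warn_days )); then
    return 1   # inside the warning window
  fi
  return 0     # outside the warning window
}

# Boundary cases: one hour past expiry, one hour left, and either side of 14 days.
for secs in -3600 3600 1209599 1209600; do
  rc=0
  classify_days_left "$secs" 14 || rc=$?
  echo "seconds_left=${secs} days=$(floor_days "$secs") exit=${rc}"
done
```

One hour past expiry floors to -1 day, which is why an expired certificate is reported as critical instead of as 0 days left. Exactly 14 full days remaining falls outside a 14-day window because the comparison is strictly less-than.
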