PR AA follow-up: manual-rollback loud-failure summary + keep-min-5 backup-prune cron + root-only initial-keys handoff file
Some checks failed
CI / Frontend Lint (pull_request) Failing after 7s
CI / Frontend Type Check (pull_request) Failing after 7s
CI / Frontend Build (pull_request) Failing after 6s
CI / Frontend E2E Tests (pull_request) Failing after 7s
CI / Orchestrator Build (pull_request) Failing after 7s
CI / Orchestrator Unit Tests (pull_request) Failing after 6s
CI / Orchestrator E2E (Testcontainers) (pull_request) Has been skipped
CI / Contracts Compile (pull_request) Failing after 5s
CI / Contracts Test (pull_request) Failing after 6s
Code Quality / SonarQube Analysis (pull_request) Failing after 20s
Code Quality / Code Quality Checks (pull_request) Failing after 7s
Security Scan / Dependency Vulnerability Scan (pull_request) Failing after 4s
Security Scan / OWASP ZAP Scan (pull_request) Failing after 4s
Some checks failed
CI / Frontend Lint (pull_request) Failing after 7s
CI / Frontend Type Check (pull_request) Failing after 7s
CI / Frontend Build (pull_request) Failing after 6s
CI / Frontend E2E Tests (pull_request) Failing after 7s
CI / Orchestrator Build (pull_request) Failing after 7s
CI / Orchestrator Unit Tests (pull_request) Failing after 6s
CI / Orchestrator E2E (Testcontainers) (pull_request) Has been skipped
CI / Contracts Compile (pull_request) Failing after 5s
CI / Contracts Test (pull_request) Failing after 6s
Code Quality / SonarQube Analysis (pull_request) Failing after 20s
Code Quality / Code Quality Checks (pull_request) Failing after 7s
Security Scan / Dependency Vulnerability Scan (pull_request) Failing after 4s
Security Scan / OWASP ZAP Scan (pull_request) Failing after 4s
- deploy-currencicombo-8604.sh: on readiness timeout, print loud failure summary (journalctl tails + exact --rollback command with specific backup path) instead of silently exiting. Deliberately does NOT auto-rollback; first cutovers often fail because of env/migration mistakes and auto-restore hides the failure state ops needs. - install.sh: on first run, write the three API keys + EVENT_SIGNING_SECRET to /root/currencicombo-first-keys.txt (0600, root:root) as a handoff copy. Canonical values still live in /etc/currencicombo/orchestrator.env. Log one pointer line (not the secrets themselves) to journald. Handoff file is NOT regenerated if orchestrator.env already exists. - install-prune-cron.sh (new, opt-in): installs /etc/cron.daily/ currencicombo-prune-backups that deletes entries older than 30 days from /var/lib/currencicombo/backups/ WHILE always keeping the newest 5 regardless of age. Enforced via newest-first sort + i<KEEP_MIN skip. - webapp-nginx.conf: drop the misleading /events/* 421 guard-rail. The orchestrator's SSE endpoint is /api/plans/:id/events/stream (under /api/), so one /api/* guard-rail covers both normal REST and SSE. - README.md: corrected NPMplus rule table to TWO rules (/api/* with SSE-friendly proxy_buffering=off + 24h read_timeout + Connection "" + http/1.1, and /); added post-cutover smoke checks section with a concrete SSE streaming test that catches silent proxy_buffering=on misconfig; documented the /root/currencicombo-first-keys.txt handoff and the install-prune-cron.sh workflow; replaced stale 'not auto-pruned' note. Verification: - shellcheck --severity=warning: clean on all 3 scripts. - bash -n: clean on install-prune-cron.sh. - install-prune-cron.sh --dry-run: prints the pruner body with resolved env values as expected. - install.sh --dry-run: walks through user/dirs/nginx-apt steps, then fails fast on missing psql (expected on a build box without Postgres). Co-Authored-By: Nakamoto, S <defi@defi-oracle.io>
This commit is contained in:
@@ -22,7 +22,7 @@ the repo.
|
||||
│ │
|
||||
▼ ▼
|
||||
curucombo.曼李.com/* (default) curucombo.曼李.com/api/*
|
||||
curucombo.曼李.com/events/* (SSE) ← swap ─ correctly routed to :8080
|
||||
(incl. SSE /api/plans/*/events/stream)
|
||||
│ │
|
||||
CT 8604 │10.160.0.14:3000 CT 8604 │10.160.0.14:8080
|
||||
▼ ▼
|
||||
@@ -46,7 +46,8 @@ the repo.
|
||||
| `systemd/currencicombo-webapp.service` | nginx serving the Vite SPA on `:3000` |
|
||||
| `webapp-nginx.conf` | full nginx.conf for the webapp unit |
|
||||
| `.env.prod.example` | env template installed to `/etc/currencicombo/orchestrator.env` |
|
||||
| `install.sh` | one-shot host setup: user / dirs / DB role / systemd units |
|
||||
| `install.sh` | one-shot host setup: user / dirs / DB role / systemd units / first-run key handoff file |
|
||||
| `install-prune-cron.sh` | opt-in daily cron that prunes `/var/lib/currencicombo/backups/` (30-day retention, keep-min 5) |
|
||||
| `deploy-currencicombo-8604.sh` | build-and-swap deploy driver (the script Phoenix/proxmox deploy-api calls) |
|
||||
| `README.md` | you're reading it |
|
||||
|
||||
@@ -71,12 +72,21 @@ All commands run as **root** inside the CT.
|
||||
On success you'll see:
|
||||
```
|
||||
[install] generated EVENT_SIGNING_SECRET (64 hex)
|
||||
[install] generated 3 API keys (initiator/settler/auditor) — grep /etc/currencicombo/orchestrator.env
|
||||
[install] generated 3 API keys (initiator/settler/auditor)
|
||||
[install] initial secrets written to /root/currencicombo-first-keys.txt (0600) — record in password manager, then 'shred -u /root/currencicombo-first-keys.txt'
|
||||
[install] install complete.
|
||||
```
|
||||
**Grab the three API keys from `/etc/currencicombo/orchestrator.env`** and put them in your password manager — they authenticate initiator / settler / auditor calls.
|
||||
4. If you need to resolve any `EXT-*` blocker (e.g. point at a real dbis_core), edit `/etc/currencicombo/orchestrator.env` before the first deploy.
|
||||
5. First build-and-start:
|
||||
`install.sh` writes the three API keys + `EVENT_SIGNING_SECRET` to **two** places:
|
||||
- `/etc/currencicombo/orchestrator.env` — canonical, read by systemd (`0640`, owned by `currencicombo`).
|
||||
- `/root/currencicombo-first-keys.txt` — **root-only handoff file** (`0600`). Grab it once, record the values in your password manager, then `shred -u` it.
|
||||
The handoff file is **not** regenerated on re-run — if `orchestrator.env` already exists, `install.sh` does not produce new secrets.
|
||||
4. (Optional) Install the backup-pruning cron:
|
||||
```
|
||||
bash /var/lib/currencicombo/repo/scripts/deployment/install-prune-cron.sh
|
||||
```
|
||||
Drops a `/etc/cron.daily/currencicombo-prune-backups` that deletes anything under `/var/lib/currencicombo/backups/` older than 30 days while **always keeping the newest 5** regardless of age. Safe on re-run; opt out with `sudo rm /etc/cron.daily/currencicombo-prune-backups`.
|
||||
5. If you need to resolve any `EXT-*` blocker (e.g. point at a real dbis_core), edit `/etc/currencicombo/orchestrator.env` before the first deploy.
|
||||
6. First build-and-start:
|
||||
```
|
||||
bash /var/lib/currencicombo/repo/scripts/deployment/deploy-currencicombo-8604.sh
|
||||
```
|
||||
@@ -96,16 +106,17 @@ All commands run as **root** inside the CT.
|
||||
## NPMplus ingress changes required at cutover
|
||||
|
||||
`curucombo.曼李.com` today proxies 100% to `10.160.0.14:3000`. After
|
||||
cutover it must become a **single-origin path-routed proxy**:
|
||||
cutover it must become a **single-origin path-routed proxy** with **two**
|
||||
rules (the SSE endpoint lives at `/api/plans/:id/events/stream`, so it's
|
||||
already under `/api/*` — no separate `/events/*` rule is needed):
|
||||
|
||||
| location | upstream | notes |
|
||||
| location | upstream | proxy settings |
|
||||
|---|---|---|
|
||||
| `/api/*` | `http://10.160.0.14:8080` | orchestrator API. Forward `Host`, `X-Real-IP`, `X-Forwarded-*`. `proxy_read_timeout 60s`. |
|
||||
| `/events/*` | `http://10.160.0.14:8080` | **SSE** — must set `proxy_buffering off;` and `proxy_read_timeout 24h;`. |
|
||||
| `/` | `http://10.160.0.14:3000` | Vite SPA. Default upstream. |
|
||||
| `/api/*` | `http://10.160.0.14:8080` | **SSE-friendly settings apply here because the SSE route `/api/plans/:id/events/stream` is under /api/**. Set: `proxy_http_version 1.1;`, `proxy_set_header Connection "";`, `proxy_buffering off;`, `proxy_cache off;`, `proxy_read_timeout 24h;`, `proxy_send_timeout 24h;`. Standard forwarding: `proxy_set_header Host $host;`, `X-Real-IP $remote_addr;`, `X-Forwarded-For $proxy_add_x_forwarded_for;`, `X-Forwarded-Proto $scheme;`. The slight overhead of `proxy_buffering off` on plain REST calls is negligible for this workload. |
|
||||
| `/` | `http://10.160.0.14:3000` | Vite SPA. Default upstream. No special settings. |
|
||||
|
||||
If you skip the `/api` + `/events` rules, the nginx in `webapp-nginx.conf`
|
||||
intentionally returns `HTTP 421` for those paths — a clean "upstream is
|
||||
If you skip the `/api/*` rule, the nginx in `webapp-nginx.conf`
|
||||
intentionally returns `HTTP 421` for that path — a clean "upstream is
|
||||
misconfigured" signal instead of silently returning `index.html` and
|
||||
breaking the browser with a JSON parse error.
|
||||
|
||||
@@ -125,19 +136,79 @@ Flags:
|
||||
- `--rollback` — restore the most recent `/var/lib/currencicombo/backups/<ts>/` and restart units. Does **not** git-pull or rebuild.
|
||||
|
||||
Every deploy writes a timestamped backup to
|
||||
`/var/lib/currencicombo/backups/<YYYYmmdd-HHMMSS>/` before swapping. Old
|
||||
backups are not auto-pruned — `find /var/lib/currencicombo/backups -maxdepth 1 -mtime +30 -exec rm -rf {} +` on a cron.
|
||||
`/var/lib/currencicombo/backups/<YYYYmmdd-HHMMSS>/` before swapping. Pruning is opt-in via `install-prune-cron.sh` (30-day retention, keep-min 5). Without the cron, backups accumulate forever — quietly filling `/var/lib` is how the next outage starts.
|
||||
|
||||
## Rollback (if a deploy goes sideways)
|
||||
## Failure handling on deploy
|
||||
|
||||
**Rollback is manual.** `deploy-currencicombo-8604.sh` **does not** auto-restore the previous backup if the orchestrator fails to become ready. First cutovers typically fail because of env typos or migration mistakes, and auto-restoring hides the failure state ops needs.
|
||||
|
||||
Instead, on a readiness timeout the deploy script prints:
|
||||
- last 40 lines of `journalctl -u currencicombo-orchestrator`
|
||||
- last 20 lines of `journalctl -u currencicombo-webapp`
|
||||
- **the exact `--rollback` command with the specific backup path filled in**
|
||||
|
||||
Example tail on failure:
|
||||
```
|
||||
================================================================
|
||||
DEPLOY FAILED: orchestrator did not become ready after 60s
|
||||
================================================================
|
||||
|
||||
## currencicombo-orchestrator (last 40 lines):
|
||||
... env validation error: EVENT_SIGNING_SECRET is required ...
|
||||
|
||||
## Units are in whatever state deploy left them. To restore
|
||||
## the previous build (does NOT revert DB migrations):
|
||||
|
||||
sudo /var/lib/currencicombo/repo/scripts/deployment/deploy-currencicombo-8604.sh --rollback
|
||||
# (will restore /var/lib/currencicombo/backups/20260423-140215)
|
||||
|
||||
================================================================
|
||||
```
|
||||
|
||||
Rollback one-liner (when ops has decided to restore):
|
||||
```
|
||||
sudo /var/lib/currencicombo/repo/scripts/deployment/deploy-currencicombo-8604.sh --rollback
|
||||
```
|
||||
|
||||
Restores the most recent backup and restarts both units. Does not touch
|
||||
the DB. If the deploy that failed applied a new migration, a DB rollback
|
||||
is a manual `psql` task — we don't attempt generic `down` migrations
|
||||
because the orchestrator's migration runner only emits `up()` paths.
|
||||
Rollback restores the most recent backup and restarts both units. It **does not** touch the DB. If the failed deploy applied a new migration, DB rollback is a manual `psql` task — the orchestrator's migration runner only emits `up()` paths.
|
||||
|
||||
## Post-cutover smoke checks through NPMplus
|
||||
|
||||
Once the NPMplus `/api/*` rule is live, from a workstation (not the CT):
|
||||
|
||||
```
|
||||
# 1. Front-door TLS is healthy
|
||||
curl -skI https://curucombo.xn--vov0g.com/ | head -3
|
||||
# expect: HTTP/2 200
|
||||
# expect: NO 'x-nextjs-prerender' header (that was the old Next.js build)
|
||||
|
||||
# 2. SPA is the new Vite portal
|
||||
curl -sk https://curucombo.xn--vov0g.com/ | grep -oE '<title>[^<]+</title>'
|
||||
# expect: <title>Solace Bank Group PLC — Treasury Management Portal</title>
|
||||
|
||||
# 3. Orchestrator ready through NPMplus
|
||||
curl -sk https://curucombo.xn--vov0g.com/api/ready | head -1
|
||||
# expect: {"ready":true} (not HTML)
|
||||
|
||||
# 4. Orchestrator blocker log (through CT shell, not NPMplus)
|
||||
ssh root@10.160.0.14 'journalctl -u currencicombo-orchestrator -n 200 | grep -E "ExternalBlockers|EXT-"'
|
||||
# expect: [ExternalBlockers] 6 active, 1 resolved
|
||||
# expect: one line per EXT-* id
|
||||
|
||||
# 5. SSE actually streams (catches silent NPMplus proxy_buffering=on misconfig)
|
||||
curl -sk -N --max-time 5 -H 'Accept: text/event-stream' \
|
||||
https://curucombo.xn--vov0g.com/api/plans/demo-pay-014/events/stream \
|
||||
| head -20 || true
|
||||
# expect: HTTP/2 200 with Content-Type: text/event-stream
|
||||
# expect: at least one 'data: {...}\n\n' frame to arrive WITHIN ~1s
|
||||
# if you see nothing for 3-5s and then everything dumps at once:
|
||||
# NPMplus has proxy_buffering=on. Fix: proxy_buffering off; proxy_http_version 1.1; proxy_set_header Connection "";
|
||||
# if the ping is 401/403: expected — SSE is auth-gated; the point is to
|
||||
# prove the request REACHED the orchestrator (content-type header +
|
||||
# chunked response headers) rather than hitting the Vite SPA.
|
||||
```
|
||||
|
||||
A plain `HTTP/2 200` with a `Content-Type: text/html` body on `/api/ready` means NPMplus is silently falling back to the `/` rule — the `/api/*` rule is missing or ordered wrong. The `webapp-nginx.conf` in this repo returns `HTTP 421` for `/api/*` to make that case obvious when debugging CT-locally, but at the NPMplus edge nginx serves whatever NPMplus routes to it.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
|
||||
Reference in New Issue
Block a user