# CurrenciCombo — Phoenix / systemd deployment

This directory holds everything needed to deploy CurrenciCombo onto a
systemd host — starting with Phoenix CT 8604 on `r630-01`, but any
Debian/Ubuntu (or Alpine) host with Postgres + Redis available works.

The files here are **target-agnostic**. They hardcode no IPs, hostnames,
or VLANs. Environment-specific values — `curucombo.曼李.com`, the
`10.160.0.14` VIP, the NPMplus reverse proxy — are applied at the
edge (NPMplus) and at `/etc/currencicombo/orchestrator.env`, never in
the repo.

## Architecture on CT 8604

```
                                ┌────────────────────┐
curucombo.曼李.com ──▶ NPMplus  │ 192.168.11.167     │
(Cloudflare-proxied)            │ TLS terminates here│
                                └─────────┬──────────┘
                                          │
                   ┌──────────────────────┴──────────────────────┐
                   │                                             │
                   ▼                                             ▼
    curucombo.曼李.com/* (default)                   curucombo.曼李.com/api/*
                                              (incl. SSE /api/plans/*/events/stream)
                   │                                             │
           CT 8604 │ 10.160.0.14:3000                    CT 8604 │ 10.160.0.14:8080
                   ▼                                             ▼
   ┌──────────────────────────────┐               ┌─────────────────────────────┐
   │ currencicombo-webapp.service │               │ currencicombo-orchestrator  │
   │ nginx → /opt/currencicombo/  │               │ .service (systemd)          │
   │ webapp/dist/                 │               │ node dist/index.js          │
   └──────────────────────────────┘               │ env /etc/currencicombo/     │
                                                  │ orchestrator.env            │
                                                  └──────────────┬──────────────┘
                                                                 │
                                                                 ▼
                                                postgresql + redis (same CT, local)
```

## Files

| path | purpose |
|---|---|
| `systemd/currencicombo-orchestrator.service` | Node orchestrator, reads `/etc/currencicombo/orchestrator.env` |
| `systemd/currencicombo-webapp.service` | nginx serving the Vite SPA on `:3000` |
| `webapp-nginx.conf` | full nginx.conf for the webapp unit |
| `.env.prod.example` | env template installed to `/etc/currencicombo/orchestrator.env` |
| `install.sh` | one-shot host setup: user / dirs / DB role / systemd units / first-run key handoff file |
| `install-prune-cron.sh` | opt-in daily cron that prunes `/var/lib/currencicombo/backups/` (30-day retention, keep-min 5) |
| `deploy-currencicombo-8604.sh` | build-and-swap deploy driver (the script Phoenix/proxmox deploy-api calls) |
| `README.md` | you're reading it |

## First-time setup on CT 8604

All commands run as **root** inside the CT.

1. Ensure Postgres + Redis are installed and running:
   ```
   apt-get install -y postgresql redis-server
   systemctl enable --now postgresql redis-server
   ```
2. Clone the repo into its staging location (once):
   ```
   install -d -o root -g root /var/lib/currencicombo
   git clone https://gitea.d-bis.org/d-bis/CurrenciCombo.git /var/lib/currencicombo/repo
   ```
3. Run `install.sh` (creates user, DB, systemd units, env file):
   ```
   bash /var/lib/currencicombo/repo/scripts/deployment/install.sh
   ```
   On success you'll see:
   ```
   [install] generated EVENT_SIGNING_SECRET (64 hex)
   [install] generated 3 API keys (initiator/settler/auditor)
   [install] initial secrets written to /root/currencicombo-first-keys.txt (0600) — record in password manager, then 'shred -u /root/currencicombo-first-keys.txt'
   [install] install complete.
   ```
   `install.sh` writes the three API keys + `EVENT_SIGNING_SECRET` to **two** places:
   - `/etc/currencicombo/orchestrator.env` — canonical, read by systemd (`0640`, owned by `currencicombo`).
   - `/root/currencicombo-first-keys.txt` — **root-only handoff file** (`0600`). Grab it once, record the values in your password manager, then `shred -u` it.

   The handoff file is **not** regenerated on re-run — if `orchestrator.env` already exists, `install.sh` does not produce new secrets.
4. (Optional) Install the backup-pruning cron:
   ```
   bash /var/lib/currencicombo/repo/scripts/deployment/install-prune-cron.sh
   ```
   Drops a `/etc/cron.daily/currencicombo-prune-backups` that deletes anything under `/var/lib/currencicombo/backups/` older than 30 days while **always keeping the newest 5** regardless of age. Safe on re-run; opt out with `sudo rm /etc/cron.daily/currencicombo-prune-backups`.
5. If you need to resolve any `EXT-*` blocker (e.g. point at a real dbis_core), edit `/etc/currencicombo/orchestrator.env` before the first deploy.
6. First build-and-start:
   ```
   bash /var/lib/currencicombo/repo/scripts/deployment/deploy-currencicombo-8604.sh
   ```
   Expected tail:
   ```
   [deploy] orchestrator ready: {"ready":true}
   [deploy] portal OK (HTTP 200)
   [deploy] EXT-* blocker summary from orchestrator boot log:
   [ExternalBlockers] 6 active, 1 resolved
   id: EXT-DBIS-CORE
   id: EXT-CC-PAYMENT-ADAPTERS
   ...
   id: EXT-CHAIN138-CI-RPC (resolved)
   [deploy] deploy complete. ref=main sha=<short> ts=<timestamp>
   ```
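Step 3 reports a 64-hex `EVENT_SIGNING_SECRET` and a `0600` handoff file. The sketch below shows one way values of that shape could be produced in plain shell. It is not taken from `install.sh`, which may work differently, and the function names `gen_hex_secret` and `write_handoff_file` are hypothetical.

```bash
# Hypothetical helpers; install.sh may generate and store secrets differently.

# Print N random bytes (default 32) as 2N lowercase hex characters.
gen_hex_secret() (
  nbytes="${1:-32}"
  head -c "$nbytes" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
  echo
)

# Write KEY=value lines to a file that only its owner can read.
write_handoff_file() (
  path="$1"; shift
  umask 077                 # a newly created file gets mode 0600
  : > "$path"
  for kv in "$@"; do
    printf '%s\n' "$kv" >> "$path"
  done
  chmod 600 "$path"         # also tighten a file that already existed
)
```

Both functions run in a subshell, so the `umask` change and the working variables do not leak into the caller. A call might look like `write_handoff_file /tmp/example-keys.txt "EVENT_SIGNING_SECRET=$(gen_hex_secret 32)"`, where the path is only an example.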

## NPMplus ingress changes required at cutover

`curucombo.曼李.com` today proxies 100% to `10.160.0.14:3000`. After
cutover it must become a **single-origin path-routed proxy** with **two**
rules (the SSE endpoint lives at `/api/plans/:id/events/stream`, so it's
already under `/api/*` — no separate `/events/*` rule is needed):

| location | upstream | proxy settings |
|---|---|---|
| `/api/*` | `http://10.160.0.14:8080` | **SSE-friendly settings apply here because the SSE route `/api/plans/:id/events/stream` is under `/api/`.** Use `proxy_pass http://10.160.0.14:8080;` with **no trailing slash** so `/api/...` reaches the orchestrator unchanged. Set: `proxy_http_version 1.1;`, `proxy_set_header Connection "";`, `proxy_buffering off;`, `proxy_cache off;`, `proxy_read_timeout 24h;`, `proxy_send_timeout 24h;`. Standard forwarding: `proxy_set_header Host $host;`, `proxy_set_header X-Real-IP $remote_addr;`, `proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;`, `proxy_set_header X-Forwarded-Proto $scheme;`. The slight overhead of `proxy_buffering off` on plain REST calls is negligible for this workload. |
| `/` | `http://10.160.0.14:3000` | Vite SPA. Default upstream. No special settings. |

If you skip the `/api/*` rule, the nginx in `webapp-nginx.conf`
intentionally returns `HTTP 421` for that path — a clean "upstream is
misconfigured" signal instead of silently returning `index.html` and
breaking the browser with a JSON parse error.
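The `/api/*` settings in the table are easy to get partly right. As a rough aid, the sketch below checks a saved copy of the location config for the six SSE-related directives and for a trailing slash on `proxy_pass`. The function name `check_sse_proxy_config` is hypothetical, the match is a plain fixed-string search that assumes single spaces inside each directive, and where NPMplus stores the generated config is not covered here.

```bash
# check_sse_proxy_config FILE
# Prints one line per problem found; exit status is 0 only if none were found.
check_sse_proxy_config() (
  conf="$1"
  problems=0
  while IFS= read -r directive; do
    if ! grep -Fq -- "$directive" "$conf"; then
      echo "missing: $directive"
      problems=1
    fi
  done <<'EOF'
proxy_http_version 1.1;
proxy_set_header Connection "";
proxy_buffering off;
proxy_cache off;
proxy_read_timeout 24h;
proxy_send_timeout 24h;
EOF
  # With a URI part ("/") on proxy_pass, nginx replaces the matched
  # location prefix, so /api/... would no longer arrive unchanged.
  if grep -Eq 'proxy_pass[[:space:]]+http://[^;]*/;' "$conf"; then
    echo "proxy_pass has a trailing slash"
    problems=1
  fi
  exit "$problems"
)
```

This only inspects text; it does not replace `nginx -t` or the SSE smoke check, which exercise the live proxy.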

## Subsequent deploys

Every deploy after the first is just:

```
sudo /var/lib/currencicombo/repo/scripts/deployment/deploy-currencicombo-8604.sh
```

Flags:
- `--ref=<branch-or-sha>` — deploy something other than `main`.
- `--dry-run` — print what would happen, don't touch anything.
- `--skip-migrate` — hotfix deploys that don't change the schema.
- `--skip-build` — reuse the build from the previous run (debugging only).
- `--rollback` — restore the most recent `/var/lib/currencicombo/backups/<ts>/` and restart units. Does **not** git-pull or rebuild.

Every deploy writes a timestamped backup to
`/var/lib/currencicombo/backups/<YYYYmmdd-HHMMSS>/` before swapping. Pruning is opt-in via `install-prune-cron.sh` (30-day retention, keep-min 5). Without the cron, backups accumulate forever — quietly filling `/var/lib` is how the next outage starts.
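The retention rule (older than 30 days, but never fewer than the newest 5) can be sketched as below. This is an illustration of the policy, not the contents of the installed cron script; the function name `prune_backups` is hypothetical, and the sketch assumes every entry in the directory is a backup named `YYYYmmdd-HHMMSS`, so that reverse lexical order is newest-first.

```bash
# prune_backups DIR [DAYS] [KEEP]
# Deletes entries older than DAYS days, skipping the newest KEEP entries.
prune_backups() (
  dir="$1"; days="${2:-30}"; keep="${3:-5}"
  cd "$dir" || exit 1
  # Newest first by name; everything after the first $keep is a candidate.
  ls -1 | sort -r | tail -n +"$((keep + 1))" | while IFS= read -r name; do
    # Age is judged by the entry's own mtime, not by its name.
    if [ -n "$(find "./$name" -maxdepth 0 -mtime +"$days")" ]; then
      rm -rf -- "./$name"
    fi
  done
)
```

Because the newest `KEEP` entries are excluded before any age test runs, a host that has not deployed for months still retains its five most recent backups.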

## Failure handling on deploy

**Rollback is manual.** `deploy-currencicombo-8604.sh` **does not** auto-restore the previous backup if the orchestrator fails to become ready. First cutovers typically fail because of env typos or migration mistakes, and auto-restoring hides the failure state ops needs.

Instead, on a readiness timeout the deploy script prints:
- last 40 lines of `journalctl -u currencicombo-orchestrator`
- last 20 lines of `journalctl -u currencicombo-webapp`
- **the exact `--rollback` command with the specific backup path filled in**

Example tail on failure:
```
================================================================
DEPLOY FAILED: orchestrator did not become ready after 60s
================================================================

## currencicombo-orchestrator (last 40 lines):
... env validation error: EVENT_SIGNING_SECRET is required ...

## Units are in whatever state deploy left them. To restore
## the previous build (does NOT revert DB migrations):

sudo /var/lib/currencicombo/repo/scripts/deployment/deploy-currencicombo-8604.sh --rollback
# (will restore /var/lib/currencicombo/backups/20260423-140215)

================================================================
```
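The "wait for readiness, then report instead of restoring" behaviour described above might be structured roughly as follows. This is a sketch under assumptions, not the deploy script's code: `wait_ready` and `report_failure` are hypothetical names, the probe is whatever command the caller passes in, and the real script also prints journal tails, which are left out here.

```bash
# wait_ready TIMEOUT_SECONDS PROBE_COMMAND [ARGS...]
# Runs the probe about once per second until it succeeds or time runs out.
wait_ready() (
  timeout_s="$1"; shift
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    "$@" >/dev/null 2>&1 && exit 0
    sleep 1
    elapsed=$((elapsed + 1))
  done
  exit 1
)

# report_failure TIMEOUT_SECONDS BACKUP_DIR
# Prints the failure banner and the rollback hint; restores nothing.
report_failure() (
  timeout_s="$1"; backup_dir="$2"
  echo "DEPLOY FAILED: orchestrator did not become ready after ${timeout_s}s"
  echo "To restore the previous build (does NOT revert DB migrations):"
  echo "  deploy-currencicombo-8604.sh --rollback"
  echo "  # (will restore ${backup_dir})"
)

# Example wiring (the probe URL is an assumption, not taken from the script):
#   wait_ready 60 curl -fsS http://127.0.0.1:8080/ready \
#     || report_failure 60 /var/lib/currencicombo/backups/20260423-140215
```

Keeping the report separate from any restore step matches the stated intent: the units stay in their failed state so ops can inspect them before deciding to roll back.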

Rollback one-liner (when ops has decided to restore):
```
sudo /var/lib/currencicombo/repo/scripts/deployment/deploy-currencicombo-8604.sh --rollback
```

Rollback restores the most recent backup and restarts both units. It **does not** touch the DB. If the failed deploy applied a new migration, DB rollback is a manual `psql` task — the orchestrator's migration runner only emits `up()` paths.
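To preview which directory "the most recent backup" refers to before running `--rollback`, something like the sketch below can help. It assumes backups are named `YYYYmmdd-HHMMSS`, so the lexically greatest name is the newest; `latest_backup` is a hypothetical helper, and the deploy script may select the backup in another way.

```bash
# latest_backup DIR
# Prints the path of the newest YYYYmmdd-HHMMSS entry, or fails if none exist.
latest_backup() (
  dir="$1"
  name="$(ls -1 "$dir" 2>/dev/null \
    | grep -E '^[0-9]{8}-[0-9]{6}$' \
    | sort \
    | tail -n 1)"
  [ -n "$name" ] || exit 1
  printf '%s/%s\n' "$dir" "$name"
)
```

For example, `latest_backup /var/lib/currencicombo/backups` prints one path, and entries that do not match the timestamp pattern are ignored.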

## Post-cutover smoke checks through NPMplus

Once the NPMplus `/api/*` rule is live, from a workstation (not the CT):

```
# 1. Front-door TLS is healthy
curl -skI https://curucombo.xn--vov0g.com/ | head -3
# expect: HTTP/2 200
# expect: NO 'x-nextjs-prerender' header (that was the old Next.js build)

# 2. SPA is the new Vite portal
curl -sk https://curucombo.xn--vov0g.com/ | grep -oE '<title>[^<]+</title>'
# expect: <title>Solace Bank Group PLC — Treasury Management Portal</title>

# 3. Orchestrator ready through NPMplus
curl -sk https://curucombo.xn--vov0g.com/api/ready | head -1
# expect: {"ready":true} (not HTML)

# 4. Orchestrator blocker log (through CT shell, not NPMplus)
ssh root@10.160.0.14 'journalctl -u currencicombo-orchestrator -n 200 | grep -E "ExternalBlockers|EXT-"'
# expect: [ExternalBlockers] 6 active, 1 resolved
# expect: one line per EXT-* id

# 5. SSE actually streams (catches silent NPMplus proxy_buffering=on misconfig)
#    -i prints the response headers so the status and Content-Type are visible
curl -sk -N -i --max-time 5 -H 'Accept: text/event-stream' \
  https://curucombo.xn--vov0g.com/api/plans/demo-pay-014/events/stream \
  | head -20 || true
# expect: HTTP/2 200 with Content-Type: text/event-stream
# expect: at least one 'data: {...}\n\n' frame to arrive WITHIN ~1s
# if you see nothing for 3-5s and then everything dumps at once:
# NPMplus has proxy_buffering=on. Fix: proxy_buffering off; proxy_http_version 1.1; proxy_set_header Connection "";
# if the response is 401/403: expected — SSE is auth-gated; the point is to
# prove the request REACHED the orchestrator (content-type header +
# chunked response headers) rather than hitting the Vite SPA.
```

A plain `HTTP/2 200` with a `Content-Type: text/html` body on `/api/ready` means NPMplus is silently falling back to the `/` rule — the `/api/*` rule is missing or ordered wrong. The `webapp-nginx.conf` in this repo returns `HTTP 421` for `/api/*` to make that case obvious when debugging CT-locally, but at the NPMplus edge nginx serves whatever NPMplus routes to it.
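The outcomes described above (HTML on an `/api/*` path, the `421` from the webapp nginx, an auth-gated `401`/`403`) can be told apart from the status code and `Content-Type` alone. The sketch below encodes that reading as a small classifier. The function name and the labels it prints are hypothetical, and treating `application/json` as the orchestrator's normal content type is an assumption.

```bash
# classify_api_response STATUS CONTENT_TYPE
# Prints a short label for where an /api/* request appears to have landed.
classify_api_response() {
  case "$1:$2" in
    421:*)                  echo "webapp-nginx-421" ;;   # reached the webapp nginx, not the orchestrator
    200:text/html*)         echo "spa-fallback" ;;       # served HTML: /api/* rule missing or misordered
    200:application/json*)  echo "orchestrator" ;;
    200:text/event-stream*) echo "orchestrator" ;;
    401:*|403:*)            echo "orchestrator-auth" ;;  # auth-gated route, per the SSE note above
    *)                      echo "unknown" ;;
  esac
}

# Example wiring with curl's --write-out variables:
#   set -- $(curl -sk -o /dev/null -w '%{http_code} %{content_type}' \
#            https://curucombo.xn--vov0g.com/api/ready)
#   classify_api_response "$1" "$2"
```

Anything labelled `spa-fallback` or `webapp-nginx-421` points back at the NPMplus rule chain rather than at the orchestrator itself.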

## Troubleshooting

| symptom | cause / check |
|---|---|
| `/api/*` returns `421 NPMplus is misconfigured` | NPMplus `/api/*` rule missing or wrong upstream. |
| SSE stream (`/api/plans/:id/events/stream`) connects then disconnects after ~60s | NPMplus `/api/*` rule is missing `proxy_buffering off` and a high `proxy_read_timeout`. |
| orchestrator unit enters `activating (auto-restart)` loop | `journalctl -u currencicombo-orchestrator -n 80` — usually a zod env-validation error. The boot-time assertion message names the missing/invalid var. |
| orchestrator boot log says `[ExternalBlockers] N active` where N > 6 | you added an `EXT-*` env var without also updating the central registry in `orchestrator/src/config/externalBlockers.ts`. |
| `/health` returns 503 but `/ready` is 200 | memory `critical` is a separate signal from readiness. Inspect CT memory; this happens on constrained builders and is not a deploy bug. |
| portal page loads but MetaMask login does nothing | the portal couldn't reach `/api/auth/*`. Walk back up the NPMplus rule chain. |
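The "N > 6" row can be checked mechanically against the boot log. The sketch below reads log lines on stdin, extracts the last `[ExternalBlockers] N active` count, and compares it with an expected maximum. The function names are hypothetical, and the line format is taken from the expected output shown earlier in this README.

```bash
# count_active_blockers: print the active count from the last matching line on stdin.
count_active_blockers() {
  sed -n 's/.*\[ExternalBlockers\] \([0-9][0-9]*\) active.*/\1/p' | tail -n 1
}

# check_blockers [EXPECTED_MAX]
# Exit 0 if the count is within bounds, 1 if it is too high, 2 if no line was found.
check_blockers() (
  expected_max="${1:-6}"
  active="$(count_active_blockers)"
  if [ -z "$active" ]; then
    echo "no ExternalBlockers line found"
    exit 2
  fi
  if [ "$active" -gt "$expected_max" ]; then
    echo "WARN: $active active blockers (expected at most $expected_max)"
    exit 1
  fi
  echo "OK: $active active blockers"
)

# Example: journalctl -u currencicombo-orchestrator -n 200 | check_blockers 6
```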

## Cutting over from the pre-existing Next.js build

Phoenix previously had an older Next.js "ISO-20022 Combo Flow" app in
`/opt/currencicombo/webapp`. The cutover sequence on CT 8604 is:

1. **Back up the old install** out-of-band:
   ```
   tar czf /root/currencicombo-preRepo-$(date +%s).tgz /opt/currencicombo /etc/currencicombo 2>/dev/null || true
   ```
2. **Disable the pre-existing systemd units** (they have the same names but point at the old tree):
   ```
   systemctl stop currencicombo-webapp currencicombo-orchestrator
   systemctl disable currencicombo-webapp currencicombo-orchestrator
   ```
3. Run `install.sh` (writes the new units, new nginx, new env). On an already-set-up host this is idempotent: it preserves `/etc/currencicombo/orchestrator.env` if it already exists.
4. Run `deploy-currencicombo-8604.sh`.
5. Apply the NPMplus `/api` + `/` path rules.
6. Smoke from outside the CT: `curl -skI https://curucombo.xn--vov0g.com/ && curl -sk https://curucombo.xn--vov0g.com/api/ready`.

## Proxmox-side follow-up (not in this PR)

After this PR merges and the above cutover runs cleanly, the
`/home/intlc/projects/proxmox` repo needs a separate commit to:

- Update `phoenix-deploy-api/deploy-targets.json` to point at:
  - repo: `d-bis/CurrenciCombo`
  - branch: `main`
  - target: `default`
  - deploy entrypoint: `scripts/deployment/deploy-currencicombo-8604.sh`
- Remove any stale `/opt/currencicombo/webapp` Next.js references.
- Drop any description of `ignoreBuildErrors: true` in `webapp/next.config.ts` — the new webapp is Vite+tsc-strict, no build-error suppression.