CurrenciCombo/scripts/deployment/README.md
defiQUG 4a1f69a8e5
deploy: make Phoenix redeploys archive-safe
2026-04-22 20:05:35 -07:00

CurrenciCombo — Phoenix / systemd deployment

This directory holds everything needed to deploy CurrenciCombo onto a systemd host — starting with Phoenix CT 8604 on r630-01, but any Debian/Ubuntu (or Alpine) host with Postgres + Redis available works.

The files here are target-agnostic. They hardcode no IPs, hostnames, or VLANs. Environment-specific values — curucombo.曼李.com, the 10.160.0.14 VIP, the NPMplus reverse proxy — are applied at the edge (NPMplus) and at /etc/currencicombo/orchestrator.env, never in the repo.

Architecture on CT 8604

                                   ┌────────────────────┐
   curucombo.曼李.com  ──▶ NPMplus │192.168.11.167      │
   (Cloudflare-proxied)            │ TLS terminates here│
                                   └─────────┬──────────┘
                                             │
                      ┌──────────────────────┴──────────────────────┐
                      │                                             │
                      ▼                                             ▼
       curucombo.曼李.com/* (default)                   curucombo.曼李.com/api/*
                                                (incl. SSE /api/plans/*/events/stream)
                      │                                             │
             CT 8604  │10.160.0.14:3000                    CT 8604  │10.160.0.14:8080
                      ▼                                             ▼
       ┌─────────────────────────────┐               ┌─────────────────────────────┐
       │ currencicombo-webapp.service │               │ currencicombo-orchestrator  │
       │  nginx → /opt/currencicombo/ │               │  .service (systemd)          │
       │          webapp/dist/        │               │  node dist/index.js          │
       └─────────────────────────────┘               │  env /etc/currencicombo/     │
                                                      │       orchestrator.env       │
                                                      └──────────────┬──────────────┘
                                                                     │
                                                                     ▼
                                                 postgresql + redis (same CT, local)

Files

| path | purpose |
| --- | --- |
| systemd/currencicombo-orchestrator.service | Node orchestrator, reads /etc/currencicombo/orchestrator.env |
| systemd/currencicombo-webapp.service | nginx serving the Vite SPA on :3000 |
| webapp-nginx.conf | full nginx.conf for the webapp unit |
| .env.prod.example | env template installed to /etc/currencicombo/orchestrator.env |
| install.sh | one-shot host setup: user / dirs / DB role / systemd units / first-run key handoff file |
| install-prune-cron.sh | opt-in daily cron that prunes /var/lib/currencicombo/backups/ (30-day retention, keep-min 5) |
| deploy-currencicombo-8604.sh | build-and-swap deploy driver (the script Phoenix/proxmox deploy-api calls) |
| README.md | you're reading it |

First-time setup on CT 8604

All commands run as root inside the CT.

  1. Ensure Postgres + Redis are installed and running:
    apt-get install -y postgresql redis-server
    systemctl enable --now postgresql redis-server
    
  2. Clone the repo into its staging location (once):
    install -d -o root -g root /var/lib/currencicombo
    git clone https://gitea.d-bis.org/d-bis/CurrenciCombo.git /var/lib/currencicombo/repo
    
  3. Run install.sh (creates user, DB, systemd units, env file):
    bash /var/lib/currencicombo/repo/scripts/deployment/install.sh
    
    On success you'll see:
    [install] generated EVENT_SIGNING_SECRET (64 hex)
    [install] generated 3 API keys (initiator/settler/auditor)
    [install] initial secrets written to /root/currencicombo-first-keys.txt (0600) — record in password manager, then 'shred -u /root/currencicombo-first-keys.txt'
    [install] install complete.
    
    install.sh writes the three API keys + EVENT_SIGNING_SECRET to two places:
    • /etc/currencicombo/orchestrator.env: canonical, read by systemd (0640, owned by currencicombo).
    • /root/currencicombo-first-keys.txt: root-only handoff file (0600). Grab it once, record the values in your password manager, then shred -u it. The handoff file is not regenerated on re-run: if orchestrator.env already exists, install.sh does not produce new secrets.
  4. (Optional) Install the backup-pruning cron:
    bash /var/lib/currencicombo/repo/scripts/deployment/install-prune-cron.sh
    
    Drops a /etc/cron.daily/currencicombo-prune-backups script that deletes anything under /var/lib/currencicombo/backups/ older than 30 days while always keeping the newest 5 regardless of age. Safe on re-run; opt out with sudo rm /etc/cron.daily/currencicombo-prune-backups.
  5. If you need to resolve any EXT-* blocker (e.g. point at a real dbis_core), edit /etc/currencicombo/orchestrator.env before the first deploy.
  6. First build-and-start:
    bash /var/lib/currencicombo/repo/scripts/deployment/deploy-currencicombo-8604.sh
    
    Expected tail:
    [deploy] orchestrator ready: {"ready":true}
    [deploy] portal OK (HTTP 200)
    [deploy] EXT-* blocker summary from orchestrator boot log:
               [ExternalBlockers] 6 active, 1 resolved
               id: EXT-DBIS-CORE
               id: EXT-CC-PAYMENT-ADAPTERS
               ...
               id: EXT-CHAIN138-CI-RPC  (resolved)
    [deploy] deploy complete. ref=main sha=<short> ts=<timestamp>
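
Before shredding the handoff file, it can be worth confirming that both secrets files ended up with the modes listed in step 3. The following is a minimal sketch, assuming a stat that supports -c (GNU coreutils or BusyBox); check_mode is a hypothetical helper and is not part of install.sh.

```shell
# check_mode PATH EXPECTED_OCTAL_MODE
# Hypothetical helper (not shipped in this repo): compares a file's
# permission bits against the expected octal mode.
check_mode() {
  cm_path=$1
  cm_want=$2
  # stat -c '%a' prints the permission bits in octal, e.g. 600.
  if ! cm_got=$(stat -c '%a' "$cm_path" 2>/dev/null); then
    echo "missing: $cm_path"
    return 2
  fi
  if [ "$cm_got" = "$cm_want" ]; then
    echo "ok: $cm_path is $cm_got"
  else
    echo "WARN: $cm_path is $cm_got, expected $cm_want"
    return 1
  fi
}

# Usage on the CT, as root:
#   check_mode /etc/currencicombo/orchestrator.env 640
#   check_mode /root/currencicombo-first-keys.txt 600
```

A "missing" result for the handoff file is the expected state once it has been recorded and shredded.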
    

NPMplus ingress changes required at cutover

curucombo.曼李.com today proxies 100% to 10.160.0.14:3000. After cutover it must become a single-origin path-routed proxy with two rules (the SSE endpoint lives at /api/plans/:id/events/stream, so it's already under /api/* — no separate /events/* rule is needed):

| location | upstream | proxy settings |
| --- | --- | --- |
| /api/* | http://10.160.0.14:8080 | SSE-friendly settings apply here because the SSE route /api/plans/:id/events/stream is under /api/. Use proxy_pass http://10.160.0.14:8080; with no trailing slash so /api/... reaches the orchestrator unchanged. Set: proxy_http_version 1.1;, proxy_set_header Connection "";, proxy_buffering off;, proxy_cache off;, proxy_read_timeout 24h;, proxy_send_timeout 24h;. Standard forwarding: proxy_set_header Host $host;, X-Real-IP $remote_addr;, X-Forwarded-For $proxy_add_x_forwarded_for;, X-Forwarded-Proto $scheme;. The slight overhead of proxy_buffering off on plain REST calls is negligible for this workload. |
| / | http://10.160.0.14:3000 | Vite SPA. Default upstream. No special settings. |

If you skip the /api/* rule, the nginx in webapp-nginx.conf intentionally returns HTTP 421 for that path — a clean "upstream is misconfigured" signal instead of silently returning index.html and breaking the browser with a JSON parse error.

Subsequent deploys

Every deploy after the first is just:

sudo /var/lib/currencicombo/repo/scripts/deployment/deploy-currencicombo-8604.sh

Flags:

  • --ref=<branch-or-sha> — deploy something other than main.
  • --dry-run — print what would happen, don't touch anything.
  • --skip-migrate — hotfix deploys that don't change the schema.
  • --skip-build — reuse the build from the previous run (debugging only).
  • --rollback — restore the most recent /var/lib/currencicombo/backups/<ts>/ and restart units. Does not git-pull or rebuild.
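
For orientation, the flags above could be parsed along these lines. This is a sketch of one plausible approach, not the actual code in deploy-currencicombo-8604.sh; parse_deploy_flags and the variable names are hypothetical.

```shell
# Hypothetical flag parser mirroring the documented flags.
# Sets REF, DRY_RUN, SKIP_MIGRATE, SKIP_BUILD and ROLLBACK as globals.
parse_deploy_flags() {
  REF=main
  DRY_RUN=0
  SKIP_MIGRATE=0
  SKIP_BUILD=0
  ROLLBACK=0
  for pf_arg in "$@"; do
    case "$pf_arg" in
      --ref=*)        REF=${pf_arg#--ref=} ;;   # strip the "--ref=" prefix
      --dry-run)      DRY_RUN=1 ;;
      --skip-migrate) SKIP_MIGRATE=1 ;;
      --skip-build)   SKIP_BUILD=1 ;;
      --rollback)     ROLLBACK=1 ;;
      *)
        echo "unknown flag: $pf_arg" >&2
        return 2
        ;;
    esac
  done
}

# Example: a hotfix deploy of a specific ref without running migrations.
#   parse_deploy_flags --ref=<branch-or-sha> --skip-migrate
```

Rejecting unknown flags outright, rather than ignoring them, keeps a typo such as --skip-migrations from silently running a full deploy.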

Every deploy writes a timestamped backup to /var/lib/currencicombo/backups/<YYYYmmdd-HHMMSS>/ before swapping. Pruning is opt-in via install-prune-cron.sh (30-day retention, keep-min 5). Without the cron, backups accumulate forever — quietly filling /var/lib is how the next outage starts.
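
The retention policy (delete backups older than 30 days, but never drop below the newest 5) can be sketched as follows. This illustrates the policy only; it is not the script that install-prune-cron.sh installs. The name prune_backups and the use of directory mtime as the age signal are assumptions.

```shell
# prune_backups DIR [RETENTION_DAYS] [KEEP_MIN]
# Hypothetical illustration of "30-day retention, keep-min 5".
prune_backups() {
  pb_dir=$1
  pb_days=${2:-30}
  pb_keep=${3:-5}

  # Count backup directories. Names are timestamps (YYYYmmdd-HHMMSS),
  # so the glob expands oldest-first.
  pb_total=0
  for pb_path in "$pb_dir"/*/; do
    if [ -d "$pb_path" ]; then
      pb_total=$((pb_total + 1))
    fi
  done

  # Only the oldest (total - keep) entries are candidates for deletion.
  pb_candidates=$((pb_total - pb_keep))
  pb_i=0
  for pb_path in "$pb_dir"/*/; do
    [ -d "$pb_path" ] || continue
    pb_i=$((pb_i + 1))
    [ "$pb_i" -le "$pb_candidates" ] || break
    # -mtime +N matches entries whose age in whole days exceeds N.
    if [ -n "$(find "$pb_path" -maxdepth 0 -mtime +"$pb_days")" ]; then
      echo "prune: ${pb_path%/}"
      rm -rf -- "$pb_path"
    fi
  done
  return 0
}

# Usage: prune_backups /var/lib/currencicombo/backups 30 5
```

Applying the keep-min check before the age check is what guarantees that a host with no deploys for months still has backups to roll back to.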

Failure handling on deploy

Rollback is manual. deploy-currencicombo-8604.sh does not auto-restore the previous backup if the orchestrator fails to become ready. First cutovers typically fail because of env typos or migration mistakes, and auto-restoring hides the failure state ops needs.

Instead, on a readiness timeout the deploy script prints:

  • last 40 lines of journalctl -u currencicombo-orchestrator
  • last 20 lines of journalctl -u currencicombo-webapp
  • the exact --rollback command with the specific backup path filled in

Example tail on failure:

================================================================
DEPLOY FAILED: orchestrator did not become ready after 60s
================================================================

## currencicombo-orchestrator (last 40 lines):
... env validation error: EVENT_SIGNING_SECRET is required ...

## Units are in whatever state deploy left them. To restore
## the previous build (does NOT revert DB migrations):

    sudo /var/lib/currencicombo/repo/scripts/deployment/deploy-currencicombo-8604.sh --rollback
    # (will restore /var/lib/currencicombo/backups/20260423-140215)

================================================================
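
The readiness gate behind this output can be pictured as a bounded polling loop. The sketch below is illustrative rather than the script's actual code: wait_ready is a hypothetical helper, and the probe URL in the usage comment assumes the orchestrator answers /api/ready on port 8080 inside the CT.

```shell
# wait_ready TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it succeeds or the timeout
# elapses. Returns 0 on success, 1 on timeout.
wait_ready() {
  wr_timeout=$1
  wr_interval=$2
  shift 2
  wr_deadline=$(( $(date +%s) + wr_timeout ))
  while :; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$(date +%s)" -ge "$wr_deadline" ]; then
      return 1
    fi
    sleep "$wr_interval"
  done
}

# Usage (probe URL is an assumption):
#   if ! wait_ready 60 2 curl -fsS http://127.0.0.1:8080/api/ready; then
#     journalctl -u currencicombo-orchestrator -n 40 --no-pager
#     journalctl -u currencicombo-webapp -n 20 --no-pager
#     exit 1
#   fi
```

On timeout the loop only reports failure; deciding whether to roll back is left to the operator, in line with the manual-rollback policy.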

Rollback one-liner (when ops has decided to restore):

sudo /var/lib/currencicombo/repo/scripts/deployment/deploy-currencicombo-8604.sh --rollback

Rollback restores the most recent backup and restarts both units. It does not touch the DB. If the failed deploy applied a new migration, DB rollback is a manual psql task — the orchestrator's migration runner only emits up() paths.
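
Because backup directories are named by timestamp, picking the most recent one reduces to taking the last entry in sorted order. A small sketch of that selection, with latest_backup as a hypothetical helper (the real script may choose differently):

```shell
# latest_backup DIR
# Hypothetical helper: prints the newest timestamp-named backup directory
# under DIR, or returns 1 if there is none.
latest_backup() {
  lb_latest=
  # The glob expands in sorted order, so the last match is the newest
  # YYYYmmdd-HHMMSS directory.
  for lb_path in "$1"/*/; do
    if [ -d "$lb_path" ]; then
      lb_latest=${lb_path%/}
    fi
  done
  [ -n "$lb_latest" ] || return 1
  printf '%s\n' "$lb_latest"
}

# Usage: latest_backup /var/lib/currencicombo/backups
```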

Post-cutover smoke checks through NPMplus

Once the NPMplus /api/* rule is live, from a workstation (not the CT):

# 1. Front-door TLS is healthy
curl -skI https://curucombo.xn--vov0g.com/ | grep -iE '^(HTTP/|x-nextjs-prerender)'
#   expect: HTTP/2 200
#   expect: NO 'x-nextjs-prerender' line (that header came from the old Next.js build)

# 2. SPA is the new Vite portal
curl -sk https://curucombo.xn--vov0g.com/ | grep -oE '<title>[^<]+</title>'
#   expect: <title>Solace Bank Group PLC — Treasury Management Portal</title>

# 3. Orchestrator ready through NPMplus
curl -sk https://curucombo.xn--vov0g.com/api/ready | head -1
#   expect: {"ready":true}   (not HTML)

# 4. Orchestrator blocker log (through CT shell, not NPMplus)
ssh root@10.160.0.14 'journalctl -u currencicombo-orchestrator -n 200 | grep -E "ExternalBlockers|EXT-"'
#   expect: [ExternalBlockers] 6 active, 1 resolved
#   expect: one line per EXT-* id

# 5. SSE actually streams (catches silent NPMplus proxy_buffering=on misconfig)
curl -sk -i -N --max-time 5 -H 'Accept: text/event-stream' \
  https://curucombo.xn--vov0g.com/api/plans/demo-pay-014/events/stream \
  | head -20 || true
#   (-i prints the response status and headers ahead of the stream)
#   expect: HTTP/2 200 with Content-Type: text/event-stream
#   expect: at least one 'data: {...}\n\n' frame to arrive WITHIN ~1s
#   if you see nothing for 3-5s and then everything dumps at once:
#     NPMplus has proxy_buffering=on. Fix: proxy_buffering off; proxy_http_version 1.1; proxy_set_header Connection "";
#   if the response is 401/403: expected, because SSE is auth-gated; the point is to
#     prove the request REACHED the orchestrator (content-type header +
#     chunked response headers) rather than hitting the Vite SPA.

A plain HTTP/2 200 with a Content-Type: text/html body on /api/ready means NPMplus is silently falling back to the / rule — the /api/* rule is missing or ordered wrong. The webapp-nginx.conf in this repo returns HTTP 421 for /api/* to make that case obvious when debugging CT-locally, but at the NPMplus edge nginx serves whatever NPMplus routes to it.
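
The routing outcomes described in this README can be told apart from the status code and content type alone. The sketch below encodes that reading; classify_api_ready is a hypothetical helper, and the application/json content type for a healthy /api/ready response is an assumption (the source only states that the body is JSON, not HTML).

```shell
# classify_api_ready HTTP_STATUS CONTENT_TYPE
# Hypothetical helper: maps a /api/ready response to a routing diagnosis.
classify_api_ready() {
  case "$1:$2" in
    200:application/json*)
      echo "ok: orchestrator answered" ;;
    200:text/html*)
      echo "misrouted: fell through to the / rule (SPA HTML)" ;;
    421:*)
      echo "misrouted: webapp nginx answered /api/*" ;;
    *)
      echo "unexpected: HTTP $1 ($2)" ;;
  esac
}

# Usage from a workstation. The unquoted $(...) is deliberate: it splits
# curl's output into the status and the content type.
#   classify_api_ready $(curl -sk -o /dev/null \
#     -w '%{http_code} %{content_type}' \
#     https://curucombo.xn--vov0g.com/api/ready)
```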

Troubleshooting

| symptom | cause / check |
| --- | --- |
| /api/* returns 421 "NPMplus is misconfigured" | NPMplus /api/* rule missing or wrong upstream. |
| SSE stream (/api/plans/:id/events/stream) connects then disconnects after ~60s | NPMplus rule lacks proxy_buffering off and a high proxy_read_timeout (the nginx default is 60s). |
| orchestrator unit enters activating (auto-restart) loop | Run journalctl -u currencicombo-orchestrator -n 80; usually a zod env-validation error. The boot-time assertion message names the missing/invalid var. |
| orchestrator boot log says [ExternalBlockers] N active where N > 6 | You added an EXT-* env var without also updating the central registry in orchestrator/src/config/externalBlockers.ts. |
| /health returns 503 but /ready is 200 | Memory critical is a separate signal from readiness. Inspect CT memory; this happens on constrained builders and is not a deploy bug. |
| portal page loads but MetaMask login does nothing | The portal couldn't reach /api/auth/*. Walk back up the NPMplus rule chain. |

Cutting over from the pre-existing Next.js build

Phoenix previously had an older Next.js "ISO-20022 Combo Flow" app in /opt/currencicombo/webapp. The cutover sequence on CT 8604 is:

  1. Back up the old install out-of-band:
    tar czf /root/currencicombo-preRepo-$(date +%s).tgz /opt/currencicombo /etc/currencicombo 2>/dev/null || true
    
  2. Disable the pre-existing systemd units (they have the same names but point at the old tree):
    systemctl stop currencicombo-webapp currencicombo-orchestrator
    systemctl disable currencicombo-webapp currencicombo-orchestrator
    
  3. Run install.sh (writes the new units, new nginx, new env). On an already-set-up host this is idempotent: it preserves /etc/currencicombo/orchestrator.env if it already exists.
  4. Run deploy-currencicombo-8604.sh.
  5. Apply the NPMplus /api + / path rules.
  6. Smoke-test from outside the CT: curl -skI https://curucombo.xn--vov0g.com/ && curl -sk https://curucombo.xn--vov0g.com/api/ready.

Proxmox-side follow-up (not in this PR)

After this PR merges and the above cutover runs cleanly, the /home/intlc/projects/proxmox repo needs a separate commit to:

  • Update phoenix-deploy-api/deploy-targets.json to point at:
    • repo: d-bis/CurrenciCombo
    • branch: main
    • target: default
    • deploy entrypoint: scripts/deployment/deploy-currencicombo-8604.sh
  • Remove any stale /opt/currencicombo/webapp Next.js references.
  • Drop any description of ignoreBuildErrors: true in webapp/next.config.ts — the new webapp is Vite+tsc-strict, no build-error suppression.