ci(phoenix): workflow_dispatch reinstall for phoenix-deploy-api on CT 5700

Closes the gap where phoenix-deploy-api/server.js on master is the real
implementation, but the running service on CT 5700 is still the older stub
that returns 'Deploy request queued (stub)' for every target.

The new workflow .gitea/workflows/bootstrap-phoenix-deploy-api.yml is
manual-only (workflow_dispatch). When triggered it:

1. Validates the repo layout: the required files must exist, and
   phoenix-deploy-api/server.js MUST NOT contain the stub string.
2. Tars phoenix-deploy-api/ + config/public-sector-program-manifest.json
   (the manifest only if present) into a deploy bundle.
3. Copies the bundle with scp to the PVE node that hosts CT 5700, using a
   dedicated deploy SSH key (PHOENIX_PVE_SSH_KEY repo secret).
4. Uses pct push / pct exec to move the bundle into the CT and run the
   existing phoenix-deploy-api/scripts/install-systemd.sh, which already
   populates /opt/phoenix-deploy-api/, writes the systemd unit, and
   restarts the service.
5. Health-checks GET http://<dev-vm>:4001/health (with retry).
6. Sends a non-stub probe: POST /api/deploy with target
   __bootstrap_probe__ and the deploy bearer token. The workflow fails if
   the response body still contains 'Deploy request queued (stub)' or any
   auth-rejection signal.

An optional verify_only input skips steps 2-4 and runs only the checks.
The probe gives an unambiguous post-bootstrap health signal in CI logs
without depending on a successful real deploy.

New secrets (documented in docs/04-configuration/DEVIN_GITEA_PROXMOX_CICD.md
section 3a): PHOENIX_PVE_HOST, PHOENIX_PVE_USER (default root),
PHOENIX_PVE_SSH_KEY, PHOENIX_PVE_KNOWN_HOSTS (optional),
PHOENIX_DEV_VM_VMID (default 5700), PHOENIX_DEPLOY_DEV_VM_IP
(default 192.168.11.59).

Triggered manually only: bootstrap is sensitive enough that we do NOT
fire it on every master push.

Once the running service on CT 5700 is post-stub, the existing deploy job
in deploy-to-phoenix.yml will actually execute
scripts/deployment/deploy-atomic-swap-dapp-5801.sh on each push instead
of returning a 202 stub.

Co-Authored-By: Nakamoto, S <defi@defi-oracle.io>
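Step 6 decides the outcome from the text of the response body. A stricter variant could also capture the HTTP status with curl's `-w '%{http_code}'` and classify the (status, body) pair, so that "no response" and "token rejected" do not depend on the server's wording. The sketch below is not part of this commit: `classify_probe` is a hypothetical helper name, and treating HTTP 401 as the token-rejection signal is an assumption about how phoenix-deploy-api reports auth failures.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of this commit: classify the reply to the
# __bootstrap_probe__ POST from both the HTTP status and the response body.
# Assumes the status was captured with curl -w '%{http_code}', which prints
# 000 when no HTTP response arrived (refused connection, DNS error, timeout).
#
# Usage:   classify_probe <http_status> <body>
# Prints:  unreachable | stub | auth | ok
classify_probe() {
  local status="$1" body="$2"

  # No HTTP response at all: service down or wrong URL.
  if [ -z "$status" ] || [ "$status" = "000" ]; then
    echo unreachable
    return 0
  fi

  # The pre-bootstrap stub answered every target with this exact phrase.
  case "$body" in
    *'Deploy request queued (stub)'*)
      echo stub
      return 0
      ;;
  esac

  # Token rejected (assumption: the server signals this with HTTP 401).
  if [ "$status" = "401" ]; then
    echo auth
    return 0
  fi

  # Anything else (e.g. an "unknown target" error) is treated as the real,
  # post-stub implementation answering, mirroring the workflow's check.
  echo ok
}
```

In a workflow step, the status could come from `curl -sS -m 10 -o /tmp/probe.json -w '%{http_code}'` (plus the same `-H`/`-d` arguments as the probe), with `|| true` so a failed connection yields `000`; the step would then fail unless `classify_probe "$STATUS" "$(cat /tmp/probe.json 2>/dev/null)"` prints `ok`.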
2026-04-28 19:05:36 +00:00
2 changed files with 244 additions and 0 deletions
--- a/.gitea/workflows/bootstrap-phoenix-deploy-api.yml
+++ b/.gitea/workflows/bootstrap-phoenix-deploy-api.yml
@@ -0,0 +1,210 @@
+name: Bootstrap Phoenix Deploy API
+
+# Reinstalls phoenix-deploy-api on the dev VM (CT 5700) with the latest server.js
+# from master. This is the missing link between "code on master is the real
+# implementation" and "running service on CT 5700 still has the stub". Run this
+# workflow_dispatch job whenever phoenix-deploy-api/server.js, deploy-targets.json
+# or related scripts change and you need the running service to pick up the change
+# without a manual LAN visit.
+#
+# Required Gitea repo secrets (Settings -> Secrets):
+#   PHOENIX_PVE_HOST       PVE node IP that hosts CT 5700 (e.g. 192.168.11.12)
+#   PHOENIX_PVE_USER       SSH user on the PVE node (default: root)
+#   PHOENIX_PVE_SSH_KEY    Private SSH key (OpenSSH or PEM format) authorised on the PVE node
+#   PHOENIX_PVE_KNOWN_HOSTS  Optional known_hosts entry for the PVE node; pins its host key (else accept-new)
+#   PHOENIX_DEV_VM_VMID    Container VMID (default: 5700)
+#   PHOENIX_DEPLOY_DEV_VM_IP  IP of the dev VM for the post-install health check (default: 192.168.11.59)
+#   PHOENIX_DEPLOY_URL     Phoenix deploy webhook URL (already used by deploy job)
+#   PHOENIX_DEPLOY_TOKEN   Bearer token for the webhook (already used by deploy job)
+#
+# Trigger only via Gitea UI (Actions tab -> "Bootstrap Phoenix Deploy API" -> Run
+# workflow). NOT triggered on push: reinstalling the deploy service is sensitive
+# enough that we want it gated behind a manual click.
+
+on:
+  workflow_dispatch:
+    inputs:
+      verify_only:
+        description: "If true, only run the post-install /health + auth probe and skip the reinstall step."
+        type: boolean
+        required: false
+        default: false
+
+jobs:
+  bootstrap:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout proxmox repo
+        uses: actions/checkout@v4
+
+      - name: Validate repo layout
+        run: |
+          set -euo pipefail
+          test -d phoenix-deploy-api || { echo "phoenix-deploy-api/ missing" >&2; exit 1; }
+          test -f phoenix-deploy-api/server.js
+          test -f phoenix-deploy-api/scripts/install-systemd.sh
+          test -f phoenix-deploy-api/deploy-targets.json
+          # Manifest is optional; warn if missing but do not fail.
+          if [ ! -f config/public-sector-program-manifest.json ]; then
+            echo "::warning::config/public-sector-program-manifest.json missing — install will warn on CT"
+          fi
+          # Refuse to bootstrap if server.js on master is itself still the stub:
+          # reinstalling it would not fix the 'Deploy request queued (stub)' reply.
+          if grep -q "Deploy request queued (stub)" phoenix-deploy-api/server.js; then
+            echo "phoenix-deploy-api/server.js still contains the stub string — refusing to bootstrap." >&2
+            exit 1
+          fi
+
+      - name: Install SSH key for PVE access
+        if: ${{ github.event.inputs.verify_only != 'true' }}
+        run: |
+          set -euo pipefail
+          mkdir -p "$HOME/.ssh"
+          chmod 700 "$HOME/.ssh"
+          umask 077
+          printf '%s\n' "${{ secrets.PHOENIX_PVE_SSH_KEY }}" > "$HOME/.ssh/id_pve"
+          chmod 600 "$HOME/.ssh/id_pve"
+          if [ -n "${{ secrets.PHOENIX_PVE_KNOWN_HOSTS }}" ]; then
+            printf '%s\n' "${{ secrets.PHOENIX_PVE_KNOWN_HOSTS }}" > "$HOME/.ssh/known_hosts"
+            chmod 644 "$HOME/.ssh/known_hosts"
+          else
+            # Fall back to accept-new on first connect; subsequent connects pin.
+            touch "$HOME/.ssh/known_hosts"
+            chmod 644 "$HOME/.ssh/known_hosts"
+          fi
+
+      - name: Build deploy bundle
+        if: ${{ github.event.inputs.verify_only != 'true' }}
+        run: |
+          set -euo pipefail
+          mkdir -p .out
+          if [ -f config/public-sector-program-manifest.json ]; then
+            tar czf .out/pda-deploy-bundle.tar.gz \
+              phoenix-deploy-api \
+              config/public-sector-program-manifest.json
+          else
+            tar czf .out/pda-deploy-bundle.tar.gz phoenix-deploy-api
+          fi
+          ls -lh .out/pda-deploy-bundle.tar.gz
+
+      - name: scp bundle to PVE host
+        if: ${{ github.event.inputs.verify_only != 'true' }}
+        env:
+          PVE_HOST: ${{ secrets.PHOENIX_PVE_HOST }}
+          PVE_USER: ${{ secrets.PHOENIX_PVE_USER }}
+        run: |
+          set -euo pipefail
+          : "${PVE_HOST:?PHOENIX_PVE_HOST not set in repo secrets}"
+          PVE_USER_VAL="${PVE_USER:-root}"
+          KNOWN_HOSTS_OPT="-o UserKnownHostsFile=$HOME/.ssh/known_hosts"
+          if [ ! -s "$HOME/.ssh/known_hosts" ]; then
+            KNOWN_HOSTS_OPT="$KNOWN_HOSTS_OPT -o StrictHostKeyChecking=accept-new"
+          else
+            KNOWN_HOSTS_OPT="$KNOWN_HOSTS_OPT -o StrictHostKeyChecking=yes"
+          fi
+          scp -i "$HOME/.ssh/id_pve" $KNOWN_HOSTS_OPT \
+            -o ConnectTimeout=20 \
+            .out/pda-deploy-bundle.tar.gz \
+            "${PVE_USER_VAL}@${PVE_HOST}:/tmp/pda-deploy-bundle.tar.gz"
+
+      - name: pct push + install-systemd on CT
+        if: ${{ github.event.inputs.verify_only != 'true' }}
+        env:
+          PVE_HOST: ${{ secrets.PHOENIX_PVE_HOST }}
+          PVE_USER: ${{ secrets.PHOENIX_PVE_USER }}
+          VMID: ${{ secrets.PHOENIX_DEV_VM_VMID }}
+        run: |
+          set -euo pipefail
+          : "${PVE_HOST:?PHOENIX_PVE_HOST not set in repo secrets}"
+          PVE_USER_VAL="${PVE_USER:-root}"
+          VMID_VAL="${VMID:-5700}"
+          KNOWN_HOSTS_OPT="-o UserKnownHostsFile=$HOME/.ssh/known_hosts"
+          if [ ! -s "$HOME/.ssh/known_hosts" ]; then
+            KNOWN_HOSTS_OPT="$KNOWN_HOSTS_OPT -o StrictHostKeyChecking=accept-new"
+          else
+            KNOWN_HOSTS_OPT="$KNOWN_HOSTS_OPT -o StrictHostKeyChecking=yes"
+          fi
+          ssh -i "$HOME/.ssh/id_pve" $KNOWN_HOSTS_OPT \
+            -o ConnectTimeout=20 \
+            "${PVE_USER_VAL}@${PVE_HOST}" "VMID=${VMID_VAL} bash -s" <<'REMOTE_EOF'
+          set -euo pipefail
+          : "${VMID:?}"
+          # Verify CT exists and is running.
+          if ! pct status "${VMID}" >/dev/null 2>&1; then
+            echo "CT ${VMID} not found on this PVE node." >&2
+            exit 1
+          fi
+          if ! pct exec "${VMID}" -- true 2>/dev/null; then
+            echo "CT ${VMID} not running. Start it first: pct start ${VMID}" >&2
+            exit 1
+          fi
+          STAGE="/tmp/proxmox-pda-stage"
+          pct push "${VMID}" /tmp/pda-deploy-bundle.tar.gz /root/pda-deploy.tar.gz
+          pct exec "${VMID}" -- bash -c "
+            set -euo pipefail
+            rm -rf '${STAGE}'
+            mkdir -p '${STAGE}'
+            tar xzf /root/pda-deploy.tar.gz -C '${STAGE}'
+            cd '${STAGE}'
+            bash phoenix-deploy-api/scripts/install-systemd.sh
+            rm -f /root/pda-deploy.tar.gz
+          "
+          rm -f /tmp/pda-deploy-bundle.tar.gz
+          REMOTE_EOF
+
+      - name: Health check (no auth)
+        env:
+          DEV_VM_IP: ${{ secrets.PHOENIX_DEPLOY_DEV_VM_IP }}
+        run: |
+          set -euo pipefail
+          IP="${DEV_VM_IP:-192.168.11.59}"
+          # Service may take a moment to come up after install; retry briefly.
+          for i in 1 2 3 4 5 6; do
+            if curl -sSf -m 5 "http://${IP}:4001/health" -o /tmp/health.json; then
+              echo "Health check OK on attempt ${i}"
+              cat /tmp/health.json || true
+              echo
+              break
+            fi
+            echo "Health check attempt ${i}/6 failed"
+            if [ "${i}" = "6" ]; then
+              echo "Phoenix Deploy API /health unreachable after 6 attempts." >&2
+              exit 1
+            fi
+            sleep 3
+          done
+
+      - name: Auth + non-stub probe (POST with bogus target)
+        env:
+          PHOENIX_DEPLOY_URL: ${{ secrets.PHOENIX_DEPLOY_URL }}
+          PHOENIX_DEPLOY_TOKEN: ${{ secrets.PHOENIX_DEPLOY_TOKEN }}
+        run: |
+          set -euo pipefail
+          : "${PHOENIX_DEPLOY_URL:?}"
+          : "${PHOENIX_DEPLOY_TOKEN:?}"
+          # POST a bogus target. The post-bootstrap server should:
+          #   - accept the bearer token (NOT 401)
+          #   - reject the unknown target with a non-stub error
+          # The pre-bootstrap stub returned 202 with "Deploy request queued (stub)"
+          # for ANY target. So we explicitly check the response body does NOT
+          # contain that stub phrase.
+          BODY="$(curl -sS -m 10 -X POST "${PHOENIX_DEPLOY_URL}" \
+            -H "Authorization: Bearer ${PHOENIX_DEPLOY_TOKEN}" \
+            -H "Content-Type: application/json" \
+            -d '{"repo":"d-bis/proxmox","sha":"HEAD","branch":"master","target":"__bootstrap_probe__"}' || true)"
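+          printf 'Response body:\n%s\n' "${BODY}"
+          [ -n "${BODY}" ] || { echo "::error::Empty response from Phoenix Deploy API; is the service reachable?"; exit 1; }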
+          if echo "${BODY}" | grep -q 'Deploy request queued (stub)'; then
+            echo "::error::Phoenix Deploy API still returning stub response — bootstrap did not take effect."
+            exit 1
+          fi
+          if echo "${BODY}" | grep -qi 'unauthorized\|invalid token\|401'; then
+            echo "::error::Phoenix Deploy API rejected the bearer token. PHOENIX_DEPLOY_TOKEN is out of sync with PHOENIX_DEPLOY_SECRET on the CT."
+            exit 1
+          fi
+          echo "Phoenix Deploy API is post-stub and authenticating correctly."
+
+      - name: Cleanup secrets
+        if: always()
+        run: |
+          rm -f "$HOME/.ssh/id_pve" "$HOME/.ssh/known_hosts" || true
--- a/docs/04-configuration/DEVIN_GITEA_PROXMOX_CICD.md
+++ b/docs/04-configuration/DEVIN_GITEA_PROXMOX_CICD.md
@@ -119,6 +119,40 @@ For webhook signing, the bootstrap/helper path also expects:

 Do not enable both repo Actions deploys and webhook deploys for the same repo unless you intentionally want duplicate deploy attempts.

+### 3a. Bootstrap workflow secrets (one-time per CT)
+
+The reinstall workflow `.gitea/workflows/bootstrap-phoenix-deploy-api.yml`
+ships the latest `phoenix-deploy-api/` from `master` to CT 5700 via
+scp + `pct push` and re-runs `install-systemd.sh`. This is the path you
+take when the running service on the CT is older than the code on
+`master` (e.g. it still returns the "Deploy request queued (stub)"
+message). Trigger via the Gitea Actions UI → "Bootstrap Phoenix Deploy
+API" → Run workflow.
+
+Secrets (in addition to the deploy secrets above). `PHOENIX_PVE_HOST` and `PHOENIX_PVE_SSH_KEY` are mandatory; the rest are optional or have defaults:
+
+- `PHOENIX_PVE_HOST` — PVE node IP that hosts CT 5700 (e.g.
+  `192.168.11.12` for `r630-02`).
+- `PHOENIX_PVE_USER` — SSH user on the PVE node (default `root`).
+- `PHOENIX_PVE_SSH_KEY` — Private SSH key (OpenSSH format) authorised
+  on the PVE node. Use a dedicated deploy key, not your personal key.
+- `PHOENIX_PVE_KNOWN_HOSTS` — Pre-populated `known_hosts` line for the
+  PVE host, used to pin its host key. Optional; if absent the workflow
+  falls back to `accept-new` (trust the key on first connect).
+- `PHOENIX_DEV_VM_VMID` — Container VMID (default `5700`).
+- `PHOENIX_DEPLOY_DEV_VM_IP` — IP of the dev VM for the post-install
+  health check (default `192.168.11.59`).
+
+After the reinstall (or on its own when the `verify_only` input is set),
+the workflow performs a non-stub probe: it POSTs
+`{ "target": "__bootstrap_probe__" }` with the deploy bearer token and
+fails if the response body still contains `Deploy request queued (stub)`
+or any auth-rejection signal. That gives you an unambiguous "the running
+service on CT 5700 is now post-stub" signal in CI logs.
+
+The workflow only triggers on `workflow_dispatch` (never on push) so
+deploy-service reinstalls remain a deliberate manual step.
+
 ## Adding more repos or VM targets

 Extend [deploy-targets.json](/home/intlc/projects/proxmox/phoenix-deploy-api/deploy-targets.json) with another entry.