furtka/docs/smoke-vm.md
Daniel Maksymilian Syrnicki d499907613
feat(ci): auto-boot every main-ISO in smoke VM on .165 Proxmox
After build-iso, a new smoke-vm job uploads the freshly built ISO to
the test Proxmox at 192.168.178.165 via PVE API token, boots it in a
fresh VM (VMID range 9000-9099, MAC derived from commit SHA so the
runner can find the DHCP IP by scanning the LAN), and curls :5000 to
confirm the webinstaller answers HTTP 200. Last 5 smoke VMs + their
ISOs are kept for post-mortem; older ones are purged. continue-on-error
on the smoke job so a VM-side flake doesn't mark the ISO build red.

Shortens the feedback loop on ISO regressions from "next manual VM
test session" (days) to "next push" (minutes) — the 2026-04-15/16 VM
sessions each found real boot-time bugs that unit tests missed.

Docs at docs/smoke-vm.md. Requires Forgejo secrets PVE_TEST_HOST and
PVE_TEST_TOKEN (dedicated smoke@pve!ci PVE token, privilege-separated).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:41:44 +02:00


Smoke VM on Proxmox Test Host

Every push to main builds a fresh ISO (build-iso.yml) and then boots it in a throwaway VM on the Proxmox test host — currently 192.168.178.165 — to confirm the live ISO boots and the webinstaller responds on :5000. If the smoke step fails, the ISO artifact is still uploaded and the VM is left running for post-mortem.

The heavy lifting lives in scripts/smoke-vm.sh; the workflow just downloads the artifact and shells out.

Where smoke VMs live

  • Node: whatever the test host reports as its node name (auto-detected)
  • VMID range: 9000-9099 (PVE_TEST_VMID_MIN / PVE_TEST_VMID_MAX)
  • Name: furtka-smoke-<12-char-sha>
  • Tags: furtka, smoke, sha-<12-char-sha>
  • MAC: BC:24:11:<first-6-hex-of-sha> (Proxmox's OUI; lets the runner find the VM by scanning the LAN — the live ISO has no guest agent)
  • ISO on test host: local:iso/furtka-<short-sha>.iso
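
The naming scheme above can be sketched in POSIX shell as follows. This is an illustration, not the code in scripts/smoke-vm.sh: the function names are hypothetical, and writing the MAC suffix as three uppercase, colon-separated octets is an assumption about how the six hex digits are laid out.

```shell
# Hypothetical helpers; names are illustrative, not taken from scripts/smoke-vm.sh.

# First 12 characters of the commit SHA, as used in the VM name and the sha- tag.
sha12() {
  printf '%s\n' "$1" | cut -c1-12
}

smoke_vm_name() {
  printf 'furtka-smoke-%s\n' "$(sha12 "$1")"
}

smoke_vm_sha_tag() {
  printf 'sha-%s\n' "$(sha12 "$1")"
}

# Proxmox OUI BC:24:11 followed by the first 6 hex digits of the SHA.
# Assumption: the digits are split into three octets and uppercased
# (MAC addresses are case-insensitive, so the case is cosmetic).
smoke_vm_mac() {
  printf '%s\n' "$1" | cut -c1-6 | tr 'a-f' 'A-F' |
    sed 's/^\(..\)\(..\)\(..\)$/BC:24:11:\1:\2:\3/'
}
```

For example, `smoke_vm_mac "$(git rev-parse HEAD)"` on a commit whose SHA starts with d49990 prints BC:24:11:D4:99:90.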

The five most recent VMs (and their ISOs) are kept; anything older is stopped and purged (destroy-unreferenced-disks=1) on the next run. Tune the count via PVE_TEST_KEEP.
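
A minimal sketch of the keep-N selection, under a loud assumption: this document does not say how the script decides which VMs are "most recent", so the example takes a creation timestamp per VM as input. The function name and the input format (`<epoch-seconds> <vmid>` per line) are hypothetical.

```shell
# Reads "<epoch-seconds> <vmid>" lines on stdin and prints the VMIDs that fall
# outside the newest KEEP entries, i.e. the ones a prune step would remove.
# Assumption: recency is given by the timestamp column; the real script may
# order VMs differently.
prune_candidates() {
  local keep="${1:-5}"
  sort -k1,1nr | awk -v keep="$keep" 'NR > keep { print $2 }'
}
```

With three VMs and a keep count of 2, only the oldest VMID is printed. The sketch does not model the special case where the VM created by the current run is always spared.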

Poking a failed smoke VM

  1. Find it in the Proxmox WebUI — look for furtka-smoke-<sha> in the 9000-range. The VM is still running.
  2. Console: Console tab in the WebUI (SPICE or noVNC). The webinstaller logs to journalctl -u furtka-webinstaller.service on the live ISO.
  3. SSH: the live Arch ISO ships with sshd enabled but no root password set, so there are normally no credentials to log in with from the LAN. Use the WebUI console instead. (The installed system, post-wizard, has the server user with the password the wizard set.)
  4. Fetch the short-sha from the VM name → cross-reference against git log to see exactly which commit built the failing ISO.
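
Step 4 can be sketched as a small helper that strips the documented furtka-smoke- prefix from the VM name. The function name is hypothetical.

```shell
# Strip the "furtka-smoke-" prefix from a VM name, leaving the 12-char SHA.
sha_from_vm_name() {
  printf '%s\n' "${1#furtka-smoke-}"
}

# Example, run inside the repository checkout:
#   git log -1 --oneline "$(sha_from_vm_name 'furtka-smoke-<12-char-sha>')"
```

Git resolves abbreviated SHAs, so the 12-character prefix can be passed to git log directly.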

Running a smoke test locally

Needs LAN access to the test Proxmox and an API token with VM perms.

PVE_TEST_HOST=192.168.178.165 \
PVE_TEST_TOKEN='user@pve!smoke=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' \
./scripts/smoke-vm.sh iso/out/furtka-*.iso

The script exits 0 on success, non-zero if the VM never served http://<ip>:5000. Pruning runs either way.
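
The success check described above amounts to polling the webinstaller until it answers HTTP 200. A rough sketch, assuming curl is available; the function name, the attempt count, and the delay are illustrative and not taken from scripts/smoke-vm.sh.

```shell
# Poll a URL until it returns HTTP 200 or the attempts run out.
# Usage: wait_for_http <url> [attempts] [delay-seconds]
# Assumption: 30 attempts, 10 s apart, is a made-up default.
wait_for_http() {
  local url="$1" attempts="${2:-30}" delay="${3:-10}" i=0 code
  while [ "$i" -lt "$attempts" ]; do
    # -w prints only the status code; curl reports 000 when it cannot connect.
    code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url" 2>/dev/null || true)"
    if [ "$code" = "200" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A call such as `wait_for_http "http://<ip>:5000/"` returns 0 as soon as the installer responds and 1 if it never does.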

Clearing the 9000-range by hand

If smoke tests wedge or you want a clean slate:

# List smoke VMs
curl -sSk -H "Authorization: PVEAPIToken=${PVE_TEST_TOKEN}" \
  https://192.168.178.165:8006/api2/json/nodes/<node>/qemu \
  | python3 -c 'import json,sys; [print(v["vmid"],v["name"]) for v in json.load(sys.stdin)["data"] if 9000<=int(v["vmid"])<=9099]'

# Destroy one
curl -sSk -X POST -H "Authorization: PVEAPIToken=${PVE_TEST_TOKEN}" \
  https://192.168.178.165:8006/api2/json/nodes/<node>/qemu/<vmid>/status/stop
curl -sSk -X DELETE -H "Authorization: PVEAPIToken=${PVE_TEST_TOKEN}" \
  "https://192.168.178.165:8006/api2/json/nodes/<node>/qemu/<vmid>?purge=1&destroy-unreferenced-disks=1"

Or just run scripts/smoke-vm.sh with PVE_TEST_KEEP=0 and any ISO — the prune step will sweep everything in the range except the one it just created.
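
To clear several VMs at once without typing each call, the two requests above can be generated per VMID. The sketch below is a dry run: it only prints the curl commands for review and leaves `${PVE_TEST_TOKEN}` unexpanded so the secret never appears in the output. The function name is hypothetical, and the input format matches what the list command above prints (`<vmid> <name>` per line).

```shell
# Print stop + destroy commands for every VMID in 9000-9099 read from stdin.
# Usage: <list command> | print_destroy_cmds <host> <node>
print_destroy_cmds() {
  local host="$1" node="$2" vmid rest base
  base="https://${host}:8006/api2/json/nodes/${node}/qemu"
  while read -r vmid rest; do
    # Skip blank or non-numeric first columns.
    case "$vmid" in '' | *[!0-9]*) continue ;; esac
    [ "$vmid" -ge 9000 ] && [ "$vmid" -le 9099 ] || continue
    printf '%s\n' "curl -sSk -X POST -H \"Authorization: PVEAPIToken=\${PVE_TEST_TOKEN}\" ${base}/${vmid}/status/stop"
    printf '%s\n' "curl -sSk -X DELETE -H \"Authorization: PVEAPIToken=\${PVE_TEST_TOKEN}\" \"${base}/${vmid}?purge=1&destroy-unreferenced-disks=1\""
  done
}
```

Review the printed commands, then pipe them to `sh` once they look right.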

Proxmox API token setup (one-time)

  1. WebUI → Datacenter → Permissions → API Tokens → Add
  2. User: root@pam (or a dedicated smoke@pve user — see below)
  3. Token ID: smoke
  4. Uncheck Privilege Separation for the quick path, or keep it separated and grant explicit perms below
  5. Copy the displayed secret right away; it is shown only once, in this dialog

Minimum perms on / (if privilege-separated): VM.Allocate, VM.Config.Disk, VM.Config.CPU, VM.Config.Memory, VM.Config.Network, VM.Config.Options, VM.Config.HWType, VM.Config.CDROM, VM.PowerMgmt, VM.Audit, Datastore.AllocateTemplate (for ISO upload/delete on the local content store).
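
For the dedicated-user path, the same setup can be sketched with pveum on the Proxmox host instead of the WebUI. Treat this as a starting point rather than a verified recipe: the role name FurtkaSmoke is hypothetical, the user and token ID follow the steps above, and the commands assume the pveum syntax of current Proxmox VE releases. With privilege separation on, the token gets its own ACL entry because its effective rights are limited to what both the user and the token hold.

```shell
# Sketch only: run as root on the Proxmox test host.
# Defining the function has no effect until it is called.
setup_smoke_token() {
  # Privilege list copied from the "Minimum perms" paragraph above.
  local privs="VM.Allocate VM.Config.Disk VM.Config.CPU VM.Config.Memory VM.Config.Network VM.Config.Options VM.Config.HWType VM.Config.CDROM VM.PowerMgmt VM.Audit Datastore.AllocateTemplate"

  pveum user add smoke@pve
  pveum role add FurtkaSmoke --privs "$privs"      # hypothetical role name
  pveum acl modify / --users smoke@pve --roles FurtkaSmoke

  # Prints the token secret once; store it immediately.
  pveum user token add smoke@pve smoke --privsep 1
  pveum acl modify / --tokens 'smoke@pve!smoke' --roles FurtkaSmoke
}
```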

Set the result as Forgejo secret PVE_TEST_TOKEN in the format:

user@realm!tokenid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

…and PVE_TEST_HOST as 192.168.178.165. That's all the workflow needs.
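
Before storing the secret, it can help to check that the value has the expected shape, since a missing `!tokenid` or `=secret` part is an easy copy-paste mistake. A small sketch; the function name is hypothetical and the check is purely structural (it does not contact Proxmox).

```shell
# Return 0 if the argument looks like user@realm!tokenid=secret, else 1.
is_pve_token() {
  case "$1" in
    *' '*) return 1 ;;          # no spaces allowed
    ?*@?*!?*=?*) return 0 ;;    # user @ realm ! tokenid = secret
    *) return 1 ;;
  esac
}
```

To confirm the token actually authenticates, a read-only call such as `curl -sSkf -H "Authorization: PVEAPIToken=${PVE_TEST_TOKEN}" "https://${PVE_TEST_HOST}:8006/api2/json/version"` should return a JSON document rather than an HTTP 401.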

Assumptions

  • Runner has L2 reachability to 192.168.178.0/24 (MAC→IP discovery uses arp-scan from the runner).
  • Test host uses default storage names: local for ISOs, local-lvm for disks. Override via PVE_TEST_ISO_STORAGE / PVE_TEST_DISK_STORAGE.
  • Bridge vmbr0 carries LAN DHCP. Override via PVE_TEST_BRIDGE.
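
The MAC→IP discovery mentioned in the first assumption can be sketched as a filter over arp-scan output. The function name is hypothetical; the sketch assumes arp-scan's usual tab-separated `IP MAC vendor` result lines and compares MACs case-insensitively.

```shell
# Read arp-scan output on stdin and print the IP of the first host whose
# MAC matches the argument (case-insensitive). Prints nothing if not found.
ip_for_mac() {
  local want
  want="$(printf '%s' "$1" | tr 'A-F' 'a-f')"
  awk -v want="$want" 'tolower($2) == want { print $1; exit }'
}

# Example (arp-scan needs root and L2 access to the subnet):
#   sudo arp-scan 192.168.178.0/24 | ip_for_mac 'BC:24:11:D4:99:90'
```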

If any of those don't match, set the corresponding env var in build-iso.yml (via env: on the smoke step) or override on the CLI when running locally.
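
For reference, the tunables named in this document and their documented defaults can be collected in one place. How scripts/smoke-vm.sh actually resolves them is an assumption; the `:=` idiom below is simply a common way for a shell script to apply a default only when the variable is unset or empty.

```shell
# Documented defaults; an exported value from the environment wins over these.
: "${PVE_TEST_VMID_MIN:=9000}"
: "${PVE_TEST_VMID_MAX:=9099}"
: "${PVE_TEST_KEEP:=5}"
: "${PVE_TEST_ISO_STORAGE:=local}"
: "${PVE_TEST_DISK_STORAGE:=local-lvm}"
: "${PVE_TEST_BRIDGE:=vmbr0}"
```

Overriding on the CLI then looks like the local-run example earlier, with the extra variable prepended to the command, for instance `PVE_TEST_BRIDGE=vmbr1` ahead of `./scripts/smoke-vm.sh`.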