furtka/docs/smoke-vm.md
Daniel Maksymilian Syrnicki d499907613
feat(ci): auto-boot every main-ISO in smoke VM on .165 Proxmox
After build-iso, a new smoke-vm job uploads the freshly built ISO to
the test Proxmox at 192.168.178.165 via PVE API token, boots it in a
fresh VM (VMID range 9000-9099, MAC derived from commit SHA so the
runner can find the DHCP IP by scanning the LAN), and curls :5000 to
confirm the webinstaller answers HTTP 200. Last 5 smoke VMs + their
ISOs are kept for post-mortem; older ones are purged. continue-on-error
on the smoke job so a VM-side flake doesn't mark the ISO build red.

Shortens the feedback loop on ISO regressions from "next manual VM
test session" (days) to "next push" (minutes) — the 2026-04-15/16 VM
sessions each found real boot-time bugs that unit tests missed.

Docs at docs/smoke-vm.md. Requires Forgejo secrets PVE_TEST_HOST and
PVE_TEST_TOKEN (dedicated smoke@pve!ci PVE token, privilege-separated).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:41:44 +02:00

# Smoke VM on Proxmox Test Host

Every push to `main` builds a fresh ISO (`build-iso.yml`) and then boots
it in a throwaway VM on the Proxmox test host (currently
`192.168.178.165`) to confirm that the live ISO boots and the webinstaller
answers HTTP 200 on `:5000`. If the smoke step fails, the ISO artifact is
still uploaded and the VM is left running for post-mortem.

The heavy lifting lives in [`scripts/smoke-vm.sh`](../scripts/smoke-vm.sh);
the workflow just downloads the artifact and shells out.

## Where smoke VMs live
- Node: whatever the test host reports as its node name (auto-detected)
- VMID range: `9000-9099` (`PVE_TEST_VMID_MIN` / `PVE_TEST_VMID_MAX`)
- Name: `furtka-smoke-<12-char-sha>`
- Tags: `furtka`, `smoke`, `sha-<12-char-sha>`
- MAC: `BC:24:11:<first-6-hex-of-sha>` (Proxmox's OUI followed by the
first six hex digits of the SHA as three octets; this lets the runner
find the VM by scanning the LAN, since the live ISO has no guest agent)
- ISO on test host: `local:iso/furtka-<short-sha>.iso`
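
As a rough illustration of the naming scheme above, the VM name and MAC can be derived from the commit SHA with plain parameter expansion. The helper names `smoke_name` and `smoke_mac` are hypothetical, and the uppercase MAC is an assumption; the real logic lives in `scripts/smoke-vm.sh` and may differ.

```bash
#!/usr/bin/env bash
# Sketch only: smoke_name and smoke_mac are hypothetical helpers,
# not copied from scripts/smoke-vm.sh.

# VM name: fixed prefix plus the first 12 characters of the commit SHA.
smoke_name() {
  printf 'furtka-smoke-%s\n' "${1:0:12}"
}

# MAC: the BC:24:11 OUI plus the first 6 hex digits of the SHA as three octets.
# Uppercasing the digits is an assumption made for readability.
smoke_mac() {
  local hex
  hex=$(printf '%s' "${1:0:6}" | tr 'a-f' 'A-F')
  printf 'BC:24:11:%s:%s:%s\n' "${hex:0:2}" "${hex:2:2}" "${hex:4:2}"
}

sha=0123456789abcdef0123456789abcdef01234567   # example value, not a real commit
smoke_name "$sha"   # furtka-smoke-0123456789ab
smoke_mac "$sha"    # BC:24:11:01:23:45
```

Because the MAC is a pure function of the SHA, anyone with the commit hash can work out which address to look for on the LAN without asking Proxmox.
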

The five most recent VMs (and their ISOs) are kept; anything older is
stopped and purged (`destroy-unreferenced-disks=1`) on the next run. Tune
via `PVE_TEST_KEEP`.

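
The retention rule can be sketched as a small filter over VMIDs. This is a minimal sketch under a loud assumption: it treats a higher VMID as "more recent", which may not be how `scripts/smoke-vm.sh` orders VMs. The function name `prune_candidates` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only: prune_candidates is a hypothetical helper.
# ASSUMPTION: a higher VMID means a newer VM. The real script may order
# VMs differently (for example by creation time).

# Reads one VMID per line on stdin and prints the VMIDs that would be purged.
# Usage: prune_candidates [keep] [min] [max]
prune_candidates() {
  local keep=${1:-5} min=${2:-9000} max=${3:-9099}
  awk -v min="$min" -v max="$max" '$1 >= min && $1 <= max { print $1 }' \
    | sort -n \
    | awk -v keep="$keep" '
        { ids[NR] = $1 }
        END { for (i = 1; i <= NR - keep; i++) print ids[i] }'
}

# Six smoke VMs plus one unrelated VM (100): only the oldest smoke VM is listed.
printf '%s\n' 9003 9001 100 9002 9005 9004 9006 | prune_candidates 5   # 9001
```

VMIDs outside the configured range are never listed, so unrelated VMs on the test host stay untouched under this scheme.
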
## Poking a failed smoke VM
1. Find it in the Proxmox WebUI — look for `furtka-smoke-<sha>` in the
9000-range. The VM is still running.
2. Console: **Console** tab in the WebUI (SPICE or noVNC). The webinstaller
logs to `journalctl -u furtka-webinstaller.service` on the live ISO.
3. SSH: the live Arch ISO ships with `sshd` enabled but no root password
set, so there are no credentials to log in with over the LAN. Use the
WebUI console instead. (The **installed** system, post-wizard,
has the `server` user with the password the wizard set.)
4. Take the short SHA from the VM name and cross-reference it against
`git log` to see exactly which commit built the failing ISO.
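
For the last step, a minimal sketch of going from a VM name to the commit that built it. The helper names are hypothetical; `git log -1 --format=...` is standard Git and must be run inside a checkout of the repository.

```bash
#!/usr/bin/env bash
# Sketch only: sha_from_vm_name and show_commit_for_vm are hypothetical helpers.

# Strip the fixed "furtka-smoke-" prefix to recover the 12-character SHA.
sha_from_vm_name() {
  printf '%s\n' "${1#furtka-smoke-}"
}

# Print the full hash and subject line of the commit behind a smoke VM.
# Needs to run inside the repository; not invoked here.
show_commit_for_vm() {
  git log -1 --format='%H %s' "$(sha_from_vm_name "$1")"
}

sha_from_vm_name furtka-smoke-0123456789ab   # 0123456789ab
```
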
## Running a smoke test locally
Needs LAN access to the test Proxmox and an API token with VM perms.
```bash
PVE_TEST_HOST=192.168.178.165 \
PVE_TEST_TOKEN='user@pve!smoke=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' \
./scripts/smoke-vm.sh iso/out/furtka-*.iso
```
The script exits 0 on success, non-zero if the VM never served
`http://<ip>:5000`. Pruning runs either way.
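
The pass/fail rule can be pictured as a poll-with-deadline loop. The following is a minimal sketch, not the script's actual code: `wait_until` and `probe_http_200` are hypothetical names, and the timeout and interval values in the usage comment are assumptions.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers illustrating "exit 0 if the VM serves
# HTTP 200 before a deadline, non-zero otherwise".

# Usage: wait_until <timeout-seconds> <interval-seconds> <command> [args...]
# Re-runs the command until it succeeds or the deadline passes.
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1
    fi
    sleep "$interval"
  done
}

# Succeeds only when the URL answers with status 200.
probe_http_200() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1")" = "200" ]
}

# Example (timeout and interval are assumed values, ip is found separately):
#   wait_until 300 5 probe_http_200 "http://${ip}:5000/" || exit 1
```

Separating the retry loop from the probe keeps the loop easy to check on its own, for instance with `true` and `false` as stand-in probes.
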
## Clearing the 9000-range by hand
If smoke tests wedge or you want a clean slate:
```bash
# Fill these in first: NODE is the Proxmox node name, VMID the VM to destroy.
NODE='<node>'
VMID='<vmid>'
API="https://192.168.178.165:8006/api2/json"

# List smoke VMs
curl -sSk -H "Authorization: PVEAPIToken=${PVE_TEST_TOKEN}" \
  "${API}/nodes/${NODE}/qemu" \
  | python3 -c 'import json,sys; [print(v["vmid"],v.get("name","")) for v in json.load(sys.stdin)["data"] if 9000<=int(v["vmid"])<=9099]'

# Destroy one. The stop call is asynchronous, so wait until the VM is
# actually stopped before sending the DELETE.
curl -sSk -X POST -H "Authorization: PVEAPIToken=${PVE_TEST_TOKEN}" \
  "${API}/nodes/${NODE}/qemu/${VMID}/status/stop"
curl -sSk -X DELETE -H "Authorization: PVEAPIToken=${PVE_TEST_TOKEN}" \
  "${API}/nodes/${NODE}/qemu/${VMID}?purge=1&destroy-unreferenced-disks=1"
```
Or just run `scripts/smoke-vm.sh` with `PVE_TEST_KEEP=0` and any ISO —
the prune step will sweep everything in the range except the one it
just created.
## Proxmox API token setup (one-time)
1. WebUI → **Datacenter → Permissions → API Tokens → Add**
2. User: `root@pam` (or a dedicated `smoke@pve` user — see below)
3. Token ID: `smoke`
4. Uncheck **Privilege Separation** for the quick path, or keep it
separated and grant explicit perms below
5. Save the displayed secret once — it's shown only here

Minimum privileges on `/` (if privilege-separated):
`VM.Allocate`, `VM.Config.Disk`, `VM.Config.CPU`, `VM.Config.Memory`,
`VM.Config.Network`, `VM.Config.Options`, `VM.Config.HWType`,
`VM.Config.CDROM`, `VM.PowerMgmt`, `VM.Audit`, and
`Datastore.AllocateTemplate` (for ISO upload/delete on the `local`
content store).

Set the token as the Forgejo secret `PVE_TEST_TOKEN` in the format:
```
user@realm!tokenid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```
…and `PVE_TEST_HOST` as `192.168.178.165`. That's all the workflow needs.
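
A malformed token is an easy mistake to make when pasting the secret, so a quick format check before storing it can save a failed run. The sketch below is an illustration only: `valid_pve_token` and `pve_auth_header` are hypothetical helpers, and the check assumes the secret part is a UUID, as in the format shown above.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of scripts/smoke-vm.sh.

# Returns 0 if the argument looks like user@realm!tokenid=<uuid>.
# ASSUMPTION: the secret is a hex UUID, as in the documented format.
valid_pve_token() {
  local re='^[^@!=]+@[^@!=]+![^@!=]+=[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
  [[ $1 =~ $re ]]
}

# Builds the header used by the curl examples in this document.
pve_auth_header() {
  printf 'Authorization: PVEAPIToken=%s\n' "$1"
}

token='user@pve!smoke=01234567-89ab-cdef-0123-456789abcdef'   # dummy value
if valid_pve_token "$token"; then
  pve_auth_header "$token"
else
  echo "PVE_TEST_TOKEN is not in user@realm!tokenid=<uuid> form" >&2
fi
```

The literal placeholder with `x` characters does not pass this check, which catches the case where the template was stored instead of a real secret.
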
## Assumptions
- Runner has L2 reachability to `192.168.178.0/24` (MAC→IP discovery
uses `arp-scan` from the runner).
- Test host uses default storage names: `local` for ISOs, `local-lvm` for
disks. Override via `PVE_TEST_ISO_STORAGE` / `PVE_TEST_DISK_STORAGE`.
- Bridge `vmbr0` carries LAN DHCP. Override via `PVE_TEST_BRIDGE`.
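
The MAC-to-IP discovery mentioned in the first assumption can be sketched as a filter over `arp-scan` output, which lists one `IP<TAB>MAC<TAB>vendor` row per responding host. The helper name `ip_for_mac` and the interface name in the usage comment are assumptions; how `scripts/smoke-vm.sh` actually invokes `arp-scan` is not shown in this document.

```bash
#!/usr/bin/env bash
# Sketch only: ip_for_mac is a hypothetical helper.

# Reads arp-scan output on stdin and prints the IP of the first row whose
# MAC matches the argument. The comparison is case-insensitive.
ip_for_mac() {
  local want
  want=$(printf '%s' "$1" | tr 'A-F' 'a-f')
  awk -v want="$want" 'tolower($2) == want { print $1; exit }'
}

# Canned input standing in for a real scan:
printf '192.168.178.1\taa:bb:cc:dd:ee:ff\tRouter\n192.168.178.42\tbc:24:11:01:23:45\tProxmox\n' \
  | ip_for_mac 'BC:24:11:01:23:45'   # 192.168.178.42

# Against the real LAN (needs root; eth0 is an assumed interface name):
#   sudo arp-scan --interface=eth0 192.168.178.0/24 | ip_for_mac "$mac"
```

Since ARP does not cross routers, this approach only works when the runner sits in the same broadcast domain as the VM, which is why L2 reachability is listed as an assumption.
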

If any of those don't match, set the corresponding env var in
`build-iso.yml` (via `env:` on the smoke step) or override it on the CLI
when running locally.
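
One common way to get this "default unless overridden" behaviour in a shell script is the `${VAR:=default}` expansion. The sketch below uses the variable names and defaults documented here; the function name `load_smoke_defaults` is hypothetical, and whether `scripts/smoke-vm.sh` does it this way is an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: load_smoke_defaults is a hypothetical helper. Variable names
# and default values are the ones documented in this file.

# Assign each default only when the variable is unset or empty, so values
# from the workflow's env: block or the command line win.
load_smoke_defaults() {
  : "${PVE_TEST_VMID_MIN:=9000}"
  : "${PVE_TEST_VMID_MAX:=9099}"
  : "${PVE_TEST_KEEP:=5}"
  : "${PVE_TEST_ISO_STORAGE:=local}"
  : "${PVE_TEST_DISK_STORAGE:=local-lvm}"
  : "${PVE_TEST_BRIDGE:=vmbr0}"
}

load_smoke_defaults
echo "bridge=${PVE_TEST_BRIDGE} iso=${PVE_TEST_ISO_STORAGE} disk=${PVE_TEST_DISK_STORAGE}"
```

With this pattern, `PVE_TEST_KEEP=0` survives as an explicit override, because `0` is a non-empty value.
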