Commit graph

11 commits

Author SHA1 Message Date
7b894f096f fix(ci): inline smoke-vm as a step instead of a downstream job
All checks were successful
CI / lint (pull_request) Successful in 1m6s
CI / test (pull_request) Successful in 1m23s
CI / validate-json (pull_request) Successful in 56s
CI / markdown-links (pull_request) Successful in 24s
The separate smoke-vm job with `needs: build-iso` required round-tripping
the 1.5 GB ISO through actions/upload-artifact + download-artifact. v3
on Forgejo has a known issue where large artifacts stall at 0.0% in the
download step — the smoke run hung today with endless "Total file count:
1 ---- Processed file #0 (0.0%)" output.

Since both jobs run on the same self-hosted runner (host mode, same
workspace available), there was never a real need for the artifact
indirection. Inlining the smoke run as a step after the artifact upload
reuses the ISO already in iso/out/ and skips the download entirely.

Step-level `continue-on-error` preserves the original guarantee that a
VM-side flake doesn't mark the ISO build red.
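A sketch of the inlined step, assuming the smoke logic lives in a script (the step name and script path here are illustrative, not the workflow's actual contents):

```yaml
# Hypothetical build-iso.yml fragment after the change: the smoke run
# is a plain step that reuses the ISO already at iso/out/, so no
# upload/download round-trip is needed.
- name: smoke-boot the fresh ISO
  continue-on-error: true   # a VM-side flake must not turn the ISO build red
  run: ./ci/smoke-vm.sh iso/out/*.iso
```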

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:20:58 +02:00
d499907613 feat(ci): auto-boot every main-ISO in smoke VM on .165 Proxmox
Some checks failed
Build ISO / smoke-vm (push) Blocked by required conditions
Build ISO / build-iso (push) Successful in 24m28s
CI / test (push) Successful in 3m1s
CI / validate-json (push) Successful in 55s
CI / markdown-links (push) Successful in 37s
CI / lint (push) Failing after 13m19s
After build-iso, a new smoke-vm job uploads the freshly built ISO to
the test Proxmox at 192.168.178.165 via PVE API token, boots it in a
fresh VM (VMID range 9000-9099, MAC derived from commit SHA so the
runner can find the DHCP IP by scanning the LAN), and curls :5000 to
confirm the webinstaller answers HTTP 200. Last 5 smoke VMs + their
ISOs are kept for post-mortem; older ones are purged. The smoke job
sets `continue-on-error` so a VM-side flake doesn't mark the ISO build red.
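The MAC-from-SHA trick can be sketched in shell. The `52:54:00` QEMU/KVM prefix and the exact byte mapping are assumptions, not necessarily the repo's actual scheme:

```shell
# Derive a deterministic MAC from a commit SHA so the runner can later
# match the VM's DHCP lease when scanning the LAN. 52:54:00 is the
# common QEMU/KVM prefix; the low three octets come straight from the
# first six hex digits of the SHA.
mac_for_sha() {
  local sha=$1
  echo "52:54:00:${sha:0:2}:${sha:2:2}:${sha:4:2}"
}

mac_for_sha d499907613   # prints 52:54:00:d4:99:90
```

Because the mapping is deterministic, the runner only needs the commit SHA to know which ARP/DHCP entry belongs to its VM.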

Shortens the feedback loop on ISO regressions from "next manual VM
test session" (days) to "next push" (minutes) — the 2026-04-15/16 VM
sessions each found real boot-time bugs that unit tests missed.

Docs at docs/smoke-vm.md. Requires Forgejo secrets PVE_TEST_HOST and
PVE_TEST_TOKEN (dedicated smoke@pve!ci PVE token, privilege-separated).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:41:44 +02:00
dfdbdd69aa docs: sync README roadmap, runner-setup, and ops/ to today's reality
All checks were successful
Build ISO / build-iso (push) Successful in 17m13s
CI / lint (push) Successful in 26s
CI / test (push) Successful in 32s
CI / validate-json (push) Successful in 22s
CI / markdown-links (push) Successful in 13s
A lot moved since the last docs sweep. Catching everything up in one
batch so a newcomer (or future us) reading the repo isn't lied to.

**README.md roadmap:**
- Walking-skeleton live ISO: upgraded from "screens 1-3 work
  end-to-end" to "install runs to completion on a VM and the installed
  system logs in and runs `docker ps` without sudo".
- 26.0-alpha release: dropped the "deferred" note — its blocker
  (archinstall not completing) is gone; just needs a re-tag when we
  like the installer copy.
- Added an explicit "ISO-build in CI" line for the new
  `.forgejo/workflows/build-iso.yml`.
- Split the old "mDNS + local CA" item: mDNS is live (hostname baked
  in, avahi/nss-mdns in the image), HTTPS via local CA still open.
- Noted post-install reboot button, progress bar, archinstall 4.x
  schema work, console welcome, custom_commands docker group join in
  the wizard milestone bullet.

**docs/runner-setup.md:**
- Full rewrite for the docker-outside-of-docker architecture we
  actually run now (was still describing the DinD sidecar setup).
- Documents the `/data` symlink on the host that makes host-mode
  `-v /data/…:/work` resolve — the non-obvious piece that took the
  longest to nail down today.
- Describes the two runtime modes (`ubuntu-latest:docker://…` for CI,
  `self-hosted:host` for build-iso) and why each exists.
- Adds the `upload-artifact@v3` pin note — v4+ fails on Forgejo with
  `GHESNotSupportedError`.

**ops/forgejo-runner/compose.yml + config.yml:**
- Compose now matches what's actually running: DooD (no DinD sidecar),
  runs as root so apk can install nodejs + docker-cli at startup,
  /var/run/docker.sock bind-mounted.
- Config gets the three explicit label mappings and DooD
  `docker_host` + `valid_volumes`.

**.forgejo/workflows/build-iso.yml:**
- Added `paths-ignore` for docs/website/*.md so doc-only commits don't
  kick off 5-min ISO rebuilds. Code + ISO overlay changes still
  trigger.
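The trigger filter might look roughly like this; the exact glob list is a guess at the actual patterns:

```yaml
# build-iso.yml: doc-only commits skip the rebuild; code and ISO
# overlay changes still trigger
on:
  push:
    branches: [main]
    paths-ignore:
      - "docs/**"
      - "website/**"
      - "**.md"
```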

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 19:28:33 +02:00
05ef50f74e ci: pin upload-artifact to v3 — v4+ unsupported on forgejo
All checks were successful
Build ISO / build-iso (push) Successful in 17m29s
CI / lint (push) Successful in 25s
CI / test (push) Successful in 32s
CI / validate-json (push) Successful in 24s
CI / markdown-links (push) Successful in 12s
Forgejo Actions only speaks the GHES-compatible @actions/artifact
protocol; upload-artifact@v4+ insists on the newer API and fails with
`GHESNotSupportedError`. Pin to v3, which uses the old protocol that
Forgejo implements.
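The pinned upload step, using the artifact name and retention from the original build-iso workflow (the `path` glob is an assumption):

```yaml
- name: upload ISO artifact
  uses: actions/upload-artifact@v3   # v4+ raises GHESNotSupportedError on Forgejo
  with:
    name: furtka-iso
    path: iso/out/*.iso
    retention-days: 14
```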

Good news: the ISO itself built end-to-end in ~5m on the runner
(DooD + the /data symlink resolved the path mismatch). Only the upload
failed, and this pin fixes that.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 19:10:16 +02:00
e9e8bd3319 ci: run build-iso on the runner host (DooD path fix)
Some checks failed
CI / test (push) Waiting to run
Build ISO / build-iso (push) Failing after 6s
CI / lint (push) Successful in 25s
CI / validate-json (push) Successful in 23s
CI / markdown-links (push) Has been cancelled
Now that the runner uses docker-outside-of-docker, volume mounts in
`build.sh` (`docker run -v $REPO_ROOT:/work ...`) are interpreted by
host docker, so `$REPO_ROOT` must be a real host path. When the job
runs inside a job container, `$REPO_ROOT` is only valid in the job
container's filesystem namespace and host docker can't find it, hence
`bash: /work/iso/build.sh: No such file or directory`.

Fix: switch `runs-on` to `self-hosted`. Forgejo-runner exposes that
label out of the box and, with no matching container image mapping,
runs steps directly on the runner VM. Checkout writes to a real host
path; `docker run -v …` then mounts a path both the outer CLI and
host docker agree on.
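The fixed job shape might look like this; action versions and step details are illustrative:

```yaml
# build-iso.yml: no container image is mapped to `self-hosted`, so
# steps run directly on the runner VM and checkout writes a real host
# path that the host docker daemon can mount
jobs:
  build-iso:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v3
      - run: ./iso/build.sh
        # the inner `docker run -v $REPO_ROOT:/work …` now names a
        # path both the docker CLI and the host daemon agree on
```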

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:50:47 +02:00
a6cccc67c1 ci: drop duplicate docker.sock mount in build-iso
Some checks failed
Build ISO / build-iso (push) Failing after 6s
CI / lint (push) Successful in 26s
CI / test (push) Successful in 33s
CI / validate-json (push) Successful in 23s
CI / markdown-links (push) Has been cancelled
Forgejo-runner's valid_volumes already injects /var/run/docker.sock
into every job container, so the explicit `container.volumes` mount
in the workflow triggered 'Duplicate mount point' and the job never
started. Removed it; the `DOCKER_HOST` env var is enough.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:49:12 +02:00
0f0308bf68 ci: switch build-iso to docker-outside-of-docker
Some checks failed
Build ISO / build-iso (push) Failing after 46s
CI / lint (push) Successful in 25s
CI / test (push) Successful in 32s
CI / validate-json (push) Successful in 24s
CI / markdown-links (push) Successful in 14s
The DinD setup was the wrong tool here: forgejo-runner runs on host
docker, but it spawned jobs via the DinD sidecar — meaning jobs
were isolated inside DinD's own docker namespace and couldn't reach
`docker-in-docker` by hostname, and couldn't see the
`forgejo-runner_default` network (which only exists on host docker).

Switched the runner (compose.yml + data/config.yml) to talk directly
to host docker via `/var/run/docker.sock` and added it to the host
`docker` group (GID 988) so the non-root runner user can use the
socket. `valid_volumes` now whitelists the socket so job containers
can mount it too.
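A sketch of the relevant config.yml fragment; field names follow forgejo-runner's `container` section, values here are illustrative:

```yaml
# data/config.yml for DooD: jobs talk to the host daemon through the
# bind-mounted socket, and valid_volumes whitelists it for mounting
container:
  docker_host: "unix:///var/run/docker.sock"
  valid_volumes:
    - /var/run/docker.sock
```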

Workflow now mounts /var/run/docker.sock into the job container and
points DOCKER_HOST at that unix socket. `./iso/build.sh` then runs
its inner `docker run --privileged archlinux:latest` against the
host daemon — no nested docker.

Tradeoff: this is less isolated than DinD (jobs have full host docker
access — they could spawn arbitrary containers), but on a dedicated
single-user build VM the DooD simplification is worth it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:45:32 +02:00
4c5a00a0e0 ci: drop ineffective container.options override for build-iso
Some checks failed
Build ISO / build-iso (push) Failing after 1s
CI / lint (push) Failing after 1s
CI / test (push) Failing after 1s
CI / validate-json (push) Failing after 1s
CI / markdown-links (push) Failing after 1s
forgejo-runner 6.4 filters `--network` out of `container.options`, so
the workflow-level override was silently ignored and the job kept
landing on a per-task network where `docker-in-docker` didn't resolve.
Fixed at the right level by editing the runner's `/data/config.yml`
(`container.network: "forgejo-runner_default"`) and restarting the
forgejo-runner container — every job now joins the shared network so
DOCKER_HOST=tcp://docker-in-docker:2375 just works.
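The fragment added to the runner's config might look like this:

```yaml
# /data/config.yml: set at the runner level, where the network choice
# can't be filtered away like the workflow-level container.options was
container:
  network: "forgejo-runner_default"
```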

Workflow trimmed back to only what's needed: DOCKER_HOST env pin. The
default runner image (catthehacker/ubuntu:act-latest) already has the
docker CLI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:40:16 +02:00
ba36bb4741 ci: attach build-iso job to DinD network, pin lychee-action source
Some checks failed
Build ISO / build-iso (push) Failing after 5s
CI / lint (push) Successful in 27s
CI / test (push) Successful in 44s
CI / markdown-links (push) Failing after 1s
CI / validate-json (push) Failing after 10m34s
- build-iso: the job container was on a per-job docker network, so
  `docker-in-docker` (the DinD sidecar hostname on
  `forgejo-runner_default`) didn't resolve. Pin the container to that
  shared network via `container.options: --network forgejo-runner_default`.
  catthehacker/ubuntu:act-latest already has the docker CLI, so drop
  the apt-get step.

- ci.yml markdown-links: forgejo's action mirror at data.forgejo.org
  doesn't carry `lycheeverse/lychee-action`, so `uses:` was 404ing
  before the step could even run (rendering continue-on-error moot).
  Fully-qualified GitHub URL bypasses the mirror.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:37:54 +02:00
a777efd4c0 ci: green the pipeline — tests match 4.x schema, build-iso hits DinD, lint clean
Some checks failed
Build ISO / build-iso (push) Failing after 20s
CI / lint (push) Successful in 26s
CI / test (push) Successful in 31s
CI / validate-json (push) Successful in 23s
CI / markdown-links (push) Failing after 2s
Three things are broken on origin/main as of 6114cb2, all found in one
red CI run:

- build-iso workflow couldn't reach docker. forgejo-runner's config
  sets `docker_host: tcp://docker-in-docker:2375` but that env doesn't
  propagate into job containers on `runs-on: ubuntu-latest`, and the
  default job image has no docker CLI. Fix: pin `DOCKER_HOST` on the
  job and apt-install `docker.io` before invoking `iso/build.sh`.

- Two tests asserted on the pre-4.x archinstall schema:
  `creds["root_password"]` (now `!root-password`) and
  `cfg["disk_config"]["device"]` / `cfg["users"]` (users moved to
  creds; disk_config is now a full `default_layout` dict). Rewrote
  the tests to reflect 4.x reality and monkeypatched `build_disk_config`
  since its real body imports archinstall, which isn't on CI.

- Ruff flagged one line of `PROGRESS_PHASES` at 107 chars — collapsed
  the column alignment. `ruff format` pulled in a couple of cosmetic
  expansions in spawn_archinstall and the tests that had been drifting.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:29:42 +02:00
6114cb2f27 ci: build the live ISO on push-to-main and publish as artifact
Some checks failed
Build ISO / build-iso (push) Failing after 19s
CI / lint (push) Failing after 27s
CI / test (push) Failing after 41s
CI / validate-json (push) Successful in 24s
CI / markdown-links (push) Failing after 2s
Adds `.forgejo/workflows/build-iso.yml` that runs `./iso/build.sh` and
uploads the resulting ISO as a `furtka-iso` artifact (retained 14 days).
Triggers on `push: branches: [main]` and `workflow_dispatch` only —
feature branches don't pay the 15-20 min build cost. `concurrency`
cancels older runs of the same ref so only the most recent push
produces an artifact.
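The trigger and concurrency stanza could be sketched as follows; the group key is illustrative:

```yaml
# build-iso.yml: main-only trigger, and superseded runs of the same
# ref are cancelled so only the latest push produces an artifact
on:
  push:
    branches: [main]
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
```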

This is what Robert asked for: push change → download ISO from the
Forgejo run → test without needing a laptop to build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:13:15 +02:00