From dfdbdd69aa474cb57f7795e70a6418388fb37155 Mon Sep 17 00:00:00 2001
From: Daniel Maksymilian Syrnicki
Date: Tue, 14 Apr 2026 19:28:33 +0200
Subject: [PATCH] docs: sync README roadmap, runner-setup, and ops/ to today's reality
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A lot moved since the last docs sweep. Catching everything up in one batch
so a newcomer (or future us) reading the repo isn't lied to.

**README.md roadmap:**

- Walking-skeleton live ISO: upgraded from "screens 1-3 work end-to-end" to "install runs to completion on a VM and the installed system logs in and runs `docker ps` without sudo".
- 26.0-alpha release: dropped the "deferred" note — its blocker (archinstall not completing) is gone; just needs a re-tag when we like the installer copy.
- Added an explicit "ISO-build in CI" line for the new `.forgejo/workflows/build-iso.yml`.
- Split the old "mDNS + local CA" item: mDNS is live (hostname baked in, avahi/nss-mdns in the image), HTTPS via local CA still open.
- Noted post-install reboot button, progress bar, archinstall 4.x schema work, console welcome, custom_commands docker group join in the wizard milestone bullet.

**docs/runner-setup.md:**

- Full rewrite for the docker-outside-of-docker architecture we actually run now (was still describing the DinD sidecar setup).
- Documents the `/data` symlink on the host that makes host-mode `-v /data/…:/work` resolve — the non-obvious piece that took the longest to nail down today.
- Describes the two runtime modes (`ubuntu-latest:docker://…` for CI, `self-hosted:host` for build-iso) and why each exists.
- Adds the `upload-artifact@v3` pin note — v4+ fails on Forgejo with `GHESNotSupportedError`.

**ops/forgejo-runner/compose.yml + config.yml:**

- Compose now matches what's actually running: DooD (no DinD sidecar), runs as root so apk can install nodejs + docker-cli at startup, /var/run/docker.sock bind-mounted.
- Config gets the three explicit label mappings and DooD `docker_host` + `valid_volumes`.

**.forgejo/workflows/build-iso.yml:**

- Added `paths-ignore` for docs/website/*.md so doc-only commits don't kick off 5-min ISO rebuilds. Code + ISO overlay changes still trigger.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .forgejo/workflows/build-iso.yml |  14 ++-
 README.md                        |  16 ++--
 docs/runner-setup.md             | 158 +++++++++++++++++++++----------
 ops/forgejo-runner/compose.yml   |  28 +++---
 ops/forgejo-runner/config.yml    |  21 +++-
 5 files changed, 163 insertions(+), 74 deletions(-)

diff --git a/.forgejo/workflows/build-iso.yml b/.forgejo/workflows/build-iso.yml
index 195f8f9..30b3045 100644
--- a/.forgejo/workflows/build-iso.yml
+++ b/.forgejo/workflows/build-iso.yml
@@ -1,11 +1,19 @@
 name: Build ISO
 
-# Full ISO build is ~15-20 min. Only run on push-to-main and manual
-# dispatch so feature-branch iteration stays fast. See
-# memory/project_ci_branching for the rationale.
+# Full ISO build is ~5-7 min. Only run on push-to-main and manual
+# dispatch so feature-branch iteration stays fast. Docs-only changes
+# skip the build — the `paths-ignore` list below covers *.md files,
+# docs/, and the website (Hugo source). Anything that touches code,
+# the ISO overlay, or the workflow itself still triggers a rebuild.
 on:
   push:
     branches: [main]
+    paths-ignore:
+      - '**/*.md'
+      - 'docs/**'
+      - 'website/**'
+      - 'CHANGELOG.md'
+      - 'RELEASING.md'
   workflow_dispatch:
 
 concurrency:
diff --git a/README.md b/README.md
index 9f2f075..e823c4f 100644
--- a/README.md
+++ b/README.md
@@ -104,15 +104,17 @@ None of these nail the "your dad can set this up" experience. The installer wiza
 - [x] Competitor analysis — see [docs/competitors.md](docs/competitors.md)
 - [x] Wizard flow spec — see [docs/wizard-flow.md](docs/wizard-flow.md)
 - [x] Release process + CI — CalVer tags, conventional commits, Forgejo Actions (ruff, pytest, JSON, link checks), `26.0-alpha` tagged
-- [x] Forgejo runner live on Proxmox VM (`forge-runner-01`, Ubuntu 24.04, Docker + DinD sidecar) — setup captured in [docs/runner-setup.md](docs/runner-setup.md) + [ops/forgejo-runner/](ops/forgejo-runner/)
-- [ ] **Publish `26.0-alpha` Forgejo Release** — deferred. Walking-skeleton ISO boots but doesn't install yet; re-tag once `archinstall` actually completes end-to-end on a VM.
-- [x] **Walking-skeleton live ISO** — `iso/build.sh` produces a hybrid BIOS/UEFI Arch-based ISO that boots in a Proxmox VM, DHCP's onto the LAN, and serves the Flask webinstaller on `:5000`. Screens 1–3 work end-to-end. Build infra in [`iso/`](iso/).
-- [x] **Drop loop/rom devices from drive list** — `webinstaller/drives.py` now filters by `lsblk` `TYPE=disk`, so the live squashfs and CD-ROM no longer appear as install targets.
-- [x] **Rebrand GRUB menu** — `iso/build.sh` rewrites "Arch Linux install medium" → "Furtka Live Installer" across GRUB, syslinux, and systemd-boot configs.
-- [x] **S1 account form + overview → `archinstall`** — S1 collects hostname/user/password/language with validation, S2 picks boot drive, overview confirms, `/install/run` writes `user_configuration.json` + `user_credentials.json` (0600) and execs `archinstall --silent`, log page polls output. `FURTKA_DRY_RUN=1` skips the exec for testing.
+- [x] Forgejo runner live on Proxmox VM (`forge-runner-01`, Ubuntu 24.04) — docker-outside-of-docker with host-mode jobs for ISO builds, setup captured in [docs/runner-setup.md](docs/runner-setup.md) + [ops/forgejo-runner/](ops/forgejo-runner/)
+- [x] **ISO-build in CI** — `.forgejo/workflows/build-iso.yml` runs `iso/build.sh` on every push to `main` (docs-only pushes are skipped via `paths-ignore`) and publishes the resulting `.iso` as the `furtka-iso` artifact (14-day retention). Push → green run → download → test.
+- [ ] **Publish `26.0-alpha` Forgejo Release** — blocker is gone (end-to-end install now works on a VM), re-tag when we're happy with the installer copy.
+- [x] **Walking-skeleton live ISO — end to end** — `iso/build.sh` produces a hybrid BIOS/UEFI Arch-based ISO. It boots in a Proxmox VM, DHCPs onto the LAN, shows a console welcome with `http://proksi.local:5000` (+ IP fallback), serves the Flask webinstaller, runs `archinstall --silent`, reboots the VM via a Reboot-now button, and the installed system logs in and runs `docker ps` without sudo. Build infra in [`iso/`](iso/).
+- [x] **Drop loop/rom devices from drive list** — `webinstaller/drives.py` filters by `lsblk` `TYPE=disk`, so the live squashfs and CD-ROM no longer appear as install targets. Boot-USB filtering on bare metal is still TODO; see [iso/README.md](iso/README.md).
+- [x] **Rebrand GRUB menu** — `iso/build.sh` rewrites "Arch Linux install medium" → "Furtka Live Installer" across GRUB, syslinux, and systemd-boot configs; default entry marked `(Recommended)`.
+- [x] **Wizard: account form → drive picker → overview → archinstall** — S1 collects hostname/user/password/language with validation, S2 picks boot drive, overview confirms, `/install/run` writes `user_configuration.json` + `user_credentials.json` (0600) and execs `archinstall --silent` against its 4.x schema (`default_layout` disk_config + `!root-password` / `!password` sentinel keys + `custom_commands` for post-install group joins). Install log page polls a JSON endpoint and renders a phase-based progress bar with a collapsible raw log. `FURTKA_DRY_RUN=1` skips the real exec for testing.
+- [x] **mDNS `proksi.local`** — hostname baked into the live ISO, avahi + nss-mdns in the package list, advertised as soon as network-online fires. The HTTPS + local-CA half of this milestone is still open below.
 - [ ] **Base OS post-install** — what Furtka actually looks like *after* the wizard writes config + reboots: Caddy + Authentik + app store. Robert's area.
 - [ ] Installer wizard screens S3–S7 — per-device purpose, network, domain, SSL, diagnostic. S5/S6 blocked on managed-gateway DNS infra not yet built.
-- [ ] `https://proksi.local` via mDNS + local CA (currently only raw-IP HTTP)
+- [ ] `https://proksi.local` with a local CA (today: plain HTTP at `http://proksi.local:5000`)
 - [ ] Caddy + Authentik wired into first-boot bootstrap
 - [ ] Managed gateway infrastructure — `ns1/ns2.furtka.org` + DNS-01 wildcard automation
 - [ ] First containerized service (Nextcloud?) with auto-SSO + auto-subdomain
diff --git a/docs/runner-setup.md b/docs/runner-setup.md
index bbc3172..f99c48a 100644
--- a/docs/runner-setup.md
+++ b/docs/runner-setup.md
@@ -1,10 +1,12 @@
 # Forgejo Runner Setup
 
-How to stand up a `forgejo-runner` so the CI workflow in `.forgejo/workflows/ci.yml` actually executes on every push.
+How to stand up a `forgejo-runner` so the CI workflows under
+[`.forgejo/workflows/`](../.forgejo/workflows/) — `ci.yml` (lint,
+pytest, JSON & link checks) and `build-iso.yml` (produces the live
+ISO as a downloadable artifact) — run on every push to `main`.
 
-The runner is a long-running daemon that polls the Forgejo instance for queued jobs and runs them in Docker containers.
-
-A ready-to-use bootstrap script and compose file live under [`ops/forgejo-runner/`](../ops/forgejo-runner/).
+Ready-to-use `compose.yml` and `config.yml` live in
+[`ops/forgejo-runner/`](../ops/forgejo-runner/).
 
 ## Choosing a host
 
 | **Home server / NAS** | Free; plenty of capacity | CI blocked if home network / power drops |
 | **Local dev machine** | Quick to set up, fast runs | CI only works while the machine is on |
 
-Recommendation for now: **home server or a cheap VPS**. Don't use a laptop that suspends.
+Recommendation: **home server or a cheap VPS**. Don't use a laptop that suspends.
+
+## Architecture at a glance
+
+The runner uses **docker-outside-of-docker (DooD)**: it mounts the host's
+`/var/run/docker.sock` into itself and spawns job containers on the host
+daemon. We went back and forth on this — the tempting alternative is a
+docker-in-docker (DinD) sidecar for isolation — but DinD makes
+`iso/build.sh` fail: `build.sh` does its own nested `docker run -v …` and
+the path inside a DinD-hosted job isn't visible to host docker. DooD
+trades some isolation for paths that line up everywhere. This runner VM
+is single-purpose, so that trade is fine.
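+
+To make the path problem concrete, here's a sketch of the kind of nested
+call `build.sh` makes (illustrative only: the real script's image and
+flags differ):
+
+```bash
+# Hypothetical inner invocation, run from inside a CI job's workspace:
+docker run --rm -v "$PWD:/work" -w /work archlinux:latest ls /work
+# The -v source path is resolved by whichever daemon receives the request.
+# With DooD that's the host daemon, so "$PWD" must be a real host path.
+# With a DinD sidecar the daemon has its own filesystem, and a path that
+# only exists inside the job container resolves to an empty dir or a
+# "no such file or directory" error.
+```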
+
+One non-obvious piece: the runner's default internal data directory is
+`/data`. Host-mode jobs (see the `self-hosted:host` label below) tell
+host docker to bind-mount `/data/.cache/act/…/hostexecutor` — which is
+the container's filesystem path, not the host's. The fix is to make
+`/data` exist on the host too, pointing at the same files, via a symlink:
+
+```bash
+sudo ln -s /home/<user>/forgejo-runner/data /data
+```
+
+This one line is what lets `-v /data/…:/work` resolve correctly.
 
 ## Install
 
-Pick either the binary or the Docker container path. Docker is easier to upgrade.
-
-### Path A: Docker Compose (recommended)
-
-Copy `ops/forgejo-runner/compose.yml` and `ops/forgejo-runner/config.yml` from this repo to the host, e.g. into `~/forgejo-runner/` (compose file) and `~/forgejo-runner/data/` (config file). The runner talks to a sidecar Docker-in-Docker container via `tcp://docker-in-docker:2375`, so the host's own Docker socket is not exposed to jobs.
-
-If the host is a fresh Ubuntu VM, run `ops/forgejo-runner/bootstrap.sh` first to install Docker Engine + the Compose plugin from the official repo.
-
-### Path B: Binary
-
-Download the latest release from https://code.forgejo.org/forgejo/runner/releases and drop it somewhere in `$PATH`:
+On a fresh Ubuntu VM:
 
 ```bash
-wget https://code.forgejo.org/forgejo/runner/releases/download/v6.0.0/forgejo-runner-6.0.0-linux-amd64
-chmod +x forgejo-runner-6.0.0-linux-amd64
-sudo mv forgejo-runner-6.0.0-linux-amd64 /usr/local/bin/forgejo-runner
+# Docker Engine + compose plugin (official repo)
+./ops/forgejo-runner/bootstrap.sh
+
+# Node.js on the HOST is not required — the runner container installs
+# it inside itself on startup. But host tools help for debugging.
+```
+
+Copy the reference `compose.yml` and `config.yml` to `~/forgejo-runner/`
+and `~/forgejo-runner/data/` respectively. Create the `/data` symlink:
+
+```bash
+mkdir -p ~/forgejo-runner/data
+cp ops/forgejo-runner/compose.yml ~/forgejo-runner/compose.yml
+cp ops/forgejo-runner/config.yml ~/forgejo-runner/data/config.yml
+sudo ln -s "$HOME/forgejo-runner/data" /data
 ```
 
 ## Register
 
-1. In the Forgejo web UI: go to **Site Administration → Actions → Runners → Create new Runner**. Copy the registration token. (For a repo-scoped runner instead, use **Repo Settings → Actions → Runners**.)
+1. In the Forgejo web UI: **Site Administration → Actions → Runners →
+   Create new Runner** (or **Repo Settings → Actions → Runners** for a
+   repo-scoped runner). Copy the registration token.
 
-2. Register from the runner host by running the registration inside a one-shot container so the output lands in the mounted `data/` directory:
+2. Register from the host by running the registration inside a one-shot
+   container so the resulting `.runner` file lands in the mounted
+   `data/` directory:
 
 ```bash
 cd ~/forgejo-runner
@@ -49,47 +80,78 @@ sudo mv forgejo-runner-6.0.0-linux-amd64 /usr/local/bin/forgejo-runner
   --instance https://forgejo.sourcegate.online \
   --token <registration-token> \
   --name forge-runner-01 \
-  --labels 'docker:docker://catthehacker/ubuntu:act-latest,ubuntu-latest:docker://catthehacker/ubuntu:act-latest,self-hosted:docker://catthehacker/ubuntu:act-latest' \
   --no-interactive
 ```
 
-   Labels *must* use the `:docker://` form — bare labels (`ubuntu-latest`) get stored as `ubuntu-latest:host`, which tells the runner to execute jobs directly inside the runner container (no Python, no git, nothing). `catthehacker/ubuntu:act-latest` is the common drop-in image with GitHub Actions tooling preinstalled.
+   Note: labels are configured in `config.yml`, not at registration
+   time — `config.yml` has `labels:` populated with the three we use
+   (`ubuntu-latest`, `docker`, `self-hosted`), each mapped to either
+   a container image or `:host` mode.
 
 3. Start the daemon: `docker compose up -d`.
 
-4. Verify the runner shows up as **Idle** in Forgejo's admin Runners page and the log prints `runner: forge-runner-01, ..., declared successfully`.
+4. Verify in Forgejo admin → Actions → Runners that `forge-runner-01`
+   shows as **Idle**, and `docker logs forgejo-runner` prints
+   `runner: forge-runner-01, ..., declared successfully` along with
+   the installed `node` + `docker-cli` versions.
+
+## Two runtime modes
+
+The `config.yml` labels set up two job execution modes:
+
+- **`ubuntu-latest` / `docker` → `docker://catthehacker/ubuntu:act-latest`.**
+  The standard mode. Jobs run in a fresh `catthehacker/ubuntu:act-latest`
+  container. Good isolation, standard GHA-compatible image. Used by
+  `ci.yml` (ruff, pytest, JSON & link checks).
+
+- **`self-hosted` → `:host`.** Steps execute *directly* in the runner
+  container (no per-job wrapping container). Used by `build-iso.yml`
+  because `iso/build.sh` needs `docker run -v $REPO_ROOT:/work` to hit
+  a path host docker can resolve — wrapping in a job container
+  reintroduces the namespace mismatch.
+
+Because host-mode jobs run inside the runner container, that container
+needs tools the jobs invoke — Node (for JS-based actions like
+`actions/checkout@v4`), Git (already in the base image), and the Docker
+CLI (for `iso/build.sh`). The `command:` in `compose.yml` apk-installs
+nodejs + docker-cli before launching the daemon, so those tools are
+always present after container start.
 
 ## First CI run
 
-Push any commit; the Actions tab on the repo should show the workflow running. If nothing happens:
+Push a commit to `main` — the Actions tab should show:
 
-- Confirm the runner is online (Forgejo admin → Actions → Runners).
-- Check the workflow has labels that match the runner (`runs-on: ubuntu-latest` needs a runner registered with that label).
-- Check the runner logs: `docker logs forgejo-runner` or the systemd journal.
+- `CI` workflow (`ci.yml`) running lint, tests, JSON validation, markdown
+  links. Green in ~30 s.
+- `Build ISO` workflow (`build-iso.yml`) running `iso/build.sh` inside
+  the runner container. Takes ~5 min (pacstrap + mkarchiso). The
+  resulting `.iso` lands as a `furtka-iso` artifact on the run page,
+  retained 14 days.
 
-## Systemd unit (for the binary path)
+If the workflow queues forever, check:
 
-```ini
-[Unit]
-Description=Forgejo Actions Runner
-After=docker.service
-Requires=docker.service
+- Runner online in Forgejo admin.
+- `docker logs forgejo-runner` for errors.
+- The workflow's `runs-on:` matches a label the runner advertises.
 
-[Service]
-ExecStart=/usr/local/bin/forgejo-runner daemon
-WorkingDirectory=/var/lib/forgejo-runner
-User=forgejo-runner
-Restart=on-failure
+## Artifact compatibility note
 
-[Install]
-WantedBy=multi-user.target
-```
-
-Save as `/etc/systemd/system/forgejo-runner.service`, then `sudo systemctl enable --now forgejo-runner`.
+Forgejo's Actions API is GHES-compatible (not full GHA), so use
+`actions/upload-artifact@v3` — **v4+ fails with
+`GHESNotSupportedError`** because it needs the newer `@actions/artifact`
+protocol Forgejo hasn't implemented yet.
 
 ## Security notes
 
-- Jobs run inside a Docker-in-Docker sidecar, not against the host's Docker socket. Still, DinD runs privileged — give the runner its own VM, not a shared host.
-- Registration tokens are one-shot; a stolen token can't re-register after the runner is live.
-- Prefer repo-scoped runners over instance-wide if you're sharing the runner with other repos you don't control.
-- Ubuntu's default systemd-resolved makes the host's stub resolver (`127.0.0.53`) inherit a LAN DNS server that Docker containers may not be able to reach. If container DNS fails, set explicit upstream DNS in `/etc/docker/daemon.json` (e.g. `{"dns": ["1.1.1.1", "8.8.8.8"]}`) and `sudo systemctl restart docker`.
+- DooD gives jobs full access to the host's docker daemon — they can
+  spawn arbitrary containers, including `--privileged` ones. Keep the
+  runner VM dedicated to CI; don't run other user workloads on it.
+- The runner container itself runs as root (`user: "0:0"`). This is
+  acceptable because the whole VM is purpose-built, but it's a bigger
+  footgun than the standard non-root runner image default.
+- Registration tokens are one-shot; once a runner is live, the token
+  can't re-register.
+- Ubuntu's `systemd-resolved` stub resolver (`127.0.0.53`) sometimes
+  leaks LAN-only DNS servers that containers can't reach. If container
+  DNS fails, set explicit upstream DNS in `/etc/docker/daemon.json`
+  (e.g. `{"dns": ["1.1.1.1", "8.8.8.8"]}`) and restart docker.
diff --git a/ops/forgejo-runner/compose.yml b/ops/forgejo-runner/compose.yml
index faf6c66..7f33e71 100644
--- a/ops/forgejo-runner/compose.yml
+++ b/ops/forgejo-runner/compose.yml
@@ -3,20 +3,22 @@ services:
     image: code.forgejo.org/forgejo/runner:6
     container_name: forgejo-runner
     restart: unless-stopped
+    # Running as root so (1) apk can install nodejs + docker-cli at
+    # startup (needed by host-mode jobs that execute JS actions and by
+    # `iso/build.sh` which shells out to `docker run`), and (2) access
+    # to the host docker socket doesn't require group juggling.
+    user: "0:0"
     environment:
-      - DOCKER_HOST=tcp://docker-in-docker:2375
+      - DOCKER_HOST=unix:///var/run/docker.sock
       - CONFIG_FILE=/data/config.yml
+    # Mount at /data so the container's data path matches the host path
+    # /data (which is a symlink to this directory — see runner-setup.md).
+    # When a host-mode job does `docker run -v /data/.cache/act/…:/work`,
+    # host docker resolves the source via the symlink instead of failing
+    # with "no such file or directory".
     volumes:
       - ./data:/data
-    depends_on:
-      - docker-in-docker
-    command: /bin/sh -c "sleep 5; forgejo-runner daemon --config /data/config.yml"
-
-  docker-in-docker:
-    image: docker:dind
-    container_name: forgejo-runner-dind
-    restart: unless-stopped
-    privileged: true
-    environment:
-      - DOCKER_TLS_CERTDIR=
-    command: dockerd -H tcp://0.0.0.0:2375 --tls=false
+      - /var/run/docker.sock:/var/run/docker.sock
+    command: >-
+      /bin/sh -c "apk add --no-cache nodejs docker-cli && sleep 5 &&
+      forgejo-runner daemon --config /data/config.yml"
diff --git a/ops/forgejo-runner/config.yml b/ops/forgejo-runner/config.yml
index 849ed2c..ff3bbed 100644
--- a/ops/forgejo-runner/config.yml
+++ b/ops/forgejo-runner/config.yml
@@ -10,7 +10,16 @@ runner:
   fetch_timeout: 5s
   fetch_interval: 2s
   report_interval: 1s
-  labels: []
+  # Label mappings decide how each `runs-on:` value is executed. The
+  # `:host` suffix means "run steps directly in the runner container"
+  # (no wrapping job container). build-iso uses `runs-on: self-hosted`
+  # because its `docker run -v $REPO_ROOT:/work` needs host-visible
+  # paths — nested containers would put the workspace in a namespace
+  # host docker can't see.
+  labels:
+    - "ubuntu-latest:docker://catthehacker/ubuntu:act-latest"
+    - "docker:docker://catthehacker/ubuntu:act-latest"
+    - "self-hosted:host"
 
 cache:
   enabled: true
@@ -22,8 +31,14 @@ container:
   network: ""
   privileged: false
-  valid_volumes: []
-  docker_host: "tcp://docker-in-docker:2375"
+  # Docker-outside-of-docker: runner and all job containers share the
+  # host's docker daemon via the unix socket. valid_volumes whitelists
+  # the socket so it can be mounted into job containers (the runner
+  # handles this automatically — don't also mount it from a workflow
+  # or you'll get "duplicate mount point").
+  valid_volumes:
+    - "/var/run/docker.sock"
+  docker_host: "unix:///var/run/docker.sock"
   force_pull: false
 
 host: