From dfdbdd69aa474cb57f7795e70a6418388fb37155 Mon Sep 17 00:00:00 2001
From: Daniel Maksymilian Syrnicki
Date: Tue, 14 Apr 2026 19:28:33 +0200
Subject: [PATCH] docs: sync README roadmap, runner-setup, and ops/ to today's reality
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

A lot moved since the last docs sweep. Catching everything up in one batch
so a newcomer (or future us) reading the repo isn't lied to.

**README.md roadmap:**

- Walking-skeleton live ISO: upgraded from "screens 1-3 work end-to-end" to "install runs to completion on a VM and the installed system logs in and runs `docker ps` without sudo".
- 26.0-alpha release: dropped the "deferred" note — its blocker (archinstall not completing) is gone; just needs a re-tag when we like the installer copy.
- Added an explicit "ISO-build in CI" line for the new `.forgejo/workflows/build-iso.yml`.
- Split the old "mDNS + local CA" item: mDNS is live (hostname baked in, avahi/nss-mdns in the image), HTTPS via local CA still open.
- Noted post-install reboot button, progress bar, archinstall 4.x schema work, console welcome, custom_commands docker group join in the wizard milestone bullet.

**docs/runner-setup.md:**

- Full rewrite for the docker-outside-of-docker architecture we actually run now (was still describing the DinD sidecar setup).
- Documents the `/data` symlink on the host that makes host-mode `-v /data/…:/work` resolve — the non-obvious piece that took the longest to nail down today.
- Describes the two runtime modes (`ubuntu-latest:docker://…` for CI, `self-hosted:host` for build-iso) and why each exists.
- Adds the `upload-artifact@v3` pin note — v4+ fails on Forgejo with `GHESNotSupportedError`.

**ops/forgejo-runner/compose.yml + config.yml:**

- Compose now matches what's actually running: DooD (no DinD sidecar), runs as root so apk can install nodejs + docker-cli at startup, /var/run/docker.sock bind-mounted.
- Config gets the three explicit label mappings and DooD `docker_host` + `valid_volumes`.

**.forgejo/workflows/build-iso.yml:**

- Added `paths-ignore` for docs/website/*.md so doc-only commits don't kick off 5-min ISO rebuilds. Code + ISO overlay changes still trigger.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .forgejo/workflows/build-iso.yml |  14 ++-
 README.md                        |  16 ++--
 docs/runner-setup.md             | 158 +++++++++++++++++++++----------
 ops/forgejo-runner/compose.yml   |  28 +++---
 ops/forgejo-runner/config.yml    |  21 +++-
 5 files changed, 163 insertions(+), 74 deletions(-)

diff --git a/.forgejo/workflows/build-iso.yml b/.forgejo/workflows/build-iso.yml
index 195f8f9..30b3045 100644
--- a/.forgejo/workflows/build-iso.yml
+++ b/.forgejo/workflows/build-iso.yml
@@ -1,11 +1,19 @@
 name: Build ISO
 
-# Full ISO build is ~15-20 min. Only run on push-to-main and manual
-# dispatch so feature-branch iteration stays fast. See
-# memory/project_ci_branching for the rationale.
+# Full ISO build is ~5-7 min. Only run on push-to-main and manual
+# dispatch so feature-branch iteration stays fast. Docs-only changes
+# skip the build — the `paths-ignore` list below covers *.md files,
+# docs/, and the website (Hugo source). Anything that touches code,
+# the ISO overlay, or the workflow itself still triggers a rebuild.
 on:
   push:
     branches: [main]
+    paths-ignore:
+      - '**/*.md'
+      - 'docs/**'
+      - 'website/**'
+      - 'CHANGELOG.md'
+      - 'RELEASING.md'
   workflow_dispatch:
 
 concurrency:
diff --git a/README.md b/README.md
index 9f2f075..e823c4f 100644
--- a/README.md
+++ b/README.md
@@ -104,15 +104,17 @@ None of these nail the "your dad can set this up" experience. The installer wiza
 - [x] Competitor analysis — see [docs/competitors.md](docs/competitors.md)
 - [x] Wizard flow spec — see [docs/wizard-flow.md](docs/wizard-flow.md)
 - [x] Release process + CI — CalVer tags, conventional commits, Forgejo Actions (ruff, pytest, JSON, link checks), `26.0-alpha` tagged
-- [x] Forgejo runner live on Proxmox VM (`forge-runner-01`, Ubuntu 24.04, Docker + DinD sidecar) — setup captured in [docs/runner-setup.md](docs/runner-setup.md) + [ops/forgejo-runner/](ops/forgejo-runner/)
-- [ ] **Publish `26.0-alpha` Forgejo Release** — deferred. Walking-skeleton ISO boots but doesn't install yet; re-tag once `archinstall` actually completes end-to-end on a VM.
-- [x] **Walking-skeleton live ISO** — `iso/build.sh` produces a hybrid BIOS/UEFI Arch-based ISO that boots in a Proxmox VM, DHCP's onto the LAN, and serves the Flask webinstaller on `:5000`. Screens 1–3 work end-to-end. Build infra in [`iso/`](iso/).
-- [x] **Drop loop/rom devices from drive list** — `webinstaller/drives.py` now filters by `lsblk` `TYPE=disk`, so the live squashfs and CD-ROM no longer appear as install targets.
-- [x] **Rebrand GRUB menu** — `iso/build.sh` rewrites "Arch Linux install medium" → "Furtka Live Installer" across GRUB, syslinux, and systemd-boot configs.
-- [x] **S1 account form + overview → `archinstall`** — S1 collects hostname/user/password/language with validation, S2 picks boot drive, overview confirms, `/install/run` writes `user_configuration.json` + `user_credentials.json` (0600) and execs `archinstall --silent`, log page polls output. `FURTKA_DRY_RUN=1` skips the exec for testing.
+- [x] Forgejo runner live on Proxmox VM (`forge-runner-01`, Ubuntu 24.04) — docker-outside-of-docker with host-mode jobs for ISO builds, setup captured in [docs/runner-setup.md](docs/runner-setup.md) + [ops/forgejo-runner/](ops/forgejo-runner/)
+- [x] **ISO-build in CI** — `.forgejo/workflows/build-iso.yml` runs `iso/build.sh` on every push to `main` (docs-only pushes are skipped via `paths-ignore`) and publishes the resulting `.iso` as the `furtka-iso` artifact (14-day retention). Push → green run → download → test.
+- [ ] **Publish `26.0-alpha` Forgejo Release** — blocker is gone (end-to-end install now works on a VM), re-tag when we're happy with the installer copy.
+- [x] **Walking-skeleton live ISO — end to end** — `iso/build.sh` produces a hybrid BIOS/UEFI Arch-based ISO. It boots in a Proxmox VM, DHCPs onto the LAN, shows a console welcome with `http://proksi.local:5000` (+ IP fallback), serves the Flask webinstaller, runs `archinstall --silent`, reboots the VM via a Reboot-now button, and the installed system logs in and runs `docker ps` without sudo. Build infra in [`iso/`](iso/).
+- [x] **Drop loop/rom devices from drive list** — `webinstaller/drives.py` filters by `lsblk` `TYPE=disk`, so the live squashfs and CD-ROM no longer appear as install targets. Boot-USB filtering on bare metal is still TODO; see [iso/README.md](iso/README.md).
+- [x] **Rebrand GRUB menu** — `iso/build.sh` rewrites "Arch Linux install medium" → "Furtka Live Installer" across GRUB, syslinux, and systemd-boot configs; default entry marked `(Recommended)`.
+- [x] **Wizard: account form → drive picker → overview → archinstall** — S1 collects hostname/user/password/language with validation, S2 picks boot drive, overview confirms, `/install/run` writes `user_configuration.json` + `user_credentials.json` (0600) and execs `archinstall --silent` against its 4.x schema (`default_layout` disk_config + `!root-password` / `!password` sentinel keys + `custom_commands` for post-install group joins). Install log page polls a JSON endpoint and renders a phase-based progress bar with a collapsible raw log. `FURTKA_DRY_RUN=1` skips the real exec for testing.
+- [x] **mDNS `proksi.local`** — hostname baked into the live ISO, avahi + nss-mdns in the package list, advertised as soon as network-online fires. The HTTPS + local-CA half of this milestone is still open below.
 - [ ] **Base OS post-install** — what Furtka actually looks like *after* the wizard writes config + reboots: Caddy + Authentik + app store. Robert's area.
 - [ ] Installer wizard screens S3–S7 — per-device purpose, network, domain, SSL, diagnostic. S5/S6 blocked on managed-gateway DNS infra not yet built.
-- [ ] `https://proksi.local` via mDNS + local CA (currently only raw-IP HTTP)
+- [ ] `https://proksi.local` with a local CA (today: plain HTTP at `http://proksi.local:5000`)
 - [ ] Caddy + Authentik wired into first-boot bootstrap
 - [ ] Managed gateway infrastructure — `ns1/ns2.furtka.org` + DNS-01 wildcard automation
 - [ ] First containerized service (Nextcloud?) with auto-SSO + auto-subdomain
diff --git a/docs/runner-setup.md b/docs/runner-setup.md
index bbc3172..f99c48a 100644
--- a/docs/runner-setup.md
+++ b/docs/runner-setup.md
@@ -1,10 +1,12 @@
 # Forgejo Runner Setup
 
-How to stand up a `forgejo-runner` so the CI workflow in `.forgejo/workflows/ci.yml` actually executes on every push.
+How to stand up a `forgejo-runner` so the CI workflows under
+[`.forgejo/workflows/`](../.forgejo/workflows/) — `ci.yml` (lint,
+pytest, JSON & link checks) and `build-iso.yml` (produces the live
+ISO as a downloadable artifact) — run on every push to `main`.
 
-The runner is a long-running daemon that polls the Forgejo instance for queued jobs and runs them in Docker containers.
-
-A ready-to-use bootstrap script and compose file live under [`ops/forgejo-runner/`](../ops/forgejo-runner/).
+Ready-to-use `compose.yml` and `config.yml` live in
+[`ops/forgejo-runner/`](../ops/forgejo-runner/).
 
 ## Choosing a host
 
 | **Home server / NAS** | Free; plenty of capacity | CI blocked if home network / power drops |
 | **Local dev machine** | Quick to set up, fast runs | CI only works while the machine is on |
 
-Recommendation for now: **home server or a cheap VPS**. Don't use a laptop that suspends.
+Recommendation: **home server or a cheap VPS**. Don't use a laptop that suspends.
+
+## Architecture at a glance
+
+The runner uses **docker-outside-of-docker (DooD)**: it mounts the host's
+`/var/run/docker.sock` into itself and spawns job containers on the host
+daemon. We went back and forth on this — the tempting alternative is a
+docker-in-docker (DinD) sidecar for isolation — but DinD makes
+`iso/build.sh` fail: `build.sh` does its own nested `docker run -v …` and
+the path inside a DinD-hosted job isn't visible to host docker. DooD
+trades some isolation for paths that line up everywhere. This runner VM
+is single-purpose, so that trade is fine.
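+
+To make the path problem concrete, here's a sketch of the kind of nested
+call `build.sh` makes (illustrative only: the real script's image and
+flags differ):
+
+```bash
+# Hypothetical inner invocation, run from inside a CI job's workspace:
+docker run --rm -v "$PWD:/work" -w /work archlinux:latest ls /work
+# The -v source path is resolved by whichever daemon receives the request.
+# With DooD that's the host daemon, so "$PWD" must be a real host path.
+# With a DinD sidecar the daemon has its own filesystem, and a path that
+# only exists inside the job container resolves to an empty dir or a
+# "no such file or directory" error.
+```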
+
+One non-obvious piece: the runner's default internal data directory is
+`/data`. Host-mode jobs (see the `self-hosted:host` label below) tell
+host docker to bind-mount `/data/.cache/act/…/hostexecutor` — which is
+the container's filesystem path, not the host's. The fix is to make
+`/data` exist on the host too, pointing at the same files, via a symlink:
+
+```bash
+sudo ln -s /home/<user>/forgejo-runner/data /data
+```
+
+This one line is what lets `-v /data/…:/work` resolve correctly.
 
 ## Install
 
-Pick either the binary or the Docker container path. Docker is easier to upgrade.
-
-### Path A: Docker Compose (recommended)
-
-Copy `ops/forgejo-runner/compose.yml` and `ops/forgejo-runner/config.yml` from this repo to the host, e.g. into `~/forgejo-runner/` (compose file) and `~/forgejo-runner/data/` (config file). The runner talks to a sidecar Docker-in-Docker container via `tcp://docker-in-docker:2375`, so the host's own Docker socket is not exposed to jobs.
-
-If the host is a fresh Ubuntu VM, run `ops/forgejo-runner/bootstrap.sh` first to install Docker Engine + the Compose plugin from the official repo.
-
-### Path B: Binary
-
-Download the latest release from https://code.forgejo.org/forgejo/runner/releases and drop it somewhere in `$PATH`:
+On a fresh Ubuntu VM:
 
 ```bash
-wget https://code.forgejo.org/forgejo/runner/releases/download/v6.0.0/forgejo-runner-6.0.0-linux-amd64
-chmod +x forgejo-runner-6.0.0-linux-amd64
-sudo mv forgejo-runner-6.0.0-linux-amd64 /usr/local/bin/forgejo-runner
+# Docker Engine + compose plugin (official repo)
+./ops/forgejo-runner/bootstrap.sh
+
+# Node.js on the HOST is not required — the runner container installs
+# it inside itself on startup. But host tools help for debugging.
+```
+
+Copy the reference `compose.yml` and `config.yml` to `~/forgejo-runner/`
+and `~/forgejo-runner/data/` respectively. Create the `/data` symlink:
+
+```bash
+mkdir -p ~/forgejo-runner/data
+cp ops/forgejo-runner/compose.yml ~/forgejo-runner/compose.yml
+cp ops/forgejo-runner/config.yml ~/forgejo-runner/data/config.yml
+sudo ln -s "$HOME/forgejo-runner/data" /data
 ```
 
 ## Register
 
-1. In the Forgejo web UI: go to **Site Administration → Actions → Runners → Create new Runner**. Copy the registration token. (For a repo-scoped runner instead, use **Repo Settings → Actions → Runners**.)
+1. In the Forgejo web UI: **Site Administration → Actions → Runners →
+   Create new Runner** (or **Repo Settings → Actions → Runners** for a
+   repo-scoped runner). Copy the registration token.
 
-2. Register from the runner host by running the registration inside a one-shot container so the output lands in the mounted `data/` directory:
+2. Register from the host by running the registration inside a one-shot
+   container so the resulting `.runner` file lands in the mounted
+   `data/` directory:
 
 ```bash
 cd ~/forgejo-runner
@@ -49,47 +80,78 @@ sudo mv forgejo-runner-6.0.0-linux-amd64 /usr/local/bin/forgejo-runner
   --instance https://forgejo.sourcegate.online \
   --token <registration-token> \
   --name forge-runner-01 \
-  --labels 'docker:docker://catthehacker/ubuntu:act-latest,ubuntu-latest:docker://catthehacker/ubuntu:act-latest,self-hosted:docker://catthehacker/ubuntu:act-latest' \
   --no-interactive
 ```
 
-   Labels *must* use the `:docker://` form — bare labels (`ubuntu-latest`) get stored as `ubuntu-latest:host`, which tells the runner to execute jobs directly inside the runner container (no Python, no git, nothing). `catthehacker/ubuntu:act-latest` is the common drop-in image with GitHub Actions tooling preinstalled.
+   Note: labels are configured in `config.yml`, not at registration
+   time — `config.yml` has `labels:` populated with the three we use
+   (`ubuntu-latest`, `docker`, `self-hosted`), each mapped to either
+   a container image or `:host` mode.
 
 3. Start the daemon: `docker compose up -d`.
 
-4. Verify the runner shows up as **Idle** in Forgejo's admin Runners page and the log prints `runner: forge-runner-01, ..., declared successfully`.
+4. Verify in Forgejo admin → Actions → Runners that `forge-runner-01`
+   shows as **Idle**, and `docker logs forgejo-runner` prints
+   `runner: forge-runner-01, ..., declared successfully` along with
+   the installed `node` + `docker-cli` versions.
+
+## Two runtime modes
+
+The `config.yml` labels set up two job execution modes:
+
+- **`ubuntu-latest` / `docker` → `docker://catthehacker/ubuntu:act-latest`.**
+  The standard mode. Jobs run in a fresh `catthehacker/ubuntu:act-latest`
+  container. Good isolation, standard GHA-compatible image. Used by
+  `ci.yml` (ruff, pytest, JSON & link checks).
+
+- **`self-hosted` → `:host`.** Steps execute *directly* in the runner
+  container (no per-job wrapping container). Used by `build-iso.yml`
+  because `iso/build.sh` needs `docker run -v $REPO_ROOT:/work` to hit
+  a path host docker can resolve — wrapping in a job container
+  reintroduces the namespace mismatch.
+
+Because host-mode jobs run inside the runner container, that container
+needs tools the jobs invoke — Node (for JS-based actions like
+`actions/checkout@v4`), Git (already in the base image), and the Docker
+CLI (for `iso/build.sh`). The `command:` in `compose.yml` apk-installs
+nodejs + docker-cli before launching the daemon, so those tools are
+always present after container start.
 
 ## First CI run
 
-Push any commit; the Actions tab on the repo should show the workflow running. If nothing happens:
+Push a commit to `main` — the Actions tab should show:
 
-- Confirm the runner is online (Forgejo admin → Actions → Runners).
-- Check the workflow has labels that match the runner (`runs-on: ubuntu-latest` needs a runner registered with that label).
-- Check the runner logs: `docker logs forgejo-runner` or the systemd journal.
+- `CI` workflow (`ci.yml`) running lint, tests, JSON validation, markdown
+  links. Green in ~30 s.
+- `Build ISO` workflow (`build-iso.yml`) running `iso/build.sh` inside
+  the runner container. Takes ~5 min (pacstrap + mkarchiso). The
+  resulting `.iso` lands as a `furtka-iso` artifact on the run page,
+  retained 14 days.
 
-## Systemd unit (for the binary path)
+If the workflow queues forever, check:
 
-```ini
-[Unit]
-Description=Forgejo Actions Runner
-After=docker.service
-Requires=docker.service
+- Runner online in Forgejo admin.
+- `docker logs forgejo-runner` for errors.
+- The workflow's `runs-on:` matches a label the runner advertises.
 
-[Service]
-ExecStart=/usr/local/bin/forgejo-runner daemon
-WorkingDirectory=/var/lib/forgejo-runner
-User=forgejo-runner
-Restart=on-failure
+## Artifact compatibility note
 
-[Install]
-WantedBy=multi-user.target
-```
-
-Save as `/etc/systemd/system/forgejo-runner.service`, then `sudo systemctl enable --now forgejo-runner`.
+Forgejo's Actions API is GHES-compatible (not full GHA), so use
+`actions/upload-artifact@v3` — **v4+ fails with
+`GHESNotSupportedError`** because it needs the newer `@actions/artifact`
+protocol Forgejo hasn't implemented yet.
 
 ## Security notes
 
-- Jobs run inside a Docker-in-Docker sidecar, not against the host's Docker socket. Still, DinD runs privileged — give the runner its own VM, not a shared host.
-- Registration tokens are one-shot; a stolen token can't re-register after the runner is live.
-- Prefer repo-scoped runners over instance-wide if you're sharing the runner with other repos you don't control.
-- Ubuntu's default systemd-resolved makes the host's stub resolver (`127.0.0.53`) inherit a LAN DNS server that Docker containers may not be able to reach. If container DNS fails, set explicit upstream DNS in `/etc/docker/daemon.json` (e.g. `{"dns": ["1.1.1.1", "8.8.8.8"]}`) and `sudo systemctl restart docker`.
+- DooD gives jobs full access to the host's docker daemon — they can
+  spawn arbitrary containers, including `--privileged` ones. Keep the
+  runner VM dedicated to CI; don't run other user workloads on it.
+- The runner container itself runs as root (`user: "0:0"`). This is
+  acceptable because the whole VM is purpose-built, but it's a bigger
+  footgun than the standard non-root runner image default.
+- Registration tokens are one-shot; once a runner is live, the token
+  can't re-register.
+- Ubuntu's `systemd-resolved` stub resolver (`127.0.0.53`) sometimes
+  leaks LAN-only DNS servers that containers can't reach. If container
+  DNS fails, set explicit upstream DNS in `/etc/docker/daemon.json`
+  (e.g. `{"dns": ["1.1.1.1", "8.8.8.8"]}`) and restart docker.
diff --git a/ops/forgejo-runner/compose.yml b/ops/forgejo-runner/compose.yml
index faf6c66..7f33e71 100644
--- a/ops/forgejo-runner/compose.yml
+++ b/ops/forgejo-runner/compose.yml
@@ -3,20 +3,22 @@ services:
     image: code.forgejo.org/forgejo/runner:6
     container_name: forgejo-runner
     restart: unless-stopped
+    # Running as root so (1) apk can install nodejs + docker-cli at
+    # startup (needed by host-mode jobs that execute JS actions and by
+    # `iso/build.sh` which shells out to `docker run`), and (2) access
+    # to the host docker socket doesn't require group juggling.
+    user: "0:0"
     environment:
-      - DOCKER_HOST=tcp://docker-in-docker:2375
+      - DOCKER_HOST=unix:///var/run/docker.sock
       - CONFIG_FILE=/data/config.yml
+    # Mount at /data so the container's data path matches the host path
+    # /data (which is a symlink to this directory — see runner-setup.md).
+    # When a host-mode job does `docker run -v /data/.cache/act/…:/work`,
+    # host docker resolves the source via the symlink instead of failing
+    # with "no such file or directory".
     volumes:
       - ./data:/data
-    depends_on:
-      - docker-in-docker
-    command: /bin/sh -c "sleep 5; forgejo-runner daemon --config /data/config.yml"
-
-  docker-in-docker:
-    image: docker:dind
-    container_name: forgejo-runner-dind
-    restart: unless-stopped
-    privileged: true
-    environment:
-      - DOCKER_TLS_CERTDIR=
-    command: dockerd -H tcp://0.0.0.0:2375 --tls=false
+      - /var/run/docker.sock:/var/run/docker.sock
+    command: >-
+      /bin/sh -c "apk add --no-cache nodejs docker-cli && sleep 5 &&
+      forgejo-runner daemon --config /data/config.yml"
diff --git a/ops/forgejo-runner/config.yml b/ops/forgejo-runner/config.yml
index 849ed2c..ff3bbed 100644
--- a/ops/forgejo-runner/config.yml
+++ b/ops/forgejo-runner/config.yml
@@ -10,7 +10,16 @@ runner:
   fetch_timeout: 5s
   fetch_interval: 2s
   report_interval: 1s
-  labels: []
+  # Label mappings decide how each `runs-on:` value is executed. The
+  # `:host` suffix means "run steps directly in the runner container"
+  # (no wrapping job container). build-iso uses `runs-on: self-hosted`
+  # because its `docker run -v $REPO_ROOT:/work` needs host-visible
+  # paths — nested containers would put the workspace in a namespace
+  # host docker can't see.
+  labels:
+    - "ubuntu-latest:docker://catthehacker/ubuntu:act-latest"
+    - "docker:docker://catthehacker/ubuntu:act-latest"
+    - "self-hosted:host"
 
 cache:
   enabled: true
@@ -22,8 +31,14 @@ container:
   network: ""
   privileged: false
-  valid_volumes: []
-  docker_host: "tcp://docker-in-docker:2375"
+  # Docker-outside-of-docker: runner and all job containers share the
+  # host's docker daemon via the unix socket. valid_volumes whitelists
+  # the socket so it can be mounted into job containers (the runner
+  # handles this automatically — don't also mount it from a workflow
+  # or you'll get "duplicate mount point").
+  valid_volumes:
+    - "/var/run/docker.sock"
+  docker_host: "unix:///var/run/docker.sock"
   force_pull: false
 
 host: