Three interlocking issues that made 26.11/26.12 effectively
un-upgradable from pre-auth versions without manual pacman +
symlink surgery. Caught while SSH-testing the .196 VM, which landed
in a rollback loop after every Update-now click.
1. auth.py imported werkzeug.security, but the target system runs
core as bare system Python, where neither flask nor werkzeug is
pip-installed. Fresh 26.11+ boxes died on import. Replaced with
a 50-line stdlib `furtka/passwd.py` that uses hashlib.pbkdf2_hmac
for new hashes and still parses werkzeug's `scrypt:N:r:p$salt$hex`
format when verifying, so existing users.json entries survive.
2. updater._health_check pinged /api/apps expecting 200. Post-auth,
/api/apps returns 401 for unauthenticated requests → HTTPError
(a URLError subclass) caught by the URLError handler → retry
loop → 30s timeout → rollback. Now any 2xx-4xx response counts as
"server alive"; only 5xx and connection errors fail. A server that
responds at all is proof it came back up.
3. _do_install released the fcntl lock between sync pre-validation
and the systemd-run dispatch. A second POST could slip in,
pass the lock check, return 202, and leave its install-bg child
to die silently on the in-child lock. Now the API also reads
install-state.json and refuses with 409 on non-terminal stages.
The state file is the reliable signal; the fcntl lock is
defence in depth.
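Sketch for item 1 (illustrative only, not the actual
`furtka/passwd.py`): the function names, the
`pbkdf2:sha256:<iterations>$<salt>$<hex>` output format and the
iteration count are assumptions. The scrypt branch assumes werkzeug
derives the hex digest with hashlib.scrypt over the UTF-8 salt
string and maxmem = 132*N*r*p.

```python
import hashlib
import hmac
import secrets

# Assumed default; the real module may use a different count.
PBKDF2_ITERATIONS = 600_000


def hash_password(password: str, iterations: int = PBKDF2_ITERATIONS) -> str:
    """Create a new PBKDF2-SHA256 hash with a fresh random salt."""
    salt = secrets.token_hex(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt.encode(), iterations
    )
    return f"pbkdf2:sha256:{iterations}${salt}${digest.hex()}"


def verify_password(stored, password: str) -> bool:
    """Check a password against a new-style or legacy scrypt hash."""
    # Reject non-strings and anything that is not method$salt$hex.
    if not isinstance(stored, str) or stored.count("$") != 2:
        return False
    method, salt, expected = stored.split("$")
    parts = method.split(":")
    try:
        if parts[0] == "pbkdf2" and len(parts) == 3:
            actual = hashlib.pbkdf2_hmac(
                parts[1], password.encode(), salt.encode(), int(parts[2])
            ).hex()
        elif parts[0] == "scrypt" and len(parts) == 4:
            # Legacy werkzeug format: scrypt:N:r:p$salt$hex
            n, r, p = (int(x) for x in parts[1:])
            actual = hashlib.scrypt(
                password.encode(), salt=salt.encode(),
                n=n, r=r, p=p, maxmem=132 * n * r * p,
            ).hex()
        else:
            return False
    except (ValueError, OverflowError):
        # Bad integers, unknown digest names, invalid scrypt parameters.
        return False
    # Constant-time comparison on bytes avoids timing leaks.
    return hmac.compare_digest(actual.encode(), expected.encode())
```

Reading the legacy format in the verifier, rather than migrating
users.json up front, keeps the upgrade free of any data rewrite.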
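Sketch for item 2 (illustrative only, not the actual
updater._health_check): the function names, the poll interval, the
per-request timeout and the injectable `opener` parameter are
assumptions. It shows why HTTPError has to be handled before
URLError, and how a 401 now counts as "server alive".

```python
import time
import urllib.error
import urllib.request


def is_alive_status(code: int) -> bool:
    """Any 2xx-4xx response proves the server is answering requests."""
    return 200 <= code < 500


def health_check(url, timeout=30.0, interval=1.0,
                 opener=urllib.request.urlopen) -> bool:
    """Poll url until the server answers or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=5) as resp:
                if is_alive_status(resp.status):
                    return True
        except urllib.error.HTTPError as exc:
            # HTTPError subclasses URLError, so it must be caught first;
            # otherwise a 401 looks like a connection failure.
            if is_alive_status(exc.code):
                return True
        except OSError:
            # URLError and socket errors: server not reachable yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Passing the opener in as a parameter is only there so the 4xx, 5xx
and connection-error paths can be exercised without a live server.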
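Sketch for item 3 (illustrative only, not the actual _do_install):
the `stage` key, the terminal stage names and the helper names are
hypothetical, and a missing or unreadable state file is assumed to
mean "no install running". The fcntl lock is left out here; it
stays in place as the second layer.

```python
import json
from pathlib import Path

# Hypothetical stage names; the real state file may use others.
TERMINAL_STAGES = {"done", "failed"}


def read_state(state_path: Path):
    """Return the parsed state dict, or None if absent or unreadable."""
    try:
        state = json.loads(state_path.read_text())
    except (OSError, ValueError):
        # Missing file, permission problem, or invalid JSON.
        return None
    return state if isinstance(state, dict) else None


def stage_is_non_terminal(state) -> bool:
    """True when the state records a stage that has not finished."""
    if not state:
        return False
    stage = state.get("stage")
    return stage is not None and stage not in TERMINAL_STAGES


def install_status_code(state_path: Path) -> int:
    """409 while an install is in flight, 202 when a new one may start."""
    return 409 if stage_is_non_terminal(read_state(state_path)) else 202
```

Because the state file persists across the gap where the lock is
released, a second POST sees the in-flight stage even though the
lock itself is momentarily free.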
Test coverage:
- tests/test_passwd.py (new, 6 cases): roundtrip, salt uniqueness,
format shape, werkzeug scrypt backward-compat against a real
hash captured from the .196 box, malformed + non-string
rejection.
- tests/test_updater.py: +3 cases for _health_check — 4xx=healthy,
5xx=unhealthy, URLError retry loop.
- tests/test_api.py: +2 cases for install 409 on non-terminal
state + 202 after terminal.
All 267 tests green, ruff check + format clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>