The previous run still hit ENOSPC, this time during `npm ci` while
extracting node_modules. The earlier cleanup left the just-failed act
workspace on disk because its mtime was newer than the 10min threshold,
and its half-extracted node_modules pushed the runner past the limit
before `npm ci` finished.
- Drop the mtime threshold for act workspaces; instead detect the
currently-running job's directory and rm -rf every sibling. The
current job is preserved by path comparison so we never delete files
the running step needs.
- Blow away ~/.npm/_cacache, ~/.npm/_logs, ~/.cache/setup-node entirely.
`npm ci` re-populates what it needs and the cache is the easiest GB
to reclaim on a tight runner.
- Tighten actions-runner workspace retention from 24h to 30min.
- Drop the docker prune --filter; use `docker system prune -af --volumes`
to reclaim builder cache and volumes too.
- Hard-fail with a clear error if <3.5GB is free after cleanup, instead
of letting `npm ci` half-write an unusable node_modules and fail
obscurely. The codebase needs ~3GB for hoisted deps.
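
A minimal sketch of the sibling sweep and the free-space guard from the
list above. The function names are hypothetical, and how the running
job's directory is detected is not shown here; the sketch assumes it is
passed in as an argument. 3.5GB is treated as 3.5 GiB (3670016 KiB).

```shell
#!/usr/bin/env bash
# Sketch only: function names and arguments are illustrative assumptions.

# Remove every non-hidden directory under $1 except the one that
# resolves to the same canonical path as $2 (the running job).
remove_sibling_workspaces() {
  local root="$1" keep dir
  keep="$(cd "$2" && pwd -P)"       # canonical path of the running job
  for dir in "$root"/*; do
    [ -d "$dir" ] || continue       # skip plain files and an unmatched glob
    if [ "$(cd "$dir" && pwd -P)" = "$keep" ]; then
      continue                      # never touch the in-flight job
    fi
    rm -rf -- "$dir"
  done
}

# Print the free space, in KiB, on the filesystem holding $1.
free_kib() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Return non-zero with a clear message when fewer than $2 KiB are free
# on the filesystem holding $1.
require_free_kib() {
  local path="$1" need="$2" have
  have="$(free_kib "$path")"
  if [ "$have" -lt "$need" ]; then
    echo "ERROR: only ${have} KiB free on ${path}, need ${need} KiB" >&2
    return 1
  fi
}
```

In a workflow step this might be called as
`remove_sibling_workspaces "$ACT_ROOT" "$PWD"` followed by
`require_free_kib "$HOME" 3670016 || exit 1`, where `ACT_ROOT` is a
placeholder for wherever act keeps its workspaces.
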
A previous deploy failed at the vite chunk-writing stage with
"ENOSPC: no space left on device". The cleanup step ran at the start
of the job but left enough stale data behind that the runner filled up
before `npm run build` could finish.
- Drop the act workspace retention from 60min to 10min. Closely-spaced
pushes used to keep multiple stale jobs around; 10min still preserves
any currently-running job because its mtime keeps advancing.
- Drop _work / setup-node / npm cacache retention from 24h to 60min.
- Drop the `until=24h` filter on docker prune so dangling images,
containers, and builder cache get reclaimed every run.
- Add a second "Ensure free space before build" guard right before the
Build step. If <3GB is free, aggressively prune act caches, npm
cacache, and docker volumes before vite starts writing chunks.
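
A rough sketch of the age-based sweep and the conditional pre-build
guard described above. The function names are hypothetical, and the
`-mindepth`, `-maxdepth` and `-mmin` options assume GNU or BSD find.

```shell
#!/usr/bin/env bash
# Sketch only: function names and arguments are illustrative assumptions.

# Delete top-level entries of $1 whose mtime is more than $2 minutes old.
# Note: a directory's mtime changes only when entries are created,
# removed, or renamed directly inside it, not when nested files change.
sweep_older_than() {
  local dir="$1" minutes="$2"
  [ -d "$dir" ] || return 0         # nothing to sweep
  find "$dir" -mindepth 1 -maxdepth 1 -mmin "+$minutes" \
    -exec rm -rf -- {} +
}

# Run the command given as $3.. only if fewer than $2 KiB are free on
# the filesystem holding $1.
prune_if_low() {
  local path="$1" need="$2" have
  shift 2
  have="$(df -Pk "$path" | awk 'NR == 2 { print $4 }')"
  if [ "$have" -lt "$need" ]; then
    "$@"
  fi
}
```

For example, `sweep_older_than "$HOME/.cache/act" 10` applies the 10min
retention, and `prune_if_low "$HOME" 3145728 docker system prune -af --volumes`
prunes only when less than 3 GiB is free.
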
The previous attempt deleted the in-flight act workspace and broke
actions/checkout. Restore the safe sweep for ~/.cache/act (only entries
older than 60min) while keeping npm/docker/tmp/log cleanup aggressive.
- Wipe all stale act workspaces (keep only the current run's dir).
- Clear ~/.npm/_cacache and the setup-node cache fully.
- Run `docker system prune -af --volumes`.
- Clean the apt/yum cache and vacuum journald to 100M.
- Sweep /tmp entries older than 30min instead of 120min.
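
The system-level part of the list above could look roughly like this.
The wrapper names and the `DRY_RUN` variable are hypothetical, and the
sketch assumes the runner user is allowed to run these commands (they
may need a `sudo` prefix on some hosts).

```shell
#!/usr/bin/env bash
# Sketch only: run_if_present, system_cleanup and DRY_RUN are
# illustrative names, not part of any existing tool.

# Run "$@" if its first word is an available command. With DRY_RUN=1,
# print the command instead of running it. A failing command only warns.
run_if_present() {
  if ! command -v "$1" >/dev/null 2>&1; then
    return 0                        # tool not installed on this host
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "would run: $*"
  else
    "$@" || echo "WARN: $* failed" >&2
  fi
}

# Reclaim docker, package-manager and journald space where available.
system_cleanup() {
  run_if_present docker system prune -af --volumes
  run_if_present apt-get clean
  run_if_present yum clean all
  run_if_present journalctl --vacuum-size=100M
}
```

Running `DRY_RUN=1 system_cleanup` first shows which commands would run
on a given host without deleting anything.
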
Add a pre-checkout cleanup step that removes stale act caches,
old setup-node/npm cache entries, dangling docker resources, and
leftover /tmp files older than 2h. Prevents recurring ENOSPC
failures on the EC2 self-hosted runner.
Note: the very first run after this change may still fail if the
runner disk was already at 100% beforehand; one-time manual cleanup
on the host is required to bootstrap.
The ALB sends /api/* to an unreachable backend target group (502 on
apex). Use VITE_API_PREFIX=/apnew with an nginx proxy to backend-1 until
the listener rule is removed.
Co-authored-by: Cursor <cursoragent@cursor.com>