ci(deploy): wipe every non-current act workspace before npm ci
Some checks failed
Deploy to Frontend Servers / deploy (push) Failing after 2s
The previous run still hit ENOSPC, this time during `npm ci` while extracting node_modules. The earlier cleanup left the just-failed act workspace on disk (its mtime was still inside the 10min threshold), and its half-extracted node_modules pushed the runner past the limit before `npm ci` finished.

- Drop the mtime threshold for act workspaces. Instead, detect the currently-running job's directory and rm -rf every sibling. The current job is preserved by path comparison, so we never delete files the running step needs.
- Blow away ~/.npm/_cacache, ~/.npm/_logs, and ~/.cache/setup-node entirely. `npm ci` re-populates what it needs, and the cache is the easiest GB to reclaim on a tight runner.
- Tighten actions-runner workspace retention from 60min to 30min.
- Replace the separate image, container, and builder prunes with a single `docker system prune -af --volumes`, which additionally prunes unused volumes.
- Hard-fail with a clear error if less than 3.5GB is free after cleanup, instead of letting `npm ci` half-write an unusable node_modules and fail obscurely. The codebase needs ~3GB for hoisted deps.
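The failure mode described above, where a workspace that is only minutes old survives `-mmin +10`, and the path-comparison fix can both be reproduced in a throwaway directory. A minimal sketch, assuming bash and GNU find: `prune_act_siblings` is a hypothetical helper wrapping the same `case` patterns as the committed loop, and `ACT_ROOT`, `job-current`, and `job-failed` are made-up stand-ins for `~/.cache/act` and its per-job workspaces.

```shell
#!/usr/bin/env bash
# Reproduces the leak and the fix in a throwaway directory.
# ACT_ROOT, job-current, job-failed and prune_act_siblings are hypothetical
# stand-ins for ~/.cache/act, its workspaces, and the committed loop.
set -u

# prune_act_siblings ROOT CURRENT: rm -rf every top-level directory under
# ROOT except CURRENT, using the same case patterns as the committed loop.
prune_act_siblings() {
  local root=$1 current=$2 d
  for d in "$root"/*/; do
    [ -d "$d" ] || continue      # an unmatched glob stays literal; skip it
    case "$d" in
      "$current"/*|"$current/") echo "skip current: $d" ;;
      *) rm -rf "$d" && echo "removed: $d" ;;
    esac
  done
}

ACT_ROOT=$(mktemp -d)
trap 'rm -rf "$ACT_ROOT"' EXIT
# Both workspaces were written moments ago: one is the running job, the
# other belongs to the job that just failed.
mkdir -p "$ACT_ROOT/job-current" "$ACT_ROOT/job-failed"

# Old rule: only directories modified more than 10 minutes ago match,
# so the just-failed workspace is not selected for deletion.
OLD_MATCHES=$(find "$ACT_ROOT" -mindepth 1 -maxdepth 1 -type d -mmin +10 | wc -l)
echo "old -mmin +10 rule would delete: $OLD_MATCHES dir(s)"

# New rule: no age check; keep only the current job's directory.
prune_act_siblings "$ACT_ROOT" "$ACT_ROOT/job-current"
```

Note that the loop only spares the running job when `CURRENT_ACT_DIR` resolves to one of the directories under `~/.cache/act`. If it comes back empty, the first pattern collapses to `/*`, which matches every absolute path, so the loop skips everything and removes nothing. That is a safe failure, but it means the cleanup silently does no work.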
@@ -15,27 +15,41 @@ jobs:
 set +e
 echo "=== Disk before cleanup ==="
 df -h
-# Stale act runner workspaces. Closely-spaced pushes (e.g. 3 commits
-# within 30min) used to leak workspaces because the old 60min
-# threshold left them in place. 10min is tight but still keeps any
-# currently-running job's dir (its mtime updates as it writes).
+# Identify the directory holding the currently-running act job so we
+# never touch it. Everything else under ~/.cache/act/ is fair game.
+CURRENT_ACT_DIR=""
+if [ -n "${ACT_TOOLCACHE_PATH:-}" ]; then
+  CURRENT_ACT_DIR=$(dirname "${ACT_TOOLCACHE_PATH}" 2>/dev/null)
+fi
+if [ -z "$CURRENT_ACT_DIR" ]; then
+  CURRENT_ACT_DIR=$(pwd | sed -n 's|\(.*/.cache/act/[^/]*\).*|\1|p')
+fi
+echo "Current act dir (preserved): ${CURRENT_ACT_DIR:-<unknown>}"
+# Wipe every other act workspace immediately (no mtime threshold).
+# The old 10min threshold still left the previous failed job around,
+# which then ate the disk before `npm ci` could finish.
 if [ -d "$HOME/.cache/act" ]; then
   du -sh "$HOME/.cache/act" 2>/dev/null
-  find "$HOME/.cache/act" -mindepth 1 -maxdepth 1 -type d -mmin +10 -exec rm -rf {} + 2>/dev/null
+  for d in "$HOME/.cache/act"/*/; do
+    [ -d "$d" ] || continue
+    case "$d" in
+      "$CURRENT_ACT_DIR"/*|"$CURRENT_ACT_DIR/") echo "skip current: $d" ;;
+      *) rm -rf "$d" && echo "removed: $d" ;;
+    esac
+  done
 fi
-# Stale runner workspaces and node setup/npm caches: 60min is plenty
-# since each job re-fetches deps via `npm ci`.
-for dir in "$HOME/actions-runner/_work" "$HOME/.cache/setup-node" "$HOME/.npm/_cacache"; do
-  if [ -d "$dir" ]; then
-    find "$dir" -mindepth 1 -maxdepth 2 -mmin +60 -exec rm -rf {} + 2>/dev/null
-  fi
-done
-# Docker leftovers: drop the `until=24h` filter so any dangling images
-# / containers / builder cache get reclaimed every run.
+# npm cache + setup-node cache: blow them away entirely. `npm ci`
+# re-populates what it needs; the cache is a nice-to-have, not a
+# requirement, and on a tight runner it's the easiest GB to reclaim.
+rm -rf "$HOME/.npm/_cacache" "$HOME/.npm/_logs" 2>/dev/null
+rm -rf "$HOME/.cache/setup-node" 2>/dev/null
+# Stale actions-runner workspaces older than 30min.
+if [ -d "$HOME/actions-runner/_work" ]; then
+  find "$HOME/actions-runner/_work" -mindepth 1 -maxdepth 2 -mmin +30 -exec rm -rf {} + 2>/dev/null
+fi
+# Docker: drop everything reclaimable (no `until` filter).
 if command -v docker >/dev/null 2>&1; then
-  docker image prune -af 2>/dev/null
-  docker container prune -f 2>/dev/null
-  docker builder prune -af 2>/dev/null
+  docker system prune -af --volumes 2>/dev/null
 fi
 # Stale /tmp files older than 2h, keep currently-running runner files.
 find /tmp -mindepth 1 -maxdepth 1 -mmin +120 \
@@ -43,6 +57,17 @@ jobs:
   -exec rm -rf {} + 2>/dev/null
 echo "=== Disk after cleanup ==="
 df -h
+# Hard fail early if there still isn't enough room for `npm ci`,
+# which needs ~3GB for this codebase's hoisted node_modules.
+AVAIL_MB=$(df -Pm . | awk 'NR==2 {print $4}')
+echo "Available on workspace volume: ${AVAIL_MB} MB"
+if [ "${AVAIL_MB:-0}" -lt 3500 ]; then
+  echo "::error::Less than 3.5GB free after cleanup (${AVAIL_MB}MB)."
+  echo "The runner's EBS volume is too small for this codebase \
+— ask devops to expand it. Failing fast so the next steps don't \
+half-write an unusable build."
+  exit 1
+fi
 exit 0
 
 - name: Checkout code
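The post-cleanup free-space check can be factored into a reusable function so the volume and the threshold become explicit parameters. A sketch, assuming bash and GNU coreutils `df`; `free_mb` and `require_free_mb` are hypothetical helper names, not part of the workflow, and the `::error::` prefix is the same GitHub Actions workflow-command annotation the commit already emits.

```shell
#!/usr/bin/env bash
# Sketch of the post-cleanup disk guard as reusable functions.
# free_mb and require_free_mb are hypothetical names, not from the workflow.
set -u

# free_mb PATH: print the "Available" column, in 1M (1,048,576-byte) blocks,
# for the filesystem holding PATH. `-P` keeps each filesystem on one line,
# so column 4 of line 2 is always the available space.
free_mb() {
  df -Pm "$1" | awk 'NR==2 {print $4}'
}

# require_free_mb PATH MIN_MB: return 0 if at least MIN_MB are available,
# otherwise print an ::error:: annotation to stderr and return 1.
# An empty df result counts as 0, so the guard fails closed.
require_free_mb() {
  local path=$1 min_mb=$2 avail
  avail=$(free_mb "$path")
  if [ "${avail:-0}" -lt "$min_mb" ]; then
    echo "::error::Only ${avail:-0}MB free on $path after cleanup; need ${min_mb}MB." >&2
    return 1
  fi
  echo "Available on $path: ${avail}MB (need ${min_mb}MB)"
}

# Example with the threshold used by this commit:
# require_free_mb . 3500 || exit 1
```

In the cleanup step, the committed check would then reduce to `require_free_mb . 3500 || exit 1`. Like the original `${AVAIL_MB:-0}` expansion, an empty `df` result is treated as 0 MB, so a broken measurement blocks `npm ci` instead of letting it run.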