Problem
We run ~660K GitHub Actions minutes/month, nearly all of it in a single build.yml containing 200+ jobs across matrix strategies. When a job fails (e.g., a flaky E2E test), it cannot be re-run until all 200+ jobs reach a terminal state, which is often a 25+ minute wait doing nothing.
Goal
Split build.yml into independent workflows so that:
- Each test suite completes and can be re-run independently
- A flaky E2E test can be re-run without waiting for browser/integration/unit tests to finish
- Overall CI feedback loop is faster
Architecture
Why not workflow_run?
The original plan proposed using workflow_run to chain test workflows after the build. This won't work because:
- Status checks from workflow_run workflows don't attach to the PR (they run in the default branch context)
- They can't be re-run from the PR checks UI
- Chaining adds an async delay
Approach: Independent workflows + actions/cache
Each test workflow triggers independently via the same triggers as build.yml today (push, pull_request, merge_group). Build output is shared cross-workflow via actions/cache keyed by commit SHA.
build-and-lint.yml (trigger: push/PR/merge_group)
├─ build → actions/cache/save key="build-output-{SHA}"
├─ lint, size_check, circular_dep, lockfile, format
└─ Build & Lint Gate
ci-unit-tests.yml (trigger: push/PR/merge_group)
├─ restore_build → actions/cache/restore key="build-output-{SHA}" (or Nx-cached fallback)
├─ browser_unit, node_unit (×4 Node), bun, deno
└─ Unit Tests Gate
ci-browser-tests.yml (trigger: push/PR/merge_group)
├─ restore_build
├─ playwright (×13 bundles), loader (×7)
└─ Browser Tests Gate
ci-integration-tests.yml (trigger: push/PR/merge_group)
├─ restore_build
├─ node_integration (×4), node_core (×4), cloudflare, remix (×4)
└─ Integration Tests Gate
ci-e2e.yml (trigger: push/PR/merge_group)
├─ restore_build
├─ e2e_prepare → e2e_tests (~20 apps)
└─ E2E Tests Gate
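Each workflow ends in a gate job, and the name of build.yml's existing "All required jobs passed or were skipped" job suggests the rule such a gate applies: pass only if every upstream job succeeded or was skipped. The Python sketch below models that rule, assuming the gate receives a mapping from job id to result (in a workflow these values would come from needs.<job_id>.result); gate_passes and the sample results are hypothetical. One GitHub Actions detail matters here: a job whose needs fail is skipped unless it uses a condition such as if: always(), and a skipped job reports success even for a required check, so the real gate job needs that condition in order to be able to fail.

```python
from typing import Mapping

# The values GitHub Actions can report in needs.<job_id>.result.
KNOWN_RESULTS = {"success", "failure", "cancelled", "skipped"}
# Results the gate treats as passing (per the gate job's name).
PASSING_RESULTS = {"success", "skipped"}


def gate_passes(needs_results: Mapping[str, str]) -> bool:
    """Hypothetical gate rule: True only if every upstream job succeeded or was skipped."""
    for job, result in needs_results.items():
        if result not in KNOWN_RESULTS:
            raise ValueError(f"unexpected result {result!r} for job {job!r}")
    return all(result in PASSING_RESULTS for result in needs_results.values())


if __name__ == "__main__":
    # Hypothetical job results for one run of ci-e2e.yml.
    results = {
        "restore_build": "success",
        "e2e_prepare": "success",
        "e2e_tests": "failure",
    }
    print("E2E Tests Gate:", "pass" if gate_passes(results) else "fail")
```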
How restore_build works
Each test workflow includes a restore_build job:
- actions/cache/restore with key build-output-{SHA}
  - If cache hit → done (instant; the build workflow already saved it)
  - If cache miss → yarn build with Nx cache fallback (~2 min with a warm cache from develop)
- actions/upload-artifact for downstream jobs within the same workflow
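On re-runs, the build cache normally still exists from the original run. On the first run, there may be a race where test workflows start before the build workflow has saved its output. In that race, and in the rare case of an evicted cache entry, the Nx-cached fallback covers the miss at the cost of a ~2 min rebuild.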
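The restore_build decision can be modeled in a few lines of Python. This is a sketch under stated assumptions: FakeCache is a hypothetical in-memory stand-in for the Actions cache, and run_build stands in for the yarn build step. The real job uses actions/cache/restore and then hands the output to downstream jobs with actions/upload-artifact; the model simply returns it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class FakeCache:
    """Hypothetical in-memory stand-in for the GitHub Actions cache."""
    entries: Dict[str, str] = field(default_factory=dict)

    def restore(self, key: str) -> Optional[str]:
        # Exact-key lookup only; None means a cache miss.
        return self.entries.get(key)

    def save(self, key: str, value: str) -> None:
        # Entries are immutable: the first save for a key wins.
        self.entries.setdefault(key, value)


def build_output_key(sha: str) -> str:
    return f"build-output-{sha}"


def restore_build(
    cache: FakeCache, sha: str, run_build: Callable[[], str]
) -> Tuple[str, str]:
    """Return (build_output, source) for the given commit SHA."""
    cached = cache.restore(build_output_key(sha))
    if cached is not None:
        # Cache hit: build-and-lint.yml already saved output for this commit.
        return cached, "cache-hit"
    # Cache miss (first-run race or evicted entry): rebuild locally.
    # In CI this is `yarn build`, which is fast when the Nx cache is warm.
    return run_build(), "rebuilt"


if __name__ == "__main__":
    cache = FakeCache()
    sha = "abc123"

    # First run: the test workflow starts before the build workflow saves.
    print(restore_build(cache, sha, lambda: "dist (rebuilt via Nx)"))

    # Later, e.g. a re-run: the build workflow has saved its output.
    cache.save(build_output_key(sha), "dist (from build-and-lint.yml)")
    print(restore_build(cache, sha, lambda: "dist (rebuilt via Nx)"))
```

Treating entries as first-write-wins mirrors the fact that an existing Actions cache entry cannot be overwritten, which is why a re-run keeps restoring the output saved by the original build.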
Nx cache sharing
The .nxcache directory is persisted via actions/cache, keyed by nx-Linux-{branch}-{SHA}. GitHub's cache scoping ensures that a feature branch can only restore caches from its own branch and from develop (the base branch), so there is no cross-contamination between PRs. With a warm Nx cache from develop, yarn build completes in ~2 min because only changed packages are rebuilt.
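A simplified Python model of that scoping rule follows. The source only specifies the primary key nx-Linux-{branch}-{SHA}; since that exact key misses on every new commit, this sketch assumes restore keys of nx-Linux-{branch}- and then nx-Linux-develop-. Entry, nx_key, and lookup are hypothetical names, and the model omits parts of the real actions/cache lookup such as cache versions and prefix matching on the primary key. It searches the run's own branch first, then the base branch, and among several prefix matches picks the most recently created entry.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Entry:
    scope: str    # branch whose workflow run saved the entry
    key: str
    created: int  # larger = more recently created


def nx_key(branch: str, sha: str) -> str:
    return f"nx-Linux-{branch}-{sha}"


def lookup(
    entries: Sequence[Entry],
    primary_key: str,
    restore_keys: Sequence[str],
    branch: str,
    base_branch: str,
) -> Optional[Entry]:
    """Simplified restore: own branch first, then base branch; never other PRs."""
    for scope in (branch, base_branch):
        visible: List[Entry] = [e for e in entries if e.scope == scope]
        for e in visible:
            if e.key == primary_key:
                return e
        for prefix in restore_keys:
            matches = [e for e in visible if e.key.startswith(prefix)]
            if matches:
                # Several partial matches: the newest one wins.
                return max(matches, key=lambda e: e.created)
    return None


if __name__ == "__main__":
    entries = [
        Entry("develop", nx_key("develop", "d1"), created=1),
        Entry("develop", nx_key("develop", "d2"), created=2),
        Entry("feat-b", nx_key("feat-b", "b1"), created=3),  # another PR
    ]
    restore_keys = ["nx-Linux-feat-a-", "nx-Linux-develop-"]
    hit = lookup(entries, nx_key("feat-a", "a1"), restore_keys, "feat-a", "develop")
    print(hit.key)  # nx-Linux-develop-d2
```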
Transition plan (no big bang)
Each workflow is extracted one at a time with a dual-running stabilization period:
- Add actions/cache/save to the build job: saves build output alongside the existing upload-artifact. No behavior change. (CI: Build output caching + cross-workflow restore-cache action #19677)
- Create the new test workflow: runs in parallel with build.yml (tests run in both places). Verify the new workflow is stable.
- Add the new gate job to rulesets: e.g., E2E Tests Passed added as a required status check across all 4 rulesets (develop, master, v[89], v7). Requires admin.
- Remove jobs from build.yml: remove the corresponding test jobs and update the job_required_jobs_passed needs list.
- Repeat for the next workflow.
Extraction order:
- E2E tests (CI: Extract E2E tests into independent workflow #19673): the flakiest and longest-running
- Browser integration tests (CI: Extract browser integration tests into independent workflow #19674)
- Server integration tests (CI: Extract server integration tests into independent workflow #19675)
- Unit tests (CI: Extract unit tests into independent workflow #19676)
Ruleset changes required
Each extracted workflow adds a required status check. All 4 rulesets (develop, master, v[89], v7) need updating:
| Workflow | New required check |
|---|---|
| ci-e2e.yml | E2E Tests Passed |
| ci-browser-tests.yml | Browser Tests Passed |
| ci-integration-tests.yml | Integration Tests Passed |
| ci-unit-tests.yml | Unit Tests Passed |
After all extractions, the "All required jobs passed or were skipped" job can be renamed or repurposed to gate only the build and lint jobs.
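The transition plan's "add new gate job to rulesets" step is meant to cover all 4 rulesets before the corresponding jobs are removed from build.yml. A small, hypothetical audit helper in Python can make that check explicit. The ruleset state below is hard-coded as an assumption; fetching it from GitHub is not shown, and missing_checks is an illustrative name.

```python
from typing import Dict, List, Mapping, Sequence, Set

# Workflow → required check, as listed in the ruleset table.
REQUIRED_CHECKS: Dict[str, str] = {
    "ci-e2e.yml": "E2E Tests Passed",
    "ci-browser-tests.yml": "Browser Tests Passed",
    "ci-integration-tests.yml": "Integration Tests Passed",
    "ci-unit-tests.yml": "Unit Tests Passed",
}

RULESETS = ("develop", "master", "v[89]", "v7")


def missing_checks(
    current: Mapping[str, Set[str]], extracted: Sequence[str]
) -> Dict[str, List[str]]:
    """For each ruleset, list the gate checks it still needs to require."""
    needed = [REQUIRED_CHECKS[w] for w in extracted]
    report: Dict[str, List[str]] = {}
    for ruleset in RULESETS:
        have = current.get(ruleset, set())
        missing = [check for check in needed if check not in have]
        if missing:
            report[ruleset] = missing
    return report


if __name__ == "__main__":
    legacy = "All required jobs passed or were skipped"
    # Hypothetical state: only develop was updated after the E2E extraction.
    current = {name: {legacy} for name in RULESETS}
    current["develop"].add("E2E Tests Passed")
    print(missing_checks(current, ["ci-e2e.yml"]))
```

An empty report for the workflows extracted so far is the signal that removing their jobs from build.yml will not leave any ruleset without a required check for them.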