
CI: Split build.yml into independent workflows for faster reruns and feedback #19671

@chargome

Problem

We run ~660K GitHub Actions minutes/month, almost all of them in a single build.yml containing 200+ jobs across matrix strategies. When one job fails (e.g., a flaky E2E test), it cannot be re-run until all 200+ jobs reach a terminal state, which often means waiting 25+ minutes with nothing to do.

Goal

Split build.yml into independent workflows so that:

  1. Each test suite completes and can be re-run independently
  2. A flaky E2E test can be re-run without waiting for browser/integration/unit tests to finish
  3. The overall CI feedback loop is faster

Architecture

Why not workflow_run?

The original plan proposed using workflow_run to chain test workflows after the build. This won't work because:

  • Status checks from workflow_run workflows don't attach to the PR (they run in the default branch context)
  • They can't be re-run from the PR checks UI
  • Chaining adds an async delay: the test workflows cannot start until the build workflow has completed
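For reference, the rejected chaining approach would use a trigger roughly like the one below. This is only the trigger section of a hypothetical test workflow, and the upstream workflow name "Build & Test" is an assumption.

```yaml
# Rejected approach (sketch): a test workflow chained after the build workflow.
# "Build & Test" is an assumed name for the upstream workflow.
name: E2E Tests
on:
  workflow_run:
    workflows: ["Build & Test"]
    types: [completed]
# Runs triggered this way execute in the default branch context, so their
# status checks are not attached to the PR (see the bullets above).
```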

Approach: Independent workflows + actions/cache

Each test workflow triggers independently via the same triggers as build.yml today (push, pull_request, merge_group). Build output is shared cross-workflow via actions/cache keyed by commit SHA.
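A minimal sketch of the trigger block each workflow would share, assuming build.yml's branch filters look like the ones shown (copy the real filters from build.yml). All workflows started by the same event see the same github.sha, so they agree on the cache key.

```yaml
# Shared by build-and-lint.yml and every ci-*.yml workflow (sketch).
on:
  push:
    branches: [develop, master]   # assumed; mirror build.yml's actual filters
  pull_request:
  merge_group:

env:
  # One cache entry per commit: written by build-and-lint.yml, read by the test workflows.
  BUILD_CACHE_KEY: build-output-${{ github.sha }}
```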

build-and-lint.yml (trigger: push/PR/merge_group)
  ├─ build → actions/cache/save key="build-output-{SHA}"
  ├─ lint, size_check, circular_dep, lockfile, format
  └─ Build & Lint Gate

ci-unit-tests.yml (trigger: push/PR/merge_group)
  ├─ restore_build → actions/cache/restore key="build-output-{SHA}" (or Nx-cached fallback)
  ├─ browser_unit, node_unit (×4 Node), bun, deno
  └─ Unit Tests Gate

ci-browser-tests.yml (trigger: push/PR/merge_group)
  ├─ restore_build
  ├─ playwright (×13 bundles), loader (×7)
  └─ Browser Tests Gate

ci-integration-tests.yml (trigger: push/PR/merge_group)
  ├─ restore_build
  ├─ node_integration (×4), node_core (×4), cloudflare, remix (×4)
  └─ Integration Tests Gate

ci-e2e.yml (trigger: push/PR/merge_group)
  ├─ restore_build
  ├─ e2e_prepare → e2e_tests (~20 apps)
  └─ E2E Tests Gate
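A gate job for ci-e2e.yml might look like the sketch below. The job id job_e2e_tests_passed is hypothetical; the needs list reuses the job names from the diagram, and the condition treats skipped jobs as passing, mirroring the existing "All required jobs passed or were skipped" check.

```yaml
jobs:
  # ... restore_build, e2e_prepare and e2e_tests as in the diagram ...

  # Gate job: the only check the rulesets need to require for this workflow.
  # Its `name` must match the required status check exactly.
  job_e2e_tests_passed:
    name: E2E Tests Passed
    needs: [restore_build, e2e_prepare, e2e_tests]
    # Run even if a dependency failed or was cancelled, so the gate always reports.
    if: always()
    runs-on: ubuntu-latest
    steps:
      - name: Fail if any required job failed or was cancelled
        if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled')
        run: exit 1
```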

How restore_build works

Each test workflow includes a restore_build job:

  1. actions/cache/restore with key build-output-{SHA}
  2. If cache hit → skip the build (near-instant; the build workflow already saved the output)
  3. If cache miss → run yarn build, backed by the Nx cache (~2 min with a warm cache from develop)
  4. actions/upload-artifact for downstream jobs within the same workflow

On re-runs, the build cache always exists from the original run. On first run, there may be a race where test workflows start before the build workflow finishes — the Nx-cached fallback handles this gracefully.
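The restore_build steps above could be sketched as follows. The build output location packages/*/build and the artifact name build-output are assumptions; Node/Yarn setup is omitted and would be copied from build.yml. Note that the cache path must be identical to the one used when saving, because the path is part of the cache version.

```yaml
jobs:
  restore_build:
    name: Restore build output
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # 1. Try the output saved by build-and-lint.yml for this exact commit.
      - name: Restore build output cache
        id: build-cache
        uses: actions/cache/restore@v4
        with:
          path: packages/*/build        # assumed; must match the save step's path
          key: build-output-${{ github.sha }}

      # 3. Cache miss (build workflow not finished yet): rebuild. The .nxcache
      #    restore described under "Nx cache sharing" goes before this step.
      - name: Build (Nx-cached fallback)
        if: steps.build-cache.outputs.cache-hit != 'true'
        run: |
          yarn install --frozen-lockfile
          yarn build

      # 4. Hand the output to the other jobs in this workflow.
      - name: Upload build output
        uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: packages/*/build

  e2e_tests:
    needs: restore_build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # The artifact root is the common ancestor of the uploaded paths
      # (packages/), so download it back into that directory.
      - uses: actions/download-artifact@v4
        with:
          name: build-output
          path: packages
```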

Nx cache sharing

The .nxcache directory is persisted via actions/cache keyed by nx-Linux-{branch}-{SHA}. GitHub's cache scoping ensures feature branches can only read from their own branch and develop (the base branch) — no cross-contamination between PRs. With a warm Nx cache from develop, a yarn build completes in ~2 min (only changed packages rebuild).
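A sketch of the Nx cache step in the build job, assuming the branch part of the key comes from github.head_ref (pull requests) or github.ref_name (pushes). actions/cache restores before the job's later steps and saves in its post step.

```yaml
- name: Restore/save Nx cache
  uses: actions/cache@v4
  with:
    path: .nxcache
    # Exact key per commit; runner.os evaluates to "Linux" on ubuntu runners.
    key: nx-${{ runner.os }}-${{ github.head_ref || github.ref_name }}-${{ github.sha }}
    # Fallbacks: the most recent cache from this branch, then from develop.
    # Cache scoping only exposes this branch's and the base branch's entries,
    # so another PR's cache is never matched.
    restore-keys: |
      nx-${{ runner.os }}-${{ github.head_ref || github.ref_name }}-
      nx-${{ runner.os }}-develop-
```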

Transition plan (no big bang)

Each workflow is extracted one at a time with a dual-running stabilization period:

  1. Add actions/cache/save to build job — saves build output alongside existing upload-artifact. No behavior change. (CI: Build output caching + cross-workflow restore-cache action #19677)
  2. Create new test workflow — runs in parallel with build.yml (tests run in both places). Verify the new workflow is stable.
  3. Add the new gate job to the rulesets: e.g., add E2E Tests Passed as a required status check in all 4 rulesets (develop, master, v[89], v7). Requires admin access.
  4. Remove jobs from build.yml: delete the corresponding test jobs and drop them from the needs list of job_required_jobs_passed.
  5. Repeat for next workflow.
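Step 1 of the transition amounts to one extra step in the existing build job. A sketch, where the job id job_build and the path packages/*/build are assumptions to be replaced with build.yml's real values:

```yaml
jobs:
  job_build:                      # hypothetical id; use build.yml's real job
    name: Build
    runs-on: ubuntu-latest
    steps:
      # ... existing checkout, install and `yarn build` steps ...

      # Existing: share output with downstream jobs in build.yml (unchanged).
      - name: Upload build output
        uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: packages/*/build  # assumed output location

      # New in step 1: also save the output under a per-commit key so the
      # extracted ci-*.yml workflows can restore it.
      - name: Save build output cache
        uses: actions/cache/save@v4
        with:
          path: packages/*/build  # must match the restore path exactly
          key: build-output-${{ github.sha }}
```

Because the upload step is untouched, build.yml behaves exactly as before while the new workflows start consuming the cache.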

Extraction order:

  1. E2E tests (CI: Extract E2E tests into independent workflow #19673): the flakiest and longest-running suite
  2. Browser integration tests (CI: Extract browser integration tests into independent workflow #19674)
  3. Server integration tests (CI: Extract server integration tests into independent workflow #19675)
  4. Unit tests (CI: Extract unit tests into independent workflow #19676)

Ruleset changes required

Each extracted workflow adds a required status check. All 4 rulesets (develop, master, v[89], v7) need updating:

| Workflow | New required check |
|---|---|
| ci-e2e.yml | E2E Tests Passed |
| ci-browser-tests.yml | Browser Tests Passed |
| ci-integration-tests.yml | Integration Tests Passed |
| ci-unit-tests.yml | Unit Tests Passed |

After all extractions, the "All required jobs passed or were skipped" check can be renamed or repurposed to gate only the build & lint jobs.
