acceptance: shard acceptance tests across 2-8 parallel CI jobs#5435

Draft
denik wants to merge 14 commits into main from denik/test-sharding

@denik denik commented Jun 4, 2026

Contributor

Summary

  • Adds SHARD_INDEX / SHARD_TOTAL env-var support to getTests() in acceptance/acceptance_test.go: each CI job runs only its slice of the 854 acceptance tests (~214 each). The list is already sorted alphabetically, so the modulo split is deterministic and stable across runs.
  • Adds shard_index: [0, 1, 2, 3] to the CI matrix in push.yml, passing SHARD_INDEX and SHARD_TOTAL: 4 to the test step.
  • Job count: 6 → 24 normally; 2 → 8 in the merge queue (linux-only). The test-result aggregator and testmask gating require no changes — GitHub Actions waits for all matrix combinations of a job automatically.

Local runs are unaffected: without SHARD_TOTAL set (or with SHARD_TOTAL=1), getTests() returns the full list as before.
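
A minimal sketch of the modulo split described above, assuming the test list arrives as an already-sorted slice of names. The helper `shardFromEnv` and its fallback to the full list on unset or invalid values are illustrative assumptions, not the code in this PR.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// shardTests keeps the test at sorted position i only when i % total == index.
// It assumes tests is already sorted, which keeps the split stable across runs.
func shardTests(tests []string, index, total int) []string {
	if total <= 1 {
		return tests // sharding disabled: keep the full list
	}
	var out []string
	for i, name := range tests {
		if i%total == index {
			out = append(out, name)
		}
	}
	return out
}

// shardFromEnv reads SHARD_INDEX and SHARD_TOTAL.
// Assumption: unset or invalid values mean "no sharding" (index 0 of 1).
func shardFromEnv() (int, int) {
	total, err := strconv.Atoi(os.Getenv("SHARD_TOTAL"))
	if err != nil || total <= 1 {
		return 0, 1
	}
	index, err := strconv.Atoi(os.Getenv("SHARD_INDEX"))
	if err != nil || index < 0 || index >= total {
		return 0, 1
	}
	return index, total
}

func main() {
	tests := []string{"a", "b", "c", "d", "e", "f"} // stand-in for the sorted test list
	index, total := shardFromEnv()
	fmt.Println(shardTests(tests, index, total))
}
```

Because the position in the sorted list decides ownership, every test lands in exactly one shard and shard sizes differ by at most one.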

Test plan

  • Verify CI jobs are named correctly (shard 0 .. shard 3) and each runs ~214 tests
  • Verify test-result waits for all 24 (or 8 in merge queue) jobs before passing

This pull request was AI-assisted by Isaac.

Adds SHARD_INDEX / SHARD_TOTAL env-var support to getTests() so each CI
job runs only its share of the 854 acceptance tests (~214 each). The
sorted test list ensures the split is deterministic and stable.

The CI matrix gains a shard_index dimension [0,1,2,3], turning 6 test
jobs into 24 (and 2 → 8 in the merge queue). The test-result aggregator
and testmask gating are unaffected — GitHub Actions waits for all matrix
combinations automatically.

Co-authored-by: Isaac
@denik denik temporarily deployed to test-trigger-is June 4, 2026 09:43 — with GitHub Actions Inactive
github-actions Bot commented Jun 4, 2026

Contributor

Waiting for approval

Based on git history, these people are best suited to review:

  • @pietern -- recent work in .github/workflows/, acceptance/

Eligible reviewers: @andrewnester, @anton-107, @renaudhartert-db, @shreyas-goenka, @simonfaltum

Suggestions based on git history. See OWNERS for ownership rules.

@denik denik marked this pull request as draft June 4, 2026 09:45
The sharded acceptance matrix re-ran the full unit suite on all 24 jobs
(4 shards x 3 OS x 2 deployment), and all of them tried to save the Go
cache under the same key on main — only the first writer wins, so the
rest wasted time.

Split into:
- test-unit: one job per OS (no deployment/shard dimension), runs
  `task test-unit`, and is the sole writer of the shared "test" cache.
- test (acc): runs `task test-acc` only, restores the "test" cache
  (save-cache=false so the many shard/deployment instances don't
  collide on the key).

setup-build-environment gains a save-cache input (default true, so the
test-exp-* / test-pipelines jobs with unique keys keep saving as before)
that gates the on-main cache save.

test-result now also waits on test-unit.

Co-authored-by: Isaac
@denik denik temporarily deployed to test-trigger-is June 4, 2026 09:55 — with GitHub Actions Inactive
TestInprocessMode calls testAccept with a specific singleTest
("selftest/basic"). The shard filter lived in getTests(), so it ran
before singleTest selection and could strip the requested test out of
the shard, failing with "did not match any tests" on every shard that
didn't own selftest/basic.

Move the shard filter into a shardTests helper applied in testAccept
only when singleTest == "", leaving named-test selection unsharded.

Co-authored-by: Isaac
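
A sketch of the ordering fix in the commit above: a named test bypasses sharding, and the shard filter applies only when no single test is requested. `selectTests` is a hypothetical stand-in for the relevant part of testAccept, and the `shardTests` body shown here is an assumed modulo filter.

```go
package main

import "fmt"

// shardTests keeps every total-th test starting at index (assumed modulo split).
func shardTests(tests []string, index, total int) []string {
	if total <= 1 {
		return tests
	}
	var out []string
	for i, name := range tests {
		if i%total == index {
			out = append(out, name)
		}
	}
	return out
}

// selectTests (hypothetical name) resolves a named test against the full,
// unsharded list, and shards only when singleTest is empty.
func selectTests(all []string, singleTest string, index, total int) ([]string, error) {
	if singleTest != "" {
		for _, name := range all {
			if name == singleTest {
				return []string{name}, nil
			}
		}
		return nil, fmt.Errorf("%q did not match any tests", singleTest)
	}
	return shardTests(all, index, total), nil
}

func main() {
	all := []string{"bundle/a", "bundle/b", "selftest/basic"}

	// Shard 1 of 2 does not own selftest/basic, yet the named lookup still finds it.
	named, err := selectTests(all, "selftest/basic", 1, 2)
	fmt.Println(named, err)

	// Without a named test, the shard filter applies.
	sharded, err := selectTests(all, "", 1, 2)
	fmt.Println(sharded, err)
}
```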
@denik denik temporarily deployed to test-trigger-is June 4, 2026 10:10 — with GitHub Actions Inactive
eng-dev-ecosystem-bot commented Jun 4, 2026

Collaborator

Commit: 6a010b7

Run: 27146877916

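| Env | 🟨 KNOWN | 💚 RECOVERED | 🙈 SKIP | ✅ pass | 🙈 skip | Time |
| --- | --- | --- | --- | --- | --- | --- |
| 💚 aws linux | | 7 | 15 | 261 | 924 | 7:16 |
| 🟨 aws windows | 7 | | 15 | 263 | 922 | 26:53 |
| 💚 aws-ucws linux | | 7 | 15 | 357 | 838 | 7:55 |
| 💚 aws-ucws windows | | 7 | 15 | 359 | 836 | 13:23 |
| 💚 azure linux | | 1 | 17 | 264 | 922 | 6:52 |
| 💚 azure windows | | 1 | 17 | 266 | 920 | 11:11 |
| 💚 azure-ucws linux | | 1 | 17 | 362 | 834 | 8:10 |
| 💚 azure-ucws windows | | 1 | 17 | 364 | 832 | 10:47 |
| 💚 gcp linux | | 1 | 17 | 260 | 925 | 8:50 |
| 💚 gcp windows | | 1 | 17 | 262 | 923 | 10:43 |
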
22 interesting tests: 15 SKIP, 7 KNOWN

| Test Name | aws linux | aws windows | aws-ucws linux | aws-ucws windows | azure linux | azure windows | azure-ucws linux | azure-ucws windows | gcp linux | gcp windows |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 🟨 TestAccept | 💚R | 🟨K | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R | 💚R |
| 🙈 TestAccept/bundle/invariant/no_drift | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/permissions | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions | 💚R | 🟨K | 💚R | 💚R | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 💚R | 🟨K | 💚R | 💚R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/with_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 💚R | 🟨K | 💚R | 💚R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions | 💚R | 🟨K | 💚R | 💚R | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=direct | 💚R | 🟨K | 💚R | 💚R | | | | | | |
| 🟨 TestAccept/bundle/resources/permissions/jobs/destroy_without_mgmtperms/without_permissions/DATABRICKS_BUNDLE_ENGINE=terraform | 💚R | 🟨K | 💚R | 💚R | | | | | | |
| 🙈 TestAccept/bundle/resources/postgres_branches/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/recreate | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/replace_existing | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/update_protected | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_branches/without_branch_id | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_endpoints/recreate | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/postgres_projects/update_display_name | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/synced_database_tables/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/vector_search_endpoints/drift/recreated_same_name | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/basic | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/bundle/resources/vector_search_indexes/grants/select | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |
| 🙈 TestAccept/ssh/connection | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S | 🙈S |

Top 29 slowest tests (at least 2 minutes):
duration env testname
6:43 azure windows TestAccept
5:53 aws-ucws windows TestAccept
5:21 azure-ucws windows TestAccept
4:58 gcp windows TestAccept
4:46 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
4:42 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:19 gcp linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
4:10 gcp windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:39 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:38 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:25 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:20 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:20 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:12 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
3:06 aws-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
3:02 gcp linux TestAccept
3:02 azure linux TestAccept
2:59 aws linux TestAccept
2:57 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:53 azure-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:53 azure-ucws linux TestAccept
2:47 aws-ucws linux TestAccept
2:41 aws-ucws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:41 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:40 aws linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct
2:39 aws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:34 azure linux TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:34 azure-ucws windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=terraform
2:24 azure windows TestAccept/bundle/resources/apps/inline_config/DATABRICKS_BUNDLE_ENGINE=direct

@denik denik temporarily deployed to test-trigger-is June 4, 2026 11:08 — with GitHub Actions Inactive
A static cross-product matrix forces one shard count for every
(os, engine). Windows is the long pole (TASK_CONCURRENCY=1 serializes
within a job) while the direct engine is fast, so a uniform count
over- or under-shards most combinations.

Generate the acceptance shard matrix in testmask as an explicit
include-list and consume it via fromJSON. Shard counts:

  windows/terraform: 8   windows/direct: 8
  linux/terraform:   4   linux/direct:   2
  macos/terraform:   4   macos/direct:   2

merge_group still runs Linux only (6 jobs). PR/push runs 28 acc jobs.

Co-authored-by: Isaac
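
A hedged sketch of generating an explicit include-list with a per-(os, engine) shard count, using the counts listed in the commit above. The struct, field names, and JSON keys are assumptions for illustration; the real testmask output may be shaped differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// shardJob is one matrix entry. The JSON keys are hypothetical.
type shardJob struct {
	OS         string `json:"os"`
	Engine     string `json:"engine"`
	ShardIndex int    `json:"shard_index"`
	ShardTotal int    `json:"shard_total"`
}

// combo pairs an (os, engine) combination with its shard count.
type combo struct {
	os     string
	engine string
	shards int
}

// buildMatrix expands each combination into one entry per shard.
func buildMatrix(combos []combo) []shardJob {
	var jobs []shardJob
	for _, c := range combos {
		for i := 0; i < c.shards; i++ {
			jobs = append(jobs, shardJob{c.os, c.engine, i, c.shards})
		}
	}
	return jobs
}

func main() {
	combos := []combo{
		{"windows", "terraform", 8}, {"windows", "direct", 8},
		{"linux", "terraform", 4}, {"linux", "direct", 2},
		{"macos", "terraform", 4}, {"macos", "direct", 2},
	}
	jobs := buildMatrix(combos)

	// Wrap the entries under "include" so a workflow could read them with fromJSON.
	out, err := json.Marshal(map[string][]shardJob{"include": jobs})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(jobs), "acceptance jobs") // 8+8+4+2+4+2 = 28
	fmt.Println(string(out))
}
```

An explicit list lets each combination carry its own shard_total, which a plain cross-product matrix cannot express.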
@denik denik temporarily deployed to test-trigger-is June 4, 2026 12:48 — with GitHub Actions Inactive
The gotestsum JSON upload is debug-only timing telemetry, yet a transient
GitHub artifact-service error during finalization failed an otherwise-
passing windows test-acc shard. Mark both upload steps continue-on-error
so infra hiccups on a debug artifact never block the merge.

Co-authored-by: Isaac
@denik denik temporarily deployed to test-trigger-is June 5, 2026 14:57 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 5, 2026 15:27 — with GitHub Actions Inactive
@denik denik temporarily deployed to test-trigger-is June 5, 2026 15:56 — with GitHub Actions Inactive
Data collected across 5 CI runs at ×1/×2/×4/×8:

  linux/terraform  ×4  (~6m   vs 13m unsharded)
  linux/direct     ×2  (~5m   vs  6m unsharded)  ×4 saves only 30s more
  macos/terraform  ×4  (~9m   vs 17m unsharded)
  macos/direct     ×2  (~7.5m vs  9m unsharded)  ×4 saves only 8s more
  windows/terraform ×8 (~15m  vs 35m unsharded)  keeps improving to ×8
  windows/direct   ×4  (~14m  vs 23m unsharded)  ×8 saves only 12s (overhead-bound)

Total acc jobs: 22 (PR/push), 6 (merge queue, linux only).

Co-authored-by: Isaac
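
To make the trade-off in the data above easier to compare, here is a small sketch that turns the approximate sharded and unsharded durations from the commit message into speedup ratios. The type and function names are illustrative, and the inputs are the rounded figures quoted above, so the ratios are approximate too.

```go
package main

import "fmt"

// measurement holds the rounded wall-clock figures for one (os, engine) pair.
type measurement struct {
	name     string
	shards   int
	sharded  float64 // minutes with sharding
	baseline float64 // minutes unsharded
}

// speedup is the ratio of unsharded to sharded duration.
func speedup(baseline, sharded float64) float64 {
	return baseline / sharded
}

func main() {
	data := []measurement{
		{"linux/terraform", 4, 6, 13},
		{"linux/direct", 2, 5, 6},
		{"macos/terraform", 4, 9, 17},
		{"macos/direct", 2, 7.5, 9},
		{"windows/terraform", 8, 15, 35},
		{"windows/direct", 4, 14, 23},
	}
	for _, m := range data {
		fmt.Printf("%-18s x%d  %.1fx faster\n", m.name, m.shards, speedup(m.baseline, m.sharded))
	}
}
```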
@denik denik temporarily deployed to test-trigger-is June 5, 2026 17:47 — with GitHub Actions Inactive
denik added 5 commits June 8, 2026 17:01
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
Co-authored-by: Isaac
@denik denik temporarily deployed to test-trigger-is June 8, 2026 15:02 — with GitHub Actions Inactive
@denik denik changed the title acceptance: shard acceptance tests across 4 parallel CI jobs acceptance: shard acceptance tests across 2-8 parallel CI jobs Jun 9, 2026
denik commented Jun 9, 2026

Contributor Author

Claude analysis:


Final run — all 22 jobs passed ✓

  ┌────────────────────┬────────┬───────────────┬────────┬──────────┬─────────┐
  │    (os, engine)    │ shards │   acc range   │  unit  │ critical │ vs main │
  ├────────────────────┼────────┼───────────────┼────────┼──────────┼─────────┤
  │ linux, terraform   │ ×4     │ 5m16s–6m01s   │ 3m13s  │ 6m01s    │ 2.2×    │
  ├────────────────────┼────────┼───────────────┼────────┼──────────┼─────────┤
  │ macos, direct      │ ×2     │ 6m23s–7m16s   │ 5m36s  │ 7m16s    │ 1.2×    │
  ├────────────────────┼────────┼───────────────┼────────┼──────────┼─────────┤
  │ windows, terraform │ ×8     │ 12m43s–14m18s │ 10m24s │ 14m18s   │ 2.5×    │
  ├────────────────────┼────────┼───────────────┼────────┼──────────┼─────────┤
  │ windows, direct    │ ×4     │ 13m13s–14m18s │ 10m24s │ 14m18s   │ 1.6×    │
  └────────────────────┴────────┴───────────────┴────────┴──────────┴─────────┘

  ---
  Are the shard counts right?

  linux/terraform ×4 — good. Acc-bound at 6m, halving to ×2 would cost 2.5 min, doubling to ×8 has no data but overhead floor is near.

  linux/direct ×2 — borderline. Halving to ×1 costs only 53s and doubling to ×4 saves only 30s. Reasonable to keep ×2 for consistency, but ×1 is defensible here.

  macos/terraform ×4 — good. Acc-bound at 8m50s, ×2 costs ~3.5 min — sharding clearly earns its keep.

  macos/direct ×2 — borderline. Same shape as linux/direct: 1m25s gap to ×1, and ×4 actually measured slower (unit job becomes the bottleneck at ×4 on macos). The ×2 config is genuinely the
  optimum here.

  windows/terraform ×8 — good. Still acc-bound at 14m18s; the data curve (35m→22m→17m→14m) is still improving. ×8 is justified.

  windows/direct ×4 — good. Acc-bound at 14m18s; ×8 saves only 35s (overhead-floor confirmed). ×4 is the right cutoff.

Additional analysis of 5 commits added in a row:

  Load test results — runner wait times across 5 concurrent runs

  All 5 PRs running simultaneously, 22 acc + 3 unit = 25 test jobs each, 125 total.

  ┌───────────────────────┬──────────────────────────┬──────────────┬────────────────────────────┐
  │          run          │ linux wait (min/avg/max) │  macOS wait  │ windows wait (min/avg/max) │
  ├───────────────────────┼──────────────────────────┼──────────────┼────────────────────────────┤
  │ baseline (single run) │ 85s / 91s / 96s          │ 3s / 4s / 6s │ 215s / 230s / 249s         │
  ├───────────────────────┼──────────────────────────┼──────────────┼────────────────────────────┤
  │ run 1/5               │ 79s / 83s / 88s          │ 5s / 5s / 6s │ 4s / 228s / 286s           │
  ├───────────────────────┼──────────────────────────┼──────────────┼────────────────────────────┤
  │ run 2/5               │ 38s / 98s / 111s         │ 3s / 3s / 4s │ 236s / 262s / 280s         │
  ├───────────────────────┼──────────────────────────┼──────────────┼────────────────────────────┤
  │ run 3/5               │ 111s / 112s / 115s       │ 3s / 4s / 6s │ 271s / 279s / 289s         │
  ├───────────────────────┼──────────────────────────┼──────────────┼────────────────────────────┤
  │ run 4/5               │ 114s / 114s / 114s       │ 3s / 4s / 5s │ 290s / 841s / 1066s        │
  ├───────────────────────┼──────────────────────────┼──────────────┼────────────────────────────┤
  │ run 5/5               │ 107s / 107s / 108s       │ 3s / 4s / 5s │ 1070s / 1116s / 1167s      │
  └───────────────────────┴──────────────────────────┴──────────────┴────────────────────────────┘

  macOS: flat at 3–6s throughout. GitHub-hosted, no issue.

  Linux: barely moves — 91s baseline, 98–115s at full load. The pool handles 35 concurrent linux jobs (5 × 7) with essentially no queuing.

  Windows: this is where it shows. The first 3 runs absorbed normally (230–290s, within baseline noise). By run 4/5 the max jumped to 1066s (~18 min queue wait). Run 5/5 hit 1167s (~19 min)
  on the max shard — meaning some windows runners had to wait nearly 20 minutes just to start, before running any tests.

  Verdict

  Windows queuing starts noticeably at 3–4 concurrent PRs and becomes severe at 5. With 13 windows jobs/PR and ~65 windows jobs competing at 5 simultaneous PRs, the pool is clearly smaller
  than 65. Based on the degradation curve it looks like there are roughly 30–40 windows runners available: runs 1–3 (~39 jobs) absorbed fine, runs 4–5 (~26 more) queued badly.

  The failure in run 2/5 was a transient infra flake (setup step failed before tests), unrelated.

  Practical implication: at busy times with 4+ open PRs actively pushing, windows/terraform ×8 will be the bottleneck; those 8 jobs pile up in the queue and each can wait up to 19 min before
  starting. Worth asking the infra team what the windows runner pool cap is and whether it can be raised, or reducing windows/terraform to ×4, which cuts the per-PR windows footprint from 13 jobs to 9.
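
  The demand figures in the verdict (13 windows jobs per PR, ~65 at five concurrent PRs) follow from simple arithmetic. A minimal sketch, assuming each PR starts one windows unit job plus the 8 + 4 windows acceptance shards; the function name is hypothetical.

```go
package main

import "fmt"

// windowsDemand returns how many windows jobs compete for runners when
// prs pull requests run at once, each starting jobsPerPR windows jobs.
func windowsDemand(jobsPerPR, prs int) int {
	return jobsPerPR * prs
}

func main() {
	const unitJobs = 1           // one test-unit job per OS
	const current = 8 + 4        // windows/terraform x8 + windows/direct x4
	const reducedTerraform = 4 + 4 // if windows/terraform dropped to x4

	for prs := 1; prs <= 5; prs++ {
		fmt.Printf("%d concurrent PRs: %d windows jobs now, %d with terraform x4\n",
			prs,
			windowsDemand(current+unitJobs, prs),
			windowsDemand(reducedTerraform+unitJobs, prs))
	}
}
```

  Comparing the two columns against the estimated 30–40 runner pool shows how many concurrent PRs each configuration can absorb before queuing.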
