[ExecuTorch][WebGPU] et_vk.embedding_q4gsw test suite (export + native golden) by JulianCloudNTH · Pull Request #20289 · pytorch/executorch

JulianCloudNTH · 2026-06-15T21:52:55Z

Stack from ghstack (oldest at bottom):

[ExecuTorch][WebGPU] et_vk.prepack test suite (export + native golden) #20292
[ExecuTorch][WebGPU] Add et_vk.prepack (constant-tensor packing) for E2E weight loading #20265
[ExecuTorch][WebGPU] et_vk.apply_rotary_emb test suite (export + native golden) #20290
[ExecuTorch][WebGPU] Add et_vk.apply_rotary_emb (interleaved RoPE) + ValueList multi-output #20264
-> [ExecuTorch][WebGPU] et_vk.embedding_q4gsw test suite (export + native golden) #20289
[ExecuTorch][WebGPU] Add et_vk.embedding_q4gsw (4-bit groupwise-symmetric quantized embedding) #20263

Splits the et_vk.embedding_q4gsw tests into their own diff (op below, tests above), matching the sdpa/update_cache/linear_q4gsw op+tests convention, and brings them to the same rigor: a multi-shape config sweep run on-device, an fp64 dual-oracle, and required-when-present gating.
@exported-using-ghexport

Differential Revision: D108668383

[ghstack-poisoned]

pytorch-bot · 2026-06-15T21:52:59Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20289

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 97bd5e5 with merge base 0378fc4:

NEW FAILURES - The following jobs have failed:

pull / test-moshi-linux / linux-job (gh)
RuntimeError: Command docker exec -t 546239962a49ff4eb093034a11c4cf66192849c674c22bc5348b0c883272403a /exec failed with exit code 1
pull / test-static-llama-qnn-linux (stories_260k_bc) / linux-job (gh)
RuntimeError: Command docker exec -t 51feb919e201e4efc176d06ad03636f874bec751aede037133890a840631404a /exec failed with exit code 92

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-15T21:54:59Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

JulianCloudNTH · 2026-06-15T21:59:21Z

@claude review

claude · 2026-06-15T22:00:17Z

Claude finished @JulianCloudNTH's task in 3m 22s.

Review: `et_vk.embedding_q4gsw` test suite

Read changed files
Analyze test logic & golden oracle
Check CI script & gating
Post review

Nice split — the structure mirrors the sdpa/linear_q4gsw convention well, and the fp64 dual-oracle in test_golden_matches_eager is a genuinely strong guard against a self-consistent-but-wrong golden. The nibble ordering in the manual dequant (high first for the non-linear branch) correctly matches embedding_q4gsw_impl in custom_ops_lib.py. A few points, none blocking:

1. Tolerance gate doesn't reuse the existing per-element abs-OR-rel helper (main point)

test_embedding_q4gsw (test_webgpu_native.cpp:493) uses a global gate:

if (max_abs_err > 1e-3f || max_rel_err > 1e-3f) { ... FAIL ... }

This fails if either the max abs error or the max rel error exceeds 1e-3. For an embedding output element near zero, max_rel_err = ae / max(|golden|, 1e-6) can blow up even when the abs error is tiny, which is exactly the failure mode that quant_within_tol / sdpa_within_tol (:505, :356) were written to avoid with their per-element abs-OR-rel semantics. Embedding outputs are dequantized random weights and will contain near-zero values, so this gate is both stricter and less principled than the helper right below it. Recommend reusing quant_within_tol(out, golden, n, /*atol*/1e-3, /*rtol*/1e-3, &ma, &mr) for consistency with the rest of the suite.
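To make the difference concrete, here is a minimal Python sketch of the two gates. It is not the C++ in test_webgpu_native.cpp: the function names are hypothetical, and the per-element semantics are assumed from the description of quant_within_tol above (an element passes if either its abs or its rel error is within tolerance).

```python
def global_gate(out, golden, atol=1e-3, rtol=1e-3, eps=1e-6):
    """Global gate: pass only if BOTH the max abs and the max rel error are in tolerance."""
    max_abs = max(abs(o - g) for o, g in zip(out, golden))
    max_rel = max(abs(o - g) / max(abs(g), eps) for o, g in zip(out, golden))
    return max_abs <= atol and max_rel <= rtol


def per_element_gate(out, golden, atol=1e-3, rtol=1e-3, eps=1e-6):
    """Per-element abs-OR-rel (assumed semantics): an element fails only if
    its abs error AND its rel error both exceed tolerance."""
    for o, g in zip(out, golden):
        abs_err = abs(o - g)
        rel_err = abs_err / max(abs(g), eps)
        if abs_err > atol and rel_err > rtol:
            return False
    return True


# The last golden value is near zero: abs error is 2e-5, but rel error is about 2.0.
golden = [0.5, -0.25, 1e-5]
out = [0.5001, -0.2501, 3e-5]

global_ok = global_gate(out, golden)
per_element_ok = per_element_gate(out, golden)
```

With these inputs the global gate rejects the output purely because of the near-zero element, while the per-element gate accepts it. A genuinely wrong element (large abs and rel error) is still rejected by both.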

2. Orphaned comment in `test_webgpu_native.cpp`

The new load_indices + test_embedding_q4gsw were inserted between line 410 and quant_within_tol. The comment at :410:

// Per-element dual tolerance (abs OR rel), parameterized like sdpa_within_tol.

originally documented quant_within_tol (it sits directly above it on base). It now misleadingly sits above load_indices. Move it back down to immediately precede quant_within_tol.

3. CI WARN message is inaccurate for the CI path

In test_webgpu_native_ci.sh, the env vars (EMBEDDING_MODEL, etc.) are assigned unconditionally at the top and always passed to webgpu_native_test. So if the export at :68-72 fails, the || echo "...embedding cases self-skip" fires — but the env vars are still set, so the native test takes the m && ip && g && *m... branch, fails to load the missing files, and returns FAIL, not skip. The self-skip only happens for local runs where the env is unset. The message should say something like "embedding configs will FAIL in webgpu_native_test" (matching the q4gsw linear WARN on :66), since per the PR description these are meant to be required-when-present.
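The skip-versus-fail distinction can be sketched as follows. This is an illustrative Python model, not the logic in webgpu_native_test: only EMBEDDING_MODEL is named in the PR, so EMBEDDING_INDICES and EMBEDDING_GOLDEN are hypothetical variable names, and the PASS branch stands in for actually running the case.

```python
import os
from enum import Enum


class Result(Enum):
    PASS = "pass"
    SKIP = "skip"
    FAIL = "fail"


# EMBEDDING_MODEL appears in the PR; the other two names are assumptions.
REQUIRED_VARS = ("EMBEDDING_MODEL", "EMBEDDING_INDICES", "EMBEDDING_GOLDEN")


def gate_embedding_case(env, file_exists=os.path.exists):
    """Required-when-present gating: unset env means skip, set-but-missing means fail."""
    paths = [env.get(name) for name in REQUIRED_VARS]
    if not all(paths):
        # Local run with no configuration: the case self-skips.
        return Result.SKIP
    if not all(file_exists(p) for p in paths):
        # CI path: vars are always set, so a failed export surfaces as FAIL.
        return Result.FAIL
    return Result.PASS  # placeholder for loading the files and running the test


ci_env = {name: "/tmp/" + name.lower() + ".bin" for name in REQUIRED_VARS}
local_result = gate_embedding_case({}, file_exists=lambda p: False)
ci_after_failed_export = gate_embedding_case(ci_env, file_exists=lambda p: False)
ci_after_good_export = gate_embedding_case(ci_env, file_exists=lambda p: True)
```

Under this model the WARN text only describes the first case (unset env), whereas the CI script always lands in the second or third.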

4. fp64 reference loop is a pure-Python double loop (minor)

test_golden_matches_eager iterates vocab * embed in Python — 512 * 2048 ≈ 1.05M element accesses for llama1b, each doing tensor indexing. This will be slow (likely tens of seconds). It's correct and readable, but consider a vectorized fp64 dequant (unpack nibbles + broadcast scales) if test runtime matters. Not required.

Minor

load_indices (:411) is near-identical to load_golden (:335) modulo type — fine to leave, but a small templated load_raw<T> would dedupe.
num_indices/embed in emb_configs are hardcoded (4/64, 4/2048) and must be kept in sync manually with SHAPES in the Python. Acceptable for a test, just a coupling to be aware of.
_quant_params selecting the weight by dtype == torch.uint8 and scales by k.endswith("scales") is a touch brittle, but reasonable given the fixed EmbeddingQuantHandler recipe.

Overall this looks solid and ready to land once the tolerance gate (#1) is reconciled with the existing helper.

[ghstack-poisoned]

Update

45732c9

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 15, 2026 21:53 — with GitHub Actions Inactive

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 15, 2026

Update

97bd5e5

[ghstack-poisoned]

JulianCloudNTH temporarily deployed to cadence June 15, 2026 22:25 — with GitHub Actions Inactive

meta-codesync Bot added the meta-exported label Jun 15, 2026

This was referenced Jun 15, 2026

[ExecuTorch][WebGPU] Add et_vk.prepack (constant-tensor packing) for E2E weight loading #20265

Open

[ExecuTorch][WebGPU] et_vk.prepack test suite (export + native golden) #20292

Open

JulianCloudNTH wants to merge 2 commits into gh/JulianCloudNTH/28/base from gh/JulianCloudNTH/28/head
