test(golden): harness, three sentinels, semver gate (PR 1/9) by bnusunny · Pull Request #9058 · aws/aws-sam-cli

bnusunny · 2026-06-02T03:55:57Z

Summary

PR 1 of the golden-templates rollout (design in #9057). Adds the in-process harness, three sentinel cases, the semver gate, and the workflow.

What lands

tests/golden/harness.py — run_build_pipeline and run_package_pipeline. Calls expand_language_extensions (PR feat: make AWS::LanguageExtensions local processing opt-in via --language-extensions flag #9033's explicit entry) and the SAM Translator directly; replaces local artifact paths with <<BUILT_ARTIFACT>> and S3 URIs with s3://golden-bucket/<sha256-of-resource>. No subprocess, no AWS calls.
tests/golden/normalize.py — deterministic YAML rendering (sorted keys, dropped Metadata.SamTransformMetrics, single trailing newline).
tests/golden/update_goldens.py — CLI: --filter, --check, --diff, --new. Same harness functions used by tests, so regenerator and verifier cannot diverge. Works both as python tests/golden/update_goldens.py and python -m tests.golden.update_goldens.
tests/golden/check_semver_bump.py — pure check() + git CLI shim. Modifications/deletions of existing pins require major-version bump in the same PR. Additions ungated. Renames forced major (conservative). Self-gates: returns exit 0 when no expected.*.yaml changed.
tests/golden/conftest.py + test_corpus.py — parametrized: one test ID per case dir (e.g. `language_extensions/foreach_static_zip`). Two tests per case — build and package output diffed against pinned files. Mismatch hint points at `update_goldens.py --diff` and explains the semver rule.
.github/workflows/golden-semver-gate.yml — runs unconditionally on every PR. Does NOT use `on.pull_request.paths:` filtering (per review feedback on docs(designs): golden templates and semver check (PR 0/9) #9057): path filtering at the trigger level skips the workflow on unrelated PRs, no status posts, branch protection treats a required check as missing/pending. The script self-gates instead.
Makefile — `test` and `test-all` targets each include `pytest tests/golden` as a separate invocation (so it doesn't enter the 94% `samcli` coverage gate).
Three sentinel cases:
- `sam_resources/serverless_function_zip` — vanilla SAM Function, ZIP packaging.
- `packageable_resources/lambda_function_zip` — raw `AWS::Lambda::Function` with local `Code` path (no SAM transform).
- `language_extensions/foreach_static_zip` — `Fn::ForEach` over a static collection, shared CodeUri. Locks expanded sibling shape (AlphaFunction / BetaFunction with no `Fn::ForEach` key).

Known limitations (will surface in PR 2 onward)

Documented inline in `harness.py`'s module docstring. Listed here so the next contributor knows to expect them:

Image Lambdas / ECR Repository names get over-rewritten. The artifact walker iterates `RESOURCES_WITH_IMAGE_COMPONENT` indiscriminately; ECR `RepositoryName` is the resource name, not a buildable artifact, but currently gets replaced. None of the three sentinels exercise this; PR 3+ adding image-Lambda or ECR cases must extend the walker first.
`_stub_local_uris_for_translator` diverges from `SamTemplateValidator._replace_local_codeuri`. Doesn't yet handle `Globals.Function.CodeUri` or skip-stub on `PackageType: Image`. Will need both before PR 3 lands a `Globals` case or an image case.
`GoldenS3Uploader` is duck-typed, not a subclass of `S3Uploader`. Today only `upload_with_dedup` is exercised. `Template.export()` paths invoked by `merge_language_extensions_s3_uris` may reach for other methods (`make_url`, `delete_artifact`) and silently `AttributeError`. Subclass when the first such case lands.
No AWS::Include sentinel yet. PR 9 (cross-cutting) adds the case. Until then, the harness's `_export_global_artifacts_pass` post-pass is exercised but not asserted.

These are deliberate scope limits, not bugs — the harness is correct for what's currently committed and covers the specific bug shapes from #9004, #9005, #9027, #9029. The wave plan has each axis extending the harness as needed.

Test plan

`pytest tests/golden -v` — 37 passed.
`python tests/golden/update_goldens.py --check` — exit 0.
`python tests/golden/update_goldens.py --diff` — exit 0, no output.
`python tests/golden/check_semver_bump.py --base origin/develop --head HEAD` — exit 0 ("6 new pin(s); no version bump required").
Manually edited a pinned file; `pytest tests/golden -k ` printed the actionable hint pointing at `update_goldens.py --diff` and the semver rule.
Manually wired a fake modification through `check_semver_bump.py`'s `check()` — returns non-zero with the suggested next major version unless major was bumped.
`make lint` — only the 3 pre-existing mypy errors in `samcli/lib/utils/resource_trigger.py`, `samcli/local/apigw/authorizers/lambda_authorizer.py`, `samcli/lib/sync/sync_flow.py` (verified present on origin/develop).
`ruff check tests/golden` — clean.

Three sentinel cases (SAM, CFN, LE-foreach) used by the harness in the next commit. Each case carries metadata.yaml + README.md + src/app.py. Pinned outputs (expected.*.yaml) are generated by update_goldens.py once the harness is in place.

Strips volatile Metadata.SamTransformMetrics, sort_keys=True, single trailing newline. Used by both the harness and the regenerator so they cannot diverge on serialization.

Mirrors sam build's pipeline up to the template-write step: LE expansion -> SAM transform -> local-artifact-path normalization. Calls expand_language_extensions and the upstream samtranslator.translator.translator.Translator directly; no subprocess, no AWS calls.

Walks packageable artifact properties (using RESOURCES_WITH_LOCAL_PATHS + RESOURCES_WITH_IMAGE_COMPONENT) and rewrites each to a deterministic s3://golden-bucket/<sha256>. Runs _export_global_artifacts_pass before and after rewriting to mirror PR #9030's pre-LE AWS::Include pass.

CLI supports --filter / --check / --diff / --new. Dry-run modes never mutate disk; --new only writes missing files. Three sentinel cases have expected.{build,package}.yaml pinned.

One parametrized test per case for build + package output. Mismatch hint points at update_goldens.py and explains the semver gate's expectations.

Pure-function check() takes a changed-files list + base/head versions and returns (exit_code, message). git interaction lives behind a thin CLI shim that reads git diff --name-status and pulls __version__ from both refs.

Triggered only when corpus pins or samcli/__init__.py change. Tiny, auditable, isolated from build.yml so it can be reasoned about independently.

Separate pytest invocation so the corpus harness doesn't enter the samcli coverage calculation.

- Type-narrow rtype before passing to RESOURCES_WITH_LOCAL_PATHS.get. - Annotate Translator.translate() return so harness's run_build_pipeline return type is preserved (no_any_return). - Use Optional[str] = None on GoldenS3Uploader keyword args (PEP 484 no-implicit-optional).

Drop unused pytest/yaml imports across the new tests, and replace the rename-detection magic number in check_semver_bump.py with a named constant. No behavior change. Surfaces from ruff check tests/golden when run directly (the project's `make lint` target restricts to samcli/schema, so these escaped CI).

…rect scripts Both scripts have docstrings and corpus-test failure hints that tell users to invoke them as `python tests/golden/<script>.py ...` from the repo root. But Python does not auto-add the repo root to sys.path for direct script invocation, so the absolute `from tests.golden...` imports fail with ModuleNotFoundError. Prepend the repo root to sys.path when each script is run as __main__, gated on `__package__ is None` so the module form continues to work.

…nt-aware artifact digest Two correctness fixes for the regen CLI and the harness it drives: 1. update_goldens.py --new: peek at expected.{build,package}.yaml before running _generate. When both already exist, skip the case entirely instead of running the full build+package pipeline and discarding the result. Add a regression test that monkeypatches _generate to raise so any future re-introduction of the wasted call is caught. 2. harness._packageable_replacement: include resource_id in the digest salt. After the build pass replaces local paths with <<BUILT_ARTIFACT>>, `current` is constant — so every Lambda Code in the corpus collapsed to the same S3Key, and a regression that swapped two functions' artifacts would have gone undetected. Salting on resource_id makes per-resource digests distinct (Alpha vs Beta within one template, etc.). Pins regenerated to reflect the new digests.

…n limitations Cleanups surfaced by the code-quality review of PR 1: - harness.run_package_pipeline: drop the dead first _export_global_artifacts_pass(original) call. The original template was read, mutated by that pass, and then never referenced again — the artifact rewrite operates on a deep-copied build_output, so the first pass had no observable effect. Update the docstring to match. - harness._walk_artifact_properties: replace the bare `-> List:` return type with `-> List[Tuple[str, str, Dict[str, Any]]]` so downstream callers and reviewers can see the shape without reading the body. - harness module docstring: add a Known limitations subsection covering four issues that the next contributor should expect (image / ECR over-rewrite, _stub_local_uris_for_translator drift, GoldenS3Uploader not subclassing S3Uploader, missing AWS::Include sentinel). They will be addressed by PR 2 / corpus growth. - test_harness_package.test_package_sam_case_rewrites_code_to_s3_uri: replace the lenient `s3_uri_field == "golden-bucket"` short-circuit with two specific assertions that the dict has the expected keys. - test_harness_build: drop unused `tmp_path` fixture parameters from two tests. - update_goldens.py / check_semver_bump.py: silence mypy `[unreachable]` on the script-form sys.path bootstrap added in the previous fix.

aws-sam-cli-bot

Code Review Results

Reviewed: faf5ec8..a668286
Files: 33 (focused on harness, semver gate, normalize, update_goldens, conftest, workflow)
Comments: 3

CI's black-check runs against Python 3.10 and rejected the formatting written by black running under Python 3.13 (the venv's interpreter), which preferred longer single-line function signatures that 3.10's parser couldn't accept. Re-running black with an explicit --target-version py310 collapses the affected lines back to the multi-line shape CI accepts. Pure formatting; no behavior change. 37/37 tests still pass.

aws-sam-cli-bot

Code Review Results

Reviewed: faf5ec8..2af45cd
Files: 33
Comments: 3

Per GitHub's security hardening guide, context expressions like ${{ github.base_ref }} are substituted into the script string before the shell parses it. Although base_ref is constrained by git ref-name rules, the canonical mitigation is to pass the value via env: so the shell sees a literal variable expansion.

git diff --name-status emits C<score>\tOLD\tNEW for copies (when -C is enabled via repo config). Previously only R was special-cased; C fell through with status="C", which doesn't match A/M/D filters — so a copy of an existing pin would silently bypass the gate. Note the asymmetry: - Rename: source goes away (D) and target appears (A). - Copy: source persists, only the target is new (A only). Renamed _RENAME_DIFF_PARTS to _RENAME_OR_COPY_PARTS to reflect the broader meaning. Added parser-level tests covering both rename and copy lines, plus an end-to-end test asserting a copy is observed (not silently elided).

main() unconditionally read base/head versions before checking whether any corpus pin changed. _read_version_at_ref raises if the regex misses or if samcli/__init__.py is missing at the ref, so on every unrelated PR — the common case — it could fail for unrelated reasons even though the gate is supposed to be self-gating. Now main() short-circuits with exit 0 the moment we determine no corpus pin paths are in the diff. Added two tests: one asserting _read_version_at_ref is never called when the diff has no corpus pin changes, and a counterpart asserting it IS called (and the gate runs) when a corpus pin does change.

The parser only rejected --check + --new and --diff + --new. Passing both --check and --diff was accepted, with --diff winning on output. The docstring describes them as alternatives, so reject the combination at the parser — symmetric with the existing --new checks, and less surprise.

Pull magic numbers (rename change count, argparse usage exit code) into named constants, and let black collapse the adjacent-string formatting on the new parser tests.

CI runs without AWS_DEFAULT_REGION and without ~/.aws/config; the upstream SAM Translator's parameter SDK calls ``boto3.session.Session()`` and unconditionally raises ``NoRegionFound`` when the resulting session has no region — even when AWS::Region is already in the parameter_values dict we pass. Hand the Translator a session pinned to the same region we feed via parameter_values (us-east-1, from ``IntrinsicsSymbolTable.DEFAULT_PSEUDO_PARAM_VALUES``). Pure in-process state, no AWS calls. Confirmed by reproducing the CI failure locally with ``env -u AWS_DEFAULT_REGION -u AWS_REGION HOME=/tmp/no-aws-config pytest tests/golden`` — 16 failures match CI without the fix; 43 pass with it.

…merge The harness was producing pinned expected.{build,package}.yaml files that showed the LE-EXPANDED template (AlphaFunction/BetaFunction as siblings), but real `sam build` and `sam package` preserve the original Fn::ForEach structure with merged artifact paths inside the body — that's the design in PR #8637. run_build_pipeline now delegates the LE merge to ``BuildContext._get_template_for_output`` (invoked via a __new__ shell since the method only reads ``self.*`` for other methods on the same class). run_package_pipeline mirrors PackageContext's LE flow: ``_export_global_artifacts_pass`` → LE expand → rewrite artifacts → ``merge_language_extensions_s3_uris`` to copy URIs back into the original ForEach body. The SAM transform is skipped on the LE branch — the body keeps ``AWS::Serverless::Function`` with ``CodeUri`` (matching real sam build / sam package output, where the SAM transform runs server-side at deploy time). Update the corresponding unit test assertions to verify ``Fn::ForEach::*`` is preserved instead of expanded.

…rving harness Both files now show ``Fn::ForEach::Workers`` at the top of Resources with the body resource's ``CodeUri`` rewritten to the build placeholder (build) or an s3://golden-bucket/<sha256> URI (package). This matches what real ``sam build`` / ``sam package`` write to disk for an LE template — see PR #8637.

…ne reimplementation The previous harness reimplemented pieces of the build/package flow (running the SAM translator manually, walking artifact properties, calling internal merge helpers directly), and that drift produced two user-visible bugs in the pinned outputs: 1. ``Fn::ForEach`` was replaced with the expanded form. Real ``sam build`` preserves the ForEach so CloudFormation can re-expand it server-side at deploy time (PR #8637). 2. ``AWS::Serverless::Function`` got SAM-transformed to ``AWS::Lambda::Function`` (with an auto-generated IAM Role). Real ``sam build`` does NOT run the SAM transform; that happens server-side at deploy. Both were symptoms of running steps SAM CLI itself does not run. Fix the underlying issue by driving the same ``BuildContext`` and ``PackageContext`` classes the CLI instantiates and stubbing only the network/compute boundaries: - ``ApplicationBuilder.build`` returns an ``ApplicationBuildResult`` with artifacts pointing at a sentinel directory under ``<build_dir>/__sam_golden_artifact__/<full_path>/``. The placeholder file's content is the resource's full path, which makes the downstream content-addressed S3 URI deterministic per resource. - ``S3Uploader.upload`` / ``upload_with_dedup`` return ``s3://golden-bucket/<sha256>`` where the digest is taken over the uploaded file's content. After build, the on-disk ``.aws-sam/build/template.yaml`` is read back and any artifact-property string that resolves under the sentinel directory is rewritten to ``<<BUILT_ARTIFACT>>``. Everything else (LE expansion, ForEach merge, Mappings generation, AWS::Include export) flows through the real CLI code paths. Also: ``samcli.yamlhelper.yaml_parse`` mutates the global ``yaml.SafeLoader`` to emit ``OrderedDict``; once any caller imports yamlhelper, every subsequent ``yaml.safe_load`` returns OrderedDict and ``yaml.safe_dump`` rejects it. Coerce parsed templates to plain ``dict`` in both the harness and ``normalize`` so the test corpus and pin regenerator see plain dicts. Test signatures change: ``run_package_pipeline`` no longer takes a ``build_output`` dict (PackageContext re-loads the template from disk); a new ``run_build_and_package`` helper produces both pins from a single build for callers that need both.

Five sentinel pins now reflect what real ``sam build`` and ``sam package`` write to disk: - ``sam_resources/serverless_function_zip/expected.build.yaml`` now contains ``AWS::Serverless::Function`` (no SAM transform), with ``CodeUri: <<BUILT_ARTIFACT>>`` and a SAM-CLI-added ``Metadata.SamResourceId``. The pre-existing auto-generated ``HelloFunctionRole`` is gone (the SAM transform runs server-side at deploy time, not at build time). - ``sam_resources/serverless_function_zip/expected.package.yaml`` preserves the same shape, with ``CodeUri`` rewritten to ``s3://golden-bucket/<sha>``. - ``packageable_resources/lambda_function_zip/expected.build.yaml`` now carries ``Metadata.SamResourceId`` (added by SAM CLI's stack provider for raw CFN Lambdas, too). - ``packageable_resources/lambda_function_zip/expected.package.yaml`` carries the same metadata; ``Code`` becomes the ``{S3Bucket, S3Key}`` dict shape that real ``sam package`` emits. - ``language_extensions/foreach_static_zip/expected.package.yaml`` digest changes because the package URI is now content-addressed over the harness's sentinel placeholder file (whose content is the resource full path) rather than a synthesized hash. ``Fn::ForEach`` preservation is unchanged.

Real `sam package` zips the build artifact directory before "uploading" it. The harness's S3Uploader.upload stub was hashing the resulting zip bytes — but raw zip bytes embed OS-specific metadata (DOS timestamps, Unix mode bits, central directory ordering), so the same source content produced different digests on Linux vs Windows. The Windows runner saw the package pins regenerate with new hashes and the corpus tests failed on byte-exact diff: - CodeUri: s3://golden-bucket/dd2bf2d8... + CodeUri: s3://golden-bucket/1c1f1e20... Hash zip files via a normalized representation: sorted member names plus each member's content sha256. The envelope metadata is excluded from the digest, so identical source content produces an identical hash on every OS. Non-zip artifacts continue to be hashed by raw bytes. Detection uses zipfile.is_zipfile so .jar / .war and other zip-format artifacts are treated the same as a Lambda zip. Pins regenerated once with the new hash; from now on Linux and Windows runners agree.

…lpath Real `sam build` writes CodeUri/Code/etc. as a relative path from the output template's directory to the artifact directory — for a top-level resource the result is just the resource id (e.g. `CodeUri: HelloFunction`) because both end up under .aws-sam/build/. The harness was substituting a synthetic <<BUILT_ARTIFACT>> placeholder instead, which obscured what real sam build actually produces and made the build pin disagree with the CLI's own output. Drop the placeholder rewriter entirely. Stage the case dir (template.yaml + src/) into the temp tree before driving BuildContext, and put .aws-sam/build/ inside the staged tree. ApplicationBuilder's relpath then resolves to .aws-sam/build/<full_path> which move_template collapses to <full_path> when writing to .aws-sam/build/template.yaml — exactly what real sam build writes. Net: harness drops the artifact-walker, the path-rewriter, the RESOURCES_WITH_LOCAL_PATHS / RESOURCES_WITH_IMAGE_COMPONENT imports, the BUILT_ARTIFACT_PLACEHOLDER constant, and the sentinel-dirname indirection (~150 lines). All sentinel build pins now show the real deterministic relpath. Tests updated to assert the relpath shape; one test renamed for accuracy (`test_build_cfn_case_replaces_code_with_placeholder` → `test_build_cfn_case_keeps_lambda_function`). 42/42 pass under CI-like env.

…erge queue This repo uses GitHub's merge queue (per build.yml's existing merge_group trigger). When a PR enters the queue, GitHub dispatches a merge_group event against an ephemeral ref, not pull_request. A required check that only declares on.pull_request never reports a status on the merge_group event, so the merge queue treats it as missing/pending and blocks the merge — the same skipped-but-required failure mode the path-filter pitfall was about, just under a different trigger. Mirror build.yml's pattern: listen on both pull_request and merge_group with the same branch filters. The check_semver_bump.py script's self-gating (exit 0 when no expected.*.yaml changed) only helps when the workflow runs at all; this fix makes it run. Add a small "resolve base ref" step because the two event types expose the target branch differently: pull_request sets github.base_ref to the bare branch name; merge_group leaves base_ref empty and exposes the full ref at github.event.merge_group.base_ref (e.g. "refs/heads/develop"), which we strip to match the pull_request shape before passing to check_semver_bump.py. Caught by review on PR #9057.

bnusunny added 15 commits June 2, 2026 02:34

test(golden): normalize() — deterministic YAML rendering

ef13c26

Strips volatile Metadata.SamTransformMetrics, sort_keys=True, single trailing newline. Used by both the harness and the regenerator so they cannot diverge on serialization.

test(golden): update_goldens.py CLI + initial pinned outputs

5c72dd8

CLI supports --filter / --check / --diff / --new. Dry-run modes never mutate disk; --new only writes missing files. Three sentinel cases have expected.{build,package}.yaml pinned.

test(golden): parametrized corpus tests with actionable hints

ac2771b

One parametrized test per case for build + package output. Mismatch hint points at update_goldens.py and explains the semver gate's expectations.

ci(golden): standalone workflow runs the semver gate on PRs

dde8373

Triggered only when corpus pins or samcli/__init__.py change. Tiny, auditable, isolated from build.yml so it can be reasoned about independently.

chore(make): include tests/golden in test and test-all targets

e8b475d

Separate pytest invocation so the corpus harness doesn't enter the samcli coverage calculation.

docs(golden): authoring guide for new and existing cases

eb1f954

bnusunny requested a review from a team as a code owner June 2, 2026 03:55

github-actions Bot added the pr/internal label Jun 2, 2026

aws-sam-cli-bot reviewed Jun 2, 2026

View reviewed changes

Comment thread .github/workflows/golden-semver-gate.yml Outdated

Comment thread tests/golden/check_semver_bump.py Outdated

Comment thread tests/golden/update_goldens.py

aws-sam-cli-bot reviewed Jun 2, 2026

View reviewed changes

Comment thread .github/workflows/golden-semver-gate.yml Outdated

Comment thread tests/golden/check_semver_bump.py Outdated

Comment thread tests/golden/check_semver_bump.py

bnusunny added 10 commits June 2, 2026 04:06

chore(golden): satisfy ruff PLR2004 + black on new fix-up tests

e7828b6

Pull magic numbers (rename change count, argparse usage exit code) into named constants, and let black collapse the adjacent-string formatting on the new parser tests.

bnusunny added 3 commits June 2, 2026 06:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(golden): harness, three sentinels, semver gate (PR 1/9)#9058

test(golden): harness, three sentinels, semver gate (PR 1/9)#9058
bnusunny wants to merge 29 commits into
developfrom
feat/golden-templates-harness

bnusunny commented Jun 2, 2026

Uh oh!

aws-sam-cli-bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

aws-sam-cli-bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bnusunny commented Jun 2, 2026

Summary

What lands

Known limitations (will surface in PR 2 onward)

Test plan

Uh oh!

aws-sam-cli-bot left a comment

Choose a reason for hiding this comment

Code Review Results

Uh oh!

Uh oh!

Uh oh!

Uh oh!

aws-sam-cli-bot left a comment

Choose a reason for hiding this comment

Code Review Results

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants