Skip to content

test(golden): harness, three sentinels, semver gate (PR 1/9)#9058

Open
bnusunny wants to merge 29 commits into
developfrom
feat/golden-templates-harness
Open

test(golden): harness, three sentinels, semver gate (PR 1/9)#9058
bnusunny wants to merge 29 commits into
developfrom
feat/golden-templates-harness

Conversation

@bnusunny
Copy link
Copy Markdown
Contributor

@bnusunny bnusunny commented Jun 2, 2026

Summary

PR 1 of the golden-templates rollout (design in #9057). Adds the in-process harness, three sentinel cases, the semver gate, and the workflow.

What lands

  • tests/golden/harness.pyrun_build_pipeline and run_package_pipeline. Calls expand_language_extensions (PR feat: make AWS::LanguageExtensions local processing opt-in via --language-extensions flag #9033's explicit entry) and the SAM Translator directly; replaces local artifact paths with <<BUILT_ARTIFACT>> and S3 URIs with s3://golden-bucket/<sha256-of-resource>. No subprocess, no AWS calls.
  • tests/golden/normalize.py — deterministic YAML rendering (sorted keys, dropped Metadata.SamTransformMetrics, single trailing newline).
  • tests/golden/update_goldens.py — CLI: --filter, --check, --diff, --new. Same harness functions used by tests, so regenerator and verifier cannot diverge. Works both as python tests/golden/update_goldens.py and python -m tests.golden.update_goldens.
  • tests/golden/check_semver_bump.py — pure check() + git CLI shim. Modifications/deletions of existing pins require major-version bump in the same PR. Additions ungated. Renames forced major (conservative). Self-gates: returns exit 0 when no expected.*.yaml changed.
  • tests/golden/conftest.py + test_corpus.py — parametrized: one test ID per case dir (e.g. `language_extensions/foreach_static_zip`). Two tests per case — build and package output diffed against pinned files. Mismatch hint points at `update_goldens.py --diff` and explains the semver rule.
  • .github/workflows/golden-semver-gate.yml — runs unconditionally on every PR. Does NOT use `on.pull_request.paths:` filtering (per review feedback on docs(designs): golden templates and semver check (PR 0/9) #9057): path filtering at the trigger level skips the workflow on unrelated PRs, no status posts, branch protection treats a required check as missing/pending. The script self-gates instead.
  • Makefile — `test` and `test-all` targets each include `pytest tests/golden` as a separate invocation (so it doesn't enter the 94% `samcli` coverage gate).
  • Three sentinel cases:
    • `sam_resources/serverless_function_zip` — vanilla SAM Function, ZIP packaging.
    • `packageable_resources/lambda_function_zip` — raw `AWS::Lambda::Function` with local `Code` path (no SAM transform).
    • `language_extensions/foreach_static_zip` — `Fn::ForEach` over a static collection, shared CodeUri. Locks expanded sibling shape (AlphaFunction / BetaFunction with no `Fn::ForEach` key).

Known limitations (will surface in PR 2 onward)

Documented inline in `harness.py`'s module docstring. Listed here so the next contributor knows to expect them:

  1. Image Lambdas / ECR Repository names get over-rewritten. The artifact walker iterates `RESOURCES_WITH_IMAGE_COMPONENT` indiscriminately; ECR `RepositoryName` is the resource name, not a buildable artifact, but currently gets replaced. None of the three sentinels exercise this; PR 3+ adding image-Lambda or ECR cases must extend the walker first.
  2. `_stub_local_uris_for_translator` diverges from `SamTemplateValidator._replace_local_codeuri`. Doesn't yet handle `Globals.Function.CodeUri` or skip-stub on `PackageType: Image`. Will need both before PR 3 lands a `Globals` case or an image case.
  3. `GoldenS3Uploader` is duck-typed, not a subclass of `S3Uploader`. Today only `upload_with_dedup` is exercised. `Template.export()` paths invoked by `merge_language_extensions_s3_uris` may reach for other methods (`make_url`, `delete_artifact`) and silently `AttributeError`. Subclass when the first such case lands.
  4. No AWS::Include sentinel yet. PR 9 (cross-cutting) adds the case. Until then, the harness's `_export_global_artifacts_pass` post-pass is exercised but not asserted.

These are deliberate scope limits, not bugs — the harness is correct for what's currently committed and covers the specific bug shapes from #9004, #9005, #9027, #9029. The wave plan has each axis extending the harness as needed.

Test plan

  • `pytest tests/golden -v` — 37 passed.
  • `python tests/golden/update_goldens.py --check` — exit 0.
  • `python tests/golden/update_goldens.py --diff` — exit 0, no output.
  • `python tests/golden/check_semver_bump.py --base origin/develop --head HEAD` — exit 0 ("6 new pin(s); no version bump required").
  • Manually edited a pinned file; `pytest tests/golden -k ` printed the actionable hint pointing at `update_goldens.py --diff` and the semver rule.
  • Manually wired a fake modification through `check_semver_bump.py`'s `check()` — returns non-zero with the suggested next major version unless major was bumped.
  • `make lint` — only the 3 pre-existing mypy errors in `samcli/lib/utils/resource_trigger.py`, `samcli/local/apigw/authorizers/lambda_authorizer.py`, `samcli/lib/sync/sync_flow.py` (verified present on origin/develop).
  • `ruff check tests/golden` — clean.

bnusunny added 15 commits June 2, 2026 02:34
Three sentinel cases (SAM, CFN, LE-foreach) used by the harness in the
next commit. Each case carries metadata.yaml + README.md + src/app.py.
Pinned outputs (expected.*.yaml) are generated by update_goldens.py
once the harness is in place.
Strips volatile Metadata.SamTransformMetrics, sort_keys=True,
single trailing newline. Used by both the harness and the
regenerator so they cannot diverge on serialization.
Mirrors sam build's pipeline up to the template-write step:
LE expansion -> SAM transform -> local-artifact-path normalization.
Calls expand_language_extensions and the upstream
samtranslator.translator.translator.Translator directly; no
subprocess, no AWS calls.
Walks packageable artifact properties (using RESOURCES_WITH_LOCAL_PATHS
+ RESOURCES_WITH_IMAGE_COMPONENT) and rewrites each to a deterministic
s3://golden-bucket/<sha256>. Runs _export_global_artifacts_pass before
and after rewriting to mirror PR #9030's pre-LE AWS::Include pass.
CLI supports --filter / --check / --diff / --new. Dry-run modes never
mutate disk; --new only writes missing files. Three sentinel cases
have expected.{build,package}.yaml pinned.
One parametrized test per case for build + package output. Mismatch
hint points at update_goldens.py and explains the semver gate's
expectations.
Pure-function check() takes a changed-files list + base/head versions
and returns (exit_code, message). git interaction lives behind a thin
CLI shim that reads git diff --name-status and pulls __version__ from
both refs.
Triggered only when corpus pins or samcli/__init__.py change. Tiny,
auditable, isolated from build.yml so it can be reasoned about
independently.
Separate pytest invocation so the corpus harness doesn't enter the
samcli coverage calculation.
- Type-narrow rtype before passing to RESOURCES_WITH_LOCAL_PATHS.get.
- Annotate Translator.translate() return so harness's run_build_pipeline
  return type is preserved (no_any_return).
- Use Optional[str] = None on GoldenS3Uploader keyword args (PEP 484
  no-implicit-optional).
Drop unused pytest/yaml imports across the new tests, and replace
the rename-detection magic number in check_semver_bump.py with a
named constant. No behavior change. Surfaces from ruff check
tests/golden when run directly (the project's `make lint` target
restricts to samcli/schema, so these escaped CI).
…rect scripts

Both scripts have docstrings and corpus-test failure hints that tell users
to invoke them as `python tests/golden/<script>.py ...` from the repo root.
But Python does not auto-add the repo root to sys.path for direct script
invocation, so the absolute `from tests.golden...` imports fail with
ModuleNotFoundError.

Prepend the repo root to sys.path when each script is run as __main__,
gated on `__package__ is None` so the module form continues to work.
…nt-aware artifact digest

Two correctness fixes for the regen CLI and the harness it drives:

1. update_goldens.py --new: peek at expected.{build,package}.yaml before
   running _generate. When both already exist, skip the case entirely
   instead of running the full build+package pipeline and discarding the
   result. Add a regression test that monkeypatches _generate to raise so
   any future re-introduction of the wasted call is caught.

2. harness._packageable_replacement: include resource_id in the digest
   salt. After the build pass replaces local paths with
   <<BUILT_ARTIFACT>>, `current` is constant — so every Lambda Code in
   the corpus collapsed to the same S3Key, and a regression that swapped
   two functions' artifacts would have gone undetected. Salting on
   resource_id makes per-resource digests distinct (Alpha vs Beta within
   one template, etc.). Pins regenerated to reflect the new digests.
…n limitations

Cleanups surfaced by the code-quality review of PR 1:

- harness.run_package_pipeline: drop the dead first
  _export_global_artifacts_pass(original) call. The original template was
  read, mutated by that pass, and then never referenced again — the
  artifact rewrite operates on a deep-copied build_output, so the first
  pass had no observable effect. Update the docstring to match.
- harness._walk_artifact_properties: replace the bare `-> List:` return
  type with `-> List[Tuple[str, str, Dict[str, Any]]]` so downstream
  callers and reviewers can see the shape without reading the body.
- harness module docstring: add a Known limitations subsection covering
  four issues that the next contributor should expect (image / ECR
  over-rewrite, _stub_local_uris_for_translator drift, GoldenS3Uploader
  not subclassing S3Uploader, missing AWS::Include sentinel). They will
  be addressed by PR 2 / corpus growth.
- test_harness_package.test_package_sam_case_rewrites_code_to_s3_uri:
  replace the lenient `s3_uri_field == "golden-bucket"` short-circuit
  with two specific assertions that the dict has the expected keys.
- test_harness_build: drop unused `tmp_path` fixture parameters from
  two tests.
- update_goldens.py / check_semver_bump.py: silence mypy `[unreachable]`
  on the script-form sys.path bootstrap added in the previous fix.
@bnusunny bnusunny requested a review from a team as a code owner June 2, 2026 03:55
Copy link
Copy Markdown
Collaborator

@aws-sam-cli-bot aws-sam-cli-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: faf5ec8..a668286
Files: 33 (focused on harness, semver gate, normalize, update_goldens, conftest, workflow)
Comments: 3

Comment thread .github/workflows/golden-semver-gate.yml Outdated
Comment thread tests/golden/check_semver_bump.py Outdated
Comment thread tests/golden/update_goldens.py
CI's black-check runs against Python 3.10 and rejected the
formatting written by black running under Python 3.13 (the venv's
interpreter), which preferred longer single-line function signatures
that 3.10's parser couldn't accept. Re-running black with an explicit
--target-version py310 collapses the affected lines back to the
multi-line shape CI accepts.

Pure formatting; no behavior change. 37/37 tests still pass.
Copy link
Copy Markdown
Collaborator

@aws-sam-cli-bot aws-sam-cli-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Results

Reviewed: faf5ec8..2af45cd
Files: 33
Comments: 3

Comment thread .github/workflows/golden-semver-gate.yml Outdated
Comment thread tests/golden/check_semver_bump.py Outdated
Comment thread tests/golden/check_semver_bump.py
bnusunny added 10 commits June 2, 2026 04:06
Per GitHub's security hardening guide, context expressions like
${{ github.base_ref }} are substituted into the script string before the
shell parses it. Although base_ref is constrained by git ref-name rules,
the canonical mitigation is to pass the value via env: so the shell sees
a literal variable expansion.
git diff --name-status emits C<score>\tOLD\tNEW for copies (when -C is
enabled via repo config). Previously only R was special-cased; C fell
through with status="C", which doesn't match A/M/D filters — so a copy
of an existing pin would silently bypass the gate.

Note the asymmetry:
  - Rename: source goes away (D) and target appears (A).
  - Copy:   source persists, only the target is new (A only).

Renamed _RENAME_DIFF_PARTS to _RENAME_OR_COPY_PARTS to reflect the
broader meaning. Added parser-level tests covering both rename and
copy lines, plus an end-to-end test asserting a copy is observed (not
silently elided).
main() unconditionally read base/head versions before checking whether
any corpus pin changed. _read_version_at_ref raises if the regex misses
or if samcli/__init__.py is missing at the ref, so on every unrelated
PR — the common case — it could fail for unrelated reasons even though
the gate is supposed to be self-gating.

Now main() short-circuits with exit 0 the moment we determine no
corpus pin paths are in the diff. Added two tests: one asserting
_read_version_at_ref is never called when the diff has no corpus pin
changes, and a counterpart asserting it IS called (and the gate runs)
when a corpus pin does change.
The parser only rejected --check + --new and --diff + --new. Passing
both --check and --diff was accepted, with --diff winning on output.
The docstring describes them as alternatives, so reject the combination
at the parser — symmetric with the existing --new checks, and less
surprise.
Pull magic numbers (rename change count, argparse usage exit code) into
named constants, and let black collapse the adjacent-string formatting
on the new parser tests.
CI runs without AWS_DEFAULT_REGION and without ~/.aws/config; the
upstream SAM Translator's parameter SDK calls
``boto3.session.Session()`` and unconditionally raises
``NoRegionFound`` when the resulting session has no region — even
when AWS::Region is already in the parameter_values dict we pass.

Hand the Translator a session pinned to the same region we feed via
parameter_values (us-east-1, from
``IntrinsicsSymbolTable.DEFAULT_PSEUDO_PARAM_VALUES``). Pure
in-process state, no AWS calls. Confirmed by reproducing the CI
failure locally with ``env -u AWS_DEFAULT_REGION -u AWS_REGION
HOME=/tmp/no-aws-config pytest tests/golden`` — 16 failures match CI
without the fix; 43 pass with it.
…merge

The harness was producing pinned expected.{build,package}.yaml files that
showed the LE-EXPANDED template (AlphaFunction/BetaFunction as siblings),
but real `sam build` and `sam package` preserve the original Fn::ForEach
structure with merged artifact paths inside the body — that's the design
in PR #8637.

run_build_pipeline now delegates the LE merge to
``BuildContext._get_template_for_output`` (invoked via a __new__ shell
since the method only reads ``self.*`` for other methods on the same
class). run_package_pipeline mirrors PackageContext's LE flow:
``_export_global_artifacts_pass`` → LE expand → rewrite artifacts →
``merge_language_extensions_s3_uris`` to copy URIs back into the original
ForEach body.

The SAM transform is skipped on the LE branch — the body keeps
``AWS::Serverless::Function`` with ``CodeUri`` (matching real
sam build / sam package output, where the SAM transform runs server-side
at deploy time). Update the corresponding unit test assertions to verify
``Fn::ForEach::*`` is preserved instead of expanded.
…rving harness

Both files now show ``Fn::ForEach::Workers`` at the top of Resources with
the body resource's ``CodeUri`` rewritten to the build placeholder
(build) or an s3://golden-bucket/<sha256> URI (package). This matches
what real ``sam build`` / ``sam package`` write to disk for an LE
template — see PR #8637.
…ne reimplementation

The previous harness reimplemented pieces of the build/package flow
(running the SAM translator manually, walking artifact properties,
calling internal merge helpers directly), and that drift produced
two user-visible bugs in the pinned outputs:

  1. ``Fn::ForEach`` was replaced with the expanded form. Real
     ``sam build`` preserves the ForEach so CloudFormation can re-expand
     it server-side at deploy time (PR #8637).
  2. ``AWS::Serverless::Function`` got SAM-transformed to
     ``AWS::Lambda::Function`` (with an auto-generated IAM Role). Real
     ``sam build`` does NOT run the SAM transform; that happens
     server-side at deploy.

Both were symptoms of running steps SAM CLI itself does not run. Fix
the underlying issue by driving the same ``BuildContext`` and
``PackageContext`` classes the CLI instantiates and stubbing only the
network/compute boundaries:

- ``ApplicationBuilder.build`` returns an ``ApplicationBuildResult``
  with artifacts pointing at a sentinel directory under
  ``<build_dir>/__sam_golden_artifact__/<full_path>/``. The placeholder
  file's content is the resource's full path, which makes the
  downstream content-addressed S3 URI deterministic per resource.
- ``S3Uploader.upload`` / ``upload_with_dedup`` return
  ``s3://golden-bucket/<sha256>`` where the digest is taken over the
  uploaded file's content.

After build, the on-disk ``.aws-sam/build/template.yaml`` is read back
and any artifact-property string that resolves under the sentinel
directory is rewritten to ``<<BUILT_ARTIFACT>>``. Everything else
(LE expansion, ForEach merge, Mappings generation, AWS::Include
export) flows through the real CLI code paths.

Also: ``samcli.yamlhelper.yaml_parse`` mutates the global
``yaml.SafeLoader`` to emit ``OrderedDict``; once any caller imports
yamlhelper, every subsequent ``yaml.safe_load`` returns OrderedDict
and ``yaml.safe_dump`` rejects it. Coerce parsed templates to plain
``dict`` in both the harness and ``normalize`` so the test corpus and
pin regenerator see plain dicts.

Test signatures change: ``run_package_pipeline`` no longer takes a
``build_output`` dict (PackageContext re-loads the template from
disk); a new ``run_build_and_package`` helper produces both pins
from a single build for callers that need both.
Five sentinel pins now reflect what real ``sam build`` and
``sam package`` write to disk:

- ``sam_resources/serverless_function_zip/expected.build.yaml`` now
  contains ``AWS::Serverless::Function`` (no SAM transform), with
  ``CodeUri: <<BUILT_ARTIFACT>>`` and a SAM-CLI-added
  ``Metadata.SamResourceId``. The pre-existing auto-generated
  ``HelloFunctionRole`` is gone (the SAM transform runs server-side
  at deploy time, not at build time).
- ``sam_resources/serverless_function_zip/expected.package.yaml``
  preserves the same shape, with ``CodeUri`` rewritten to
  ``s3://golden-bucket/<sha>``.
- ``packageable_resources/lambda_function_zip/expected.build.yaml``
  now carries ``Metadata.SamResourceId`` (added by SAM CLI's stack
  provider for raw CFN Lambdas, too).
- ``packageable_resources/lambda_function_zip/expected.package.yaml``
  carries the same metadata; ``Code`` becomes the
  ``{S3Bucket, S3Key}`` dict shape that real ``sam package`` emits.
- ``language_extensions/foreach_static_zip/expected.package.yaml``
  digest changes because the package URI is now content-addressed
  over the harness's sentinel placeholder file (whose content is the
  resource full path) rather than a synthesized hash. ``Fn::ForEach``
  preservation is unchanged.
bnusunny added 3 commits June 2, 2026 06:10
Real `sam package` zips the build artifact directory before "uploading"
it. The harness's S3Uploader.upload stub was hashing the resulting zip
bytes — but raw zip bytes embed OS-specific metadata (DOS timestamps,
Unix mode bits, central directory ordering), so the same source content
produced different digests on Linux vs Windows. The Windows runner saw
the package pins regenerate with new hashes and the corpus tests failed
on byte-exact diff:

    -    CodeUri: s3://golden-bucket/dd2bf2d8...
    +    CodeUri: s3://golden-bucket/1c1f1e20...

Hash zip files via a normalized representation: sorted member names plus
each member's content sha256. The envelope metadata is excluded from the
digest, so identical source content produces an identical hash on every
OS. Non-zip artifacts continue to be hashed by raw bytes. Detection uses
zipfile.is_zipfile so .jar / .war and other zip-format artifacts are
treated the same as a Lambda zip.

Pins regenerated once with the new hash; from now on Linux and Windows
runners agree.
…lpath

Real `sam build` writes CodeUri/Code/etc. as a relative path from the
output template's directory to the artifact directory — for a top-level
resource the result is just the resource id (e.g. `CodeUri: HelloFunction`)
because both end up under .aws-sam/build/. The harness was substituting a
synthetic <<BUILT_ARTIFACT>> placeholder instead, which obscured what real
sam build actually produces and made the build pin disagree with the CLI's
own output.

Drop the placeholder rewriter entirely. Stage the case dir
(template.yaml + src/) into the temp tree before driving BuildContext,
and put .aws-sam/build/ inside the staged tree. ApplicationBuilder's
relpath then resolves to .aws-sam/build/<full_path> which move_template
collapses to <full_path> when writing to .aws-sam/build/template.yaml —
exactly what real sam build writes.

Net: harness drops the artifact-walker, the path-rewriter, the
RESOURCES_WITH_LOCAL_PATHS / RESOURCES_WITH_IMAGE_COMPONENT imports,
the BUILT_ARTIFACT_PLACEHOLDER constant, and the sentinel-dirname
indirection (~150 lines). All sentinel build pins now show the real
deterministic relpath.

Tests updated to assert the relpath shape; one test renamed for accuracy
(`test_build_cfn_case_replaces_code_with_placeholder` →
`test_build_cfn_case_keeps_lambda_function`). 42/42 pass under
CI-like env.
…erge queue

This repo uses GitHub's merge queue (per build.yml's existing
merge_group trigger). When a PR enters the queue, GitHub dispatches a
merge_group event against an ephemeral ref, not pull_request. A
required check that only declares on.pull_request never reports a
status on the merge_group event, so the merge queue treats it as
missing/pending and blocks the merge — the same skipped-but-required
failure mode the path-filter pitfall was about, just under a different
trigger.

Mirror build.yml's pattern: listen on both pull_request and
merge_group with the same branch filters. The check_semver_bump.py
script's self-gating (exit 0 when no expected.*.yaml changed) only
helps when the workflow runs at all; this fix makes it run.

Add a small "resolve base ref" step because the two event types
expose the target branch differently: pull_request sets
github.base_ref to the bare branch name; merge_group leaves
base_ref empty and exposes the full ref at
github.event.merge_group.base_ref (e.g. "refs/heads/develop"),
which we strip to match the pull_request shape before passing to
check_semver_bump.py.

Caught by review on PR #9057.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants