Skip to content

feat(gateway): add system registry support and source indicators#1625

Merged
johntmyers merged 8 commits into
NVIDIA:mainfrom
alexclewontin:system-gateway-dir
Jun 11, 2026
Merged

feat(gateway): add system registry support and source indicators#1625
johntmyers merged 8 commits into
NVIDIA:mainfrom
alexclewontin:system-gateway-dir

Conversation

@alexclewontin

Copy link
Copy Markdown
Contributor

Summary

Add a system-managed gateway registry under /etc/openshell and surface whether each gateway comes from user or system config in the CLI and TUI.

Related Issue

N/A

Changes

  • add bootstrap path handling for installer-provided gateway metadata and active-gateway fallback under /etc/openshell, with OPENSHELL_SYSTEM_GATEWAY_DIR as an override
  • make per-user gateways shadow system entries, and keep system registrations read-only from the CLI
  • show gateway config source in openshell gateway list, list JSON output, and openshell term
  • add CLI smoke coverage for gateway list table and JSON source rendering
  • update gateway architecture and manage-gateways docs for the new registry and source indicators

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

@copy-pr-bot

copy-pr-bot Bot commented May 28, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

/ok to test d1736c7

@johntmyers johntmyers added gator:blocked Gator is blocked by process or repository gates test:e2e Requires end-to-end coverage labels Jun 4, 2026
@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown

Label test:e2e applied, but pull-request/1625 is at {"messa while the PR head is d1736c7. A maintainer needs to comment /ok to test d1736c73fa7558b48ac1d31f9e1e295342344c09 to refresh the mirror. Once the mirror catches up, re-run Branch E2E Checks from the Actions tab.

@johntmyers

Copy link
Copy Markdown
Collaborator

/ok to test d1736c7

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

PR Review Status

Validation: project-valid as a concentrated gateway/CLI/TUI packaging-config improvement. The PR has a clear user path for installer-seeded system gateways, user entries shadowing system entries, read-only system registrations, and source indicators in CLI/TUI surfaces.
Head SHA: d1736c73fa7558b48ac1d31f9e1e295342344c09

Review findings:

  • Blocking: crates/openshell-bootstrap/src/metadata.rs in list_gateways_with_source only records a gateway name as shadowed after user metadata parses successfully. A corrupt or unreadable user gateways/<name>/metadata.json can therefore allow the same-named system gateway to appear in gateway list as system, which contradicts the stated user-over-system shadowing contract and differs from load_gateway_metadata, which resolves the user path first. Please mark the user directory name as seen before parsing, or otherwise ensure invalid user entries cannot reveal same-named system entries, and add a regression test for corrupt user metadata shadowing a valid system entry.
  • Process blocker: GitHub REST still reports mergeable_state: dirty; the PR needs to be updated with main or otherwise resolve merge conflicts.

Docs: direct UX changes are documented in docs/sandboxes/manage-gateways.mdx; no docs/index.yml navigation change appears needed because this updates an existing page.

Tests: applied test:e2e; test:e2e-gpu and test:e2e-kubernetes are not required for this change. I verified maintainer authority and posted /ok to test d1736c73fa7558b48ac1d31f9e1e295342344c09 for the current head SHA after the label-helper requested it.

Next state: gator:blocked until merge conflicts and the blocking review finding are resolved.

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

CI Update

Branch Checks are now failing on the current head SHA d1736c73fa7558b48ac1d31f9e1e295342344c09:

  • Rust (linux-amd64-cpu8) and Rust (linux-arm64-cpu8) fail in mise run rust:format:check; cargo fmt --all -- --check reports formatting diffs in the changed Rust files.
  • mise Lockfile fails because mise lock rewrites mise.lock for updated Python 3.14.5 standalone artifact URLs/checksums.

This still leaves the PR in gator:blocked along with the existing merge-conflict state and the blocking review finding. E2E is still running, with at least one Kubernetes smoke job already failed; I will keep monitoring the workflow result.

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

E2E Update

The required OpenShell / E2E status has now concluded failure for head SHA d1736c73fa7558b48ac1d31f9e1e295342344c09.

Observed failures:

  • kubernetes-e2e / Kubernetes E2E (Rust smoke) fails during tool setup because github:mozilla/sccache@0.14.0 is not in the lockfile, consistent with the stale mise.lock failure in Branch Checks.
  • e2e / E2E (python) has broad sandbox/policy failures with repeated supervisor relay h2 protocol error stream warnings and many tests failing soon after startup.
  • e2e / E2E (rust-docker) and e2e / E2E (rust-podman) passed.

The PR remains in gator:blocked: it still reports merge conflicts, has the blocking list_gateways_with_source review finding, and now has failed required Branch Checks/E2E. The next useful action is still for the branch author to update the PR with main, fix formatting/lockfile drift, and address the review finding.

@alexclewontin

Copy link
Copy Markdown
Contributor Author

@johntmyers I've rebased on main and fixed the formatting, then added a commit to ensure that broken user config still shadows system config. Locally, all the e2e failures I see are reproducible on both main and this branch, so I am not sure they are being introduced by these changes

@johntmyers

Copy link
Copy Markdown
Collaborator

/ok to test 5b2bb33

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

PR Review Status

Validation: still project-valid as a concentrated gateway/CLI/TUI packaging-config improvement. The PR has a clear user path for installer-seeded system gateways, user entries shadowing system entries, read-only system registrations, and source indicators in CLI/TUI surfaces.
Head SHA: 5b2bb335302864be7b8ed82f1e28d99e3e3d3cbb

Review findings:

  • Blocking: crates/openshell-bootstrap/src/metadata.rs in list_gateways_with_source now records a user gateway directory as seen before checking whether metadata.json exists. That fixes corrupt user metadata shadowing, but an empty or stale user gateways/<name>/ directory can still hide a valid system gateway. This is reachable after remove_gateway_metadata, which removes the file but not necessarily the directory. Please only shadow system config when the user metadata.json exists or cannot be statted, and add a regression test where an empty user gateway directory does not hide a valid system gateway.
  • Blocking: crates/openshell-bootstrap/src/paths.rs accepts OPENSHELL_SYSTEM_GATEWAY_DIR verbatim, including empty or relative values. Because this is presented as a package-managed system registry and can provide the active-gateway fallback, please require a non-empty absolute override path, or provide an explicit maintainer rationale for allowing relative/current-directory system registry lookup.

Docs: direct UX changes are documented in docs/sandboxes/manage-gateways.mdx; no docs/index.yml navigation change appears needed because this updates an existing page.

Tests: test:e2e remains applied. I verified maintainer authority and posted /ok to test 5b2bb335302864be7b8ed82f1e28d99e3e3d3cbb for the new head SHA; Branch Checks and E2E are now running, Helm Lint is green, and GPU E2E is not required.

Next state: gator:in-review while the blocking review feedback is addressed.

@johntmyers johntmyers added gator:in-review Gator is reviewing or awaiting PR review feedback gator:blocked Gator is blocked by process or repository gates and removed gator:blocked Gator is blocked by process or repository gates gator:in-review Gator is reviewing or awaiting PR review feedback labels Jun 4, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Gate Update

Required checks are now green for head SHA 5b2bb335302864be7b8ed82f1e28d99e3e3d3cbb: Branch Checks, Helm Lint, and E2E all pass.

The PR remains blocked because GitHub REST still reports mergeable_state: dirty, so the branch needs to be updated with main or otherwise have merge conflicts resolved. The two blocking review findings from the latest gator review also still need author response or code changes:

  • empty/stale user gateway directories should not hide a valid system gateway unless user metadata.json exists or cannot be statted
  • OPENSHELL_SYSTEM_GATEWAY_DIR should reject empty or relative override paths, or have an explicit maintainer rationale for allowing them

Next state: gator:blocked until the merge conflict and blocking review feedback are resolved.

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Blocker Follow-Up Nudge

This PR is still blocked after more than 48 business hours.

Current blockers:

  • GitHub REST still reports mergeable_state: dirty, so the branch needs to be updated with main or otherwise have merge conflicts resolved.
  • The latest gator review findings still need author response or code changes: empty/stale user gateway directories should not hide a valid system gateway, and OPENSHELL_SYSTEM_GATEWAY_DIR should reject empty or relative override paths or have explicit maintainer rationale.

Checks are currently green for head SHA 5b2bb335302864be7b8ed82f1e28d99e3e3d3cbb.

Next action: @alexclewontin, please update the branch and address or respond to the remaining review feedback.

@johntmyers

Copy link
Copy Markdown
Collaborator

/ok to test 85ea854

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head 85ea854d33003808c85b89fa49af0d1dc28b0c4e after @alexclewontin pushed an update on 2026-06-09.

Disposition: partially resolved.

Remaining items:

  • Resolved: the previous code review blockers are addressed. Empty/stale user gateway directories no longer hide valid system metadata, malformed user metadata still shadows same-named system metadata, and OPENSHELL_SYSTEM_GATEWAY_DIR now ignores empty or relative override values with warnings.
  • Still blocked: GitHub REST reports mergeable_state: dirty, so the branch still needs to be updated with main or otherwise have merge conflicts resolved.
  • Checks: required Branch Checks, Helm Lint, and E2E are pending on this new head because the copy-pr mirror was waiting for /ok to test; I posted /ok to test 85ea854d33003808c85b89fa49af0d1dc28b0c4e as a standalone command.

Next state: gator:blocked until the merge conflict is resolved. Once the branch is mergeable and required checks complete, gator can move this back to pipeline watch or maintainer approval as appropriate.

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Gate Update

Current head: 85ea854d33003808c85b89fa49af0d1dc28b0c4e.

State remains gator:blocked.

  • Mergeability: GitHub REST still reports mergeable_state: dirty, so the branch still needs to be updated with main or otherwise have merge conflicts resolved.
  • Branch Checks: passed on this head.
  • Helm Lint: failed before linting in Detect Helm changes while running ./.github/actions/pr-merge-base; the job hit HTTP 401: Requires authentication from a GraphQL-backed gh pr view call.
  • E2E: still queued/running for this head.

Next action: keep this PR blocked while the merge conflict and current required-check state are unresolved; gator will continue watching.

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Gate Update

Current head: 85ea854d33003808c85b89fa49af0d1dc28b0c4e.

State remains gator:blocked.

  • Mergeability: GitHub REST still reports mergeable_state: dirty, so the branch still needs to be updated with main or otherwise have merge conflicts resolved.
  • Branch Checks: passed on this head.
  • E2E: passed on this head.
  • Helm Lint: failed before linting in Detect Helm changes; the setup action runs gh pr view, which hit HTTP 401: Requires authentication from https://api.github.com/graphql. This appears to be a workflow/auth setup failure rather than a Helm lint regression in this PR.

Next action: @alexclewontin, please resolve the merge conflict. Once the branch is mergeable, gator can re-check the required Helm gate and move the PR forward if no blockers remain.

@johntmyers

Copy link
Copy Markdown
Collaborator

/ok to test 23aec7a

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head 23aec7a9dac9ebdfaa78cd909045185bfcfd634c after @alexclewontin pushed the June 11 update.

Disposition: partially resolved.

Remaining items:

  • Resolved: the prior code review blockers are addressed. Empty/stale user gateway directories no longer hide valid system metadata, malformed user metadata still shadows same-named system metadata, and OPENSHELL_SYSTEM_GATEWAY_DIR now ignores empty or relative override values with warnings. The PR is mergeable against current main; the previous dirty merge-conflict blocker is no longer present.
  • Still needs author follow-up: docs/reference/gateway-auth.mdx still describes only the user config active gateway and metadata layout. Please update that reference to mention the /etc/openshell system fallback for active_gateway and gateways/<name>/metadata.json.
  • Still needs author follow-up: please add direct CLI smoke or unit coverage for rejecting openshell gateway remove <system-only-name> while preserving the system entry, or get a maintainer waiver for that test gap.
  • Checks: OpenShell / Branch Checks, OpenShell / Helm Lint, and OpenShell / E2E are pending with “Waiting for /ok to test mirror”; I posted /ok to test 23aec7a9dac9ebdfaa78cd909045185bfcfd634c as a standalone command. DCO and required-status publication are green.

Next state: gator:in-review while the remaining docs/test feedback is addressed and the freshly authorized checks run.

@johntmyers johntmyers added gator:in-review Gator is reviewing or awaiting PR review feedback and removed gator:blocked Gator is blocked by process or repository gates labels Jun 11, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Review Re-check

I re-checked current head 23aec7a9dac9ebdfaa78cd909045185bfcfd634c with an independent code review pass.

Validation: still project-valid as a focused gateway/CLI/TUI packaging-config improvement.

Disposition: still needs author follow-up.

Blocking review findings:

  • crates/openshell-tui/src/lib.rs:501 still routes every non-OIDC gateway through build_mtls_channel(). A package-managed plaintext gateway such as http://127.0.0.1:17670 can appear in the TUI, but switching to it from openshell term will try to load mTLS material instead of opening a plaintext channel. Please branch on auth_mode == Some("plaintext") or http:// endpoints and build a plain channel, matching CLI startup behavior.
  • crates/openshell-cli/src/run.rs:945, crates/openshell-cli/src/run.rs:1136, and crates/openshell-cli/src/run.rs:1438 still snapshot/restore rollback state through load_active_gateway(). Because that function now falls back to /etc/openshell/active_gateway, a failed OIDC/cloud gateway add can persist the system fallback into the user active_gateway file. Please snapshot only the user active-gateway file for rollback and restore it only when a user value previously existed.

Remaining follow-up from the prior gator comment:

  • docs/reference/gateway-auth.mdx still needs the /etc/openshell active-gateway and gateways/<name>/metadata.json fallback layout.
  • Please add direct CLI smoke or unit coverage for rejecting openshell gateway remove <system-only-name> while preserving the system entry, or get a maintainer waiver for that test gap.

Checks: required OpenShell / Branch Checks, OpenShell / Helm Lint, and OpenShell / E2E are green for this head. test:e2e remains appropriate; test:e2e-gpu and test:e2e-kubernetes are not required for this change.

Next state: gator:in-review while the blocking review feedback is addressed.

Adds a read-only installer-seeded gateway registry that the CLI consults after per-user gateway config. The registry uses the same layout as per-user config with `active_gateway` at the root and `gateways/<name>/metadata.json` beneath it. By default the system config root is `/etc/openshell`, while `OPENSHELL_SYSTEM_GATEWAY_DIR` remains available as an override for packages that need a different location. User-managed gateways continue to shadow installer entries on name collision.

Originally-authored-by: Mark Shuttleworth <mark@ubuntu.com>
Signed-off-by: Alex Lewontin <alex.lewontin@canonical.com>
Expose whether a gateway registration comes from user or system config in `openshell gateway list`, the TUI gateway pane, and list JSON output. The CLI also refuses to remove system-managed registrations and the smoke tests cover the new list output.

Signed-off-by: Alex Lewontin <alex.lewontin@canonical.com>
Signed-off-by: Alex Lewontin <alex.lewontin@canonical.com>
@alexclewontin alexclewontin force-pushed the system-gateway-dir branch 2 times, most recently from 46536d0 to 033ee36 Compare June 11, 2026 15:48
@alexclewontin

alexclewontin commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Addressed, though I don't think the TUI plaintext falling back to mTLS is strictly related to this feature, so please let me know if I should split that into a separate PR

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head 033ee36fa9ea74e17f461996a0820c63c392861a after @alexclewontin's 2026-06-11 comment that the prior items were addressed.

Disposition: partially resolved.

Remaining items:

  • Resolved: the prior docs/test and rollback/TUI plaintext items are addressed. docs/reference/gateway-auth.mdx now documents the /etc/openshell active-gateway and metadata fallback layout, gateway remove <system-only-name> has direct smoke coverage that verifies the system entry is preserved, rollback now snapshots only the user active-gateway file, and plaintext TUI switching is no longer routed through mTLS.
  • Blocking: gateway names are still joined directly into user and system metadata paths in crates/openshell-bootstrap/src/metadata.rs. Because names can come from -g, OPENSHELL_GATEWAY, or active_gateway, please validate gateway names as a single path component before constructing metadata, last-sandbox, active/write, and system lookup paths. This is a CWE-22 path traversal hardening issue for the new system registry surface.
  • Blocking: crates/openshell-tui/src/lib.rs still routes auth_mode = "cloudflare_jwt" through the mTLS branch when switching gateways inside openshell term. Initial TUI launch supports edge auth through the CLI channel builder, but selecting an edge-authenticated gateway from the TUI will fail with missing mTLS certs. Please add an explicit edge mode that loads the edge token and builds the appropriate channel/interceptor, or return a clear unsupported error instead of falling through to mTLS.
  • Suggestion: openshell gateway select still omits the SOURCE column even though system entries are selectable and read-only. Showing source there would reduce user surprise, but I am not treating this as blocking.

Checks: DCOAssistant and required-status publication are green on this head. Required OpenShell / Branch Checks, OpenShell / Helm Lint, and OpenShell / E2E are still pending with “Waiting for /ok to test mirror”; test:e2e remains appropriate, and GPU/Kubernetes-specific labels are not required.

Next state: gator:in-review while the blocking review feedback is addressed.

@alexclewontin

alexclewontin commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

As far as I understand, adding jwt support to the TUI is a more extensive change than adding the plaintext support, and both are issues that pre-date this PR (there will be failures there whether the config comes from the user config or the system config). In the interests of keeping this targeted, I would suggest that be treated as a separate issue, and not block this PR

@johntmyers

Copy link
Copy Markdown
Collaborator

/ok to test acefaad

@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head acefaadae2888fd6522aed0977386772b95864ac after @alexclewontin's 2026-06-11 comment that Cloudflare JWT TUI switching should be tracked separately, and after the follow-up force-push.

Disposition: partially resolved.

Remaining items:

  • Resolved: the CWE-22 gateway-name path hardening blocker is addressed. The current head validates gateway names as a single path component before user/system metadata, active-gateway, token, last-sandbox, and mTLS write path joins. The independent reviewer found no blocking path-hardening findings on this head.
  • Non-blocking follow-up: I agree the cloudflare_jwt in-TUI gateway switching gap is pre-existing and can be tracked separately unless system-seeded Cloudflare gateways become an intended package-managed case.
  • Blocking: the current head no longer contains the earlier plaintext TUI switching fix. crates/openshell-tui/src/lib.rs still only special-cases OIDC and otherwise routes gateway switches through build_mtls_channel(). Because this PR exposes system/plaintext entries in the TUI gateway picker, selecting a plaintext gateway from openshell term would try to load mTLS material instead of opening a plaintext channel. Please restore the plaintext branch for auth_mode == "plaintext" or http:// endpoints, matching CLI startup behavior.

Checks: test:e2e remains appropriate. Required OpenShell / Branch Checks, OpenShell / Helm Lint, and OpenShell / E2E were waiting for the copy-pr mirror on this head; I posted /ok to test acefaadae2888fd6522aed0977386772b95864ac as a standalone command. GPU E2E is not required.

Next state: gator:in-review while the plaintext TUI switching blocker is addressed and the freshly authorized checks run.

@alexclewontin

Copy link
Copy Markdown
Contributor Author

Blocking: the current head no longer contains the earlier plaintext TUI switching fix. crates/openshell-tui/src/lib.rs still only special-cases OIDC and otherwise routes gateway switches through build_mtls_channel(). Because this PR exposes system/plaintext entries in the TUI gateway picker, selecting a plaintext gateway from openshell term would try to load mTLS material instead of opening a plaintext channel. Please restore the plaintext branch for auth_mode == "plaintext" or http:// endpoints, matching CLI startup behavior.

The plaintext gap is also pre-existing, and such config could previously be introduced via the user gateways directory, and system introduced gateways are not inherently plaintext. Both jwt and plaintext will be addressed in #1876

@johntmyers johntmyers added gator:approval-needed Gator completed review; maintainer approval needed and removed gator:in-review Gator is reviewing or awaiting PR review feedback labels Jun 11, 2026
@johntmyers

Copy link
Copy Markdown
Collaborator

gator-agent

Re-check After Author Update

I re-evaluated latest head acefaadae2888fd6522aed0977386772b95864ac after @alexclewontin's 2026-06-11 comment that the plaintext/JWT TUI switching issue is pre-existing and should be handled in PR #1876.

Disposition: resolved for this PR.

Remaining items:

Maintainer Approval Needed

Gator validation and PR monitoring are complete.

Validation: project-valid as a focused gateway registry, CLI/TUI source indicator, and packaging-config improvement.
Review: independent code-only re-check found no remaining blocking findings for this PR.
Docs: gateway architecture and Fern gateway-auth/manage-gateways docs are updated for the /etc/openshell system registry and source indicators.
Checks: required Branch Checks, Helm Lint, and E2E are green for acefaadae2888fd6522aed0977386772b95864ac.
E2E: test:e2e is applied and passed; GPU and Kubernetes-specific E2E are not required for this change.

Human maintainer approval or merge decision is now required.

Next state: gator:approval-needed.

@johntmyers johntmyers merged commit fb83d1a into NVIDIA:main Jun 11, 2026
43 of 44 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gator:approval-needed Gator completed review; maintainer approval needed test:e2e Requires end-to-end coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants