
fix(buildlog): bump coder/coder to v2.33.2 to send agent token via header #170

Draft
deansheather wants to merge 2 commits into main from fix-buildlog-agentsdk-auth


Problem

When envbox starts in a Coder workspace pod, the buildlog client logs this on a tight retry loop:

{"level":"ERROR","msg":"connect err","caller":"buildlog/coder.go:119",
 "func":"github.com/coder/envbox/buildlog.newAgentClientV2",
 "fields":{"error":"GET https://<server>/api/v2/workspaceagents/me/rpc?version=2.2: unexpected status code 401: Cookie \"coder_session_token\" must be provided.: Try logging in using 'coder login'."}}

The workspace still comes up (the call site at cli/docker.go:199 deliberately doesn't fail startup on buildlog errors), but no envbox build logs are pushed and the error spam confuses users debugging real workspace problems.

Root cause

Envbox was pinned to github.com/coder/coder/v2 v2.24.4. In that version, agentsdk.connectRPCVersion authenticates the WebSocket upgrade via a cookie jar:

jar, _ := cookiejar.New(nil)
jar.SetCookies(rpcURL, []*http.Cookie{{Name: codersdk.SessionTokenCookie, Value: c.SDK.SessionToken()}})
httpClient := &http.Client{Jar: jar, Transport: c.SDK.HTTPClient.Transport}
conn, res, err := websocket.Dial(ctx, rpcURL.String(), &websocket.DialOptions{HTTPClient: httpClient})

The coder/websocket package does not honor http.Client.Jar during the upgrade — cookies from the jar are silently dropped, so the upgrade request reaches the server with no auth and the agent-token middleware returns 401.

Mainline coder/coder already fixed this — connectRPCVersion now passes the token via HTTPHeader: http.Header{codersdk.SessionTokenHeader: ...} in the websocket.DialOptions, which coder/websocket does honor. Verified in v2.33.2 at codersdk/agentsdk/agentsdk.go:349-350.

Changes

  1. Bump github.com/coder/coder/v2 from v2.24.4 → v2.33.2. This is past the SessionTokenProvider refactor (1354d84) and the functional-args refactor (5c2b9a5), both of which are required so agentsdk uses HTTPHeader for the upgrade.

  2. buildlog/coder.go: adopt the new agentsdk.New(serverURL, SessionTokenSetup, ...) signature:

    // before
    client := agentsdk.New(accessURL)
    client.SetSessionToken(token)
    // after
    client := agentsdk.New(accessURL, agentsdk.WithFixedToken(token))

  3. Migrate cdr.dev/slog → cdr.dev/slog/v3 across the repo (12 files). Required because the new agentsdk.NewLogSender takes a cdr.dev/slog/v3.Logger, and buildlog/coder.go passes its logger straight through.

  4. Mirror coder/coder's replace directives for tailscale.com, github.com/tailscale/wireguard-go, and gvisor.dev so the v2.33.2 build resolves cleanly.

  5. Pin github.com/docker/cli to v27.4.1 via replace. docker/cli v29 (pulled in transitively by coder) imports the new github.com/moby/moby/client v0.3.0 module. The legacy github.com/moby/moby +incompatible module that ory/dockertest still drags in also exposes the same client package, and Go cannot disambiguate when the parent module is +incompatible (no go.mod). v27.4.1 is the version dockertest v3.12.0 itself requires and its cli/compose/loader doesn't import moby/moby/client, so the conflict goes away. Rationale is documented inline in go.mod.

Acceptance criteria

  • go.mod github.com/coder/coder/v2 bumped to a version where agentsdk.connectRPCVersion uses HTTPHeader for the session token (v2.33.2).
  • buildlog/coder.go updated for the new agentsdk.New signature.
  • go mod tidy clean.
  • go build ./... passes.
  • go vet ./... — no new warnings (the two pre-existing cli/docker.go cancel/leak warnings are unchanged from main).
  • go test ./... — all unit tests pass.
  • Live workspace verification not yet done in this PR — needs a workspace pod with CODER_AGENT_TOKEN and CODER_AGENT_URL set against a real Coder deployment. The buildlog client should NOT log connect err … coder_session_token must be provided on the retry loop; it should connect successfully (or, on a real network error, surface a more accurate error). Marking this as a draft so a maintainer can sanity-check the dep bump scope before we run that.
  • Integration suite (CODER_TEST_INTEGRATION=1 make test-integration) — not run locally because per AGENTS.md it requires a VM/physical machine and Docker socket access not available in my sandbox. Will rely on CI.

Out of scope

  • Changing the buildlog retry semantics or the --no-startup-log flag.
  • Removing the cookie-jar code from older agentsdk versions (that's a coder/coder repo concern, already fixed upstream).

…ader

The buildlog client was logging '401: coder_session_token must be provided'
on a tight retry loop because agentsdk's connectRPCVersion in v2.24.4
authenticated the WebSocket upgrade via http.Client.Jar, which the
coder/websocket client silently ignores during the upgrade.

Upstream v2.33.2 switched to passing the token via HTTPHeader on
websocket.DialOptions, which coder/websocket does honor.

Changes:
- Bump github.com/coder/coder/v2 from v2.24.4 to v2.33.2.
- Adopt the new agentsdk.New(serverURL, SessionTokenSetup, ...) signature
  by passing agentsdk.WithFixedToken(token); SetSessionToken is gone.
- Migrate cdr.dev/slog -> cdr.dev/slog/v3 across the repo (agentsdk now
  uses v3, and the buildlog package passes the logger through to
  agentsdk.NewLogSender).
- Mirror the relevant replace directives from coder/coder v2.33.2's
  go.mod (tailscale, wireguard-go, gvisor) so the build resolves.
- Pin docker/cli to v27.4.1 via replace. docker/cli v29 (pulled in
  transitively by coder) imports the new moby/moby/client v0 module,
  which conflicts with the legacy moby/moby +incompatible module that
  ory/dockertest still drags in. v27.4.1 is the version dockertest
  itself requires and avoids the ambiguous import.

The official golangci-lint v1.64.8 pre-built binary is built with Go 1.24
and refuses to load a config when go.mod's go directive is >= 1.25. The
coder/coder v2.33.2 bump pushed envbox's go.mod to go 1.25.9, breaking lint.

Use install-mode: goinstall on golangci-lint-action so the runner
builds the same v1.64.8 from source with the Go version we installed
(now bumped to ~1.25). Other jobs are left on ~1.24 because Go's
toolchain directive auto-downloads a matching toolchain at build time;
golangci-lint does not use that mechanism.