refactor(run-control): unify target resolution and SSE handling by dgageot · Pull Request #2731 · docker/docker-agent

dgageot · 2026-05-09T12:55:18Z

Continues the work merged in #2714 by simplifying and hardening the run
control plane (run --listen, send, watch, proto, serve mcp --attach).

What this changes

One target resolver everywhere. New runregistry.Find(target) is the
single entry point used by send, watch, proto, and serve mcp --attach. It resolves an empty target to the latest live run, a numeric
target by PID, an http(s)://host:port target by address, and anything
else as a (partial) session id. Exact session-id matches always win over
substring matches, and ambiguous substring matches return a clear error
instead of silently picking one. Replaces the per-command resolveTarget
and the previous PID-only behaviour of --to/--attach.
Shared SSE helper. cmd/root/sse.go collapses the duplicated SSE
client code from watch.go and proto.go into openEventStream +
readEventStream. Payloads are surfaced as json.RawMessage so consumers
can either forward the bytes verbatim or decode once.
- proto now forwards the raw event bytes instead of doing a redundant
  unmarshal/marshal round-trip.
- Error responses are read with io.LimitReader(resp.Body, 4 KiB) so a
  misbehaving server can't push an unbounded payload into client memory.
- readEventStream returns ctx.Err() on cancellation; watch maps
  context.Canceled back to a clean exit so Ctrl+C doesn't print a
  spurious error.
No double responses in proto. dispatchProto reports whether it
already produced a typed reply (e.g. transcript); the generic ack is
skipped in that case.
Atomic registry writes. runregistry.Write now writes through a
sibling temp file + Rename so concurrent readers never see torn JSON.
Latest is simplified with slices.MaxFunc.
Cross-platform event hooks. --on-event shell commands now go through
shellpath.DetectShell() (PowerShell on Windows, $SHELL on Unix)
instead of hard-coded sh -c. Captured stdout/stderr is logged on
failure, and the use of context.Background() is documented (hooks
intentionally outlive app shutdown so the last event still flushes).

Tests

runregistry covers Find for pid / addr / session id, ambiguity, the
exact-match-vs-substring regression, and the empty-registry case.
cmd/root/sse_test.go covers happy path, error-body cap, and ctx
cancellation.
Existing tests updated for the new dispatchProto signature.

task lint and task test are green.

docker-agent

Assessment: 🟡 NEEDS ATTENTION

Two medium-severity bugs found in the new code. No high-severity issues.

docker-agent · 2026-05-09T12:59:18Z

+				return r, nil
+			}
+		}
+		return Record{}, fmt.Errorf("no live run with pid %d", pid)


[MEDIUM] ErrNoRun sentinel not returned for "target not found" — docstring/implementation mismatch

The doc comment on ErrNoRun states:

"ErrNoRun is returned when no live run can be found that satisfies the caller's request (empty registry, or no record matches the target)."

But the implementation only returns ErrNoRun in the two empty-registry paths (lines 160 and 170). When the registry is non-empty and the specific target is not found, all three miss-paths (pid, addr, session-id) return plain fmt.Errorf(...) values that are not wrapped with ErrNoRun.

// pid miss — plain error, not ErrNoRun return Record{}, fmt.Errorf("no live run with pid %d", pid) // addr miss — plain error, not ErrNoRun return Record{}, fmt.Errorf("no live run at %s", target) // session miss — plain error, not ErrNoRun return Record{}, fmt.Errorf("no live run matches %q ...", target)

Any caller that does errors.Is(err, runregistry.ErrNoRun) to detect "no matching run" will silently miss these three cases. Given the docstring promises the sentinel for all "not found" outcomes, these paths should either wrap with fmt.Errorf("…: %w", ErrNoRun) or the docstring should be corrected to document the current (narrower) behaviour.

docker-agent · 2026-05-09T12:59:18Z

 	cmd.Stdin = bytes.NewReader(payload)
-	if err := cmd.Run(); err != nil {
-		slog.Warn("on-event hook failed", "command", command, "error", err)
+	out, err := cmd.CombinedOutput()


[MEDIUM] CombinedOutput() is incompatible with "tail-f-style listener" hooks — goroutine leak

cmd.CombinedOutput() blocks until the subprocess exits and its stdout/stderr reaches EOF. A "tail-f-style listener" (the example in the comment directly above this line) never exits, so this call would block indefinitely.

Because the caller spawns a fresh goroutine per event (go runEventHook(...)), every event that fires while a long-running hook is active leaks another goroutine. CombinedOutput() also buffers all subprocess output in memory, so an infinite-output hook also causes unbounded heap growth.

The comment explicitly documents the intent that such hooks should be allowed to run, but the implementation contradicts that intent. Consider using cmd.Start() + non-blocking reads (or simply discarding output for "fire-and-forget" hooks) so the goroutine can finish promptly without waiting for process exit.

dgageot · 2026-05-09T13:44:52Z

Addressed review feedback:

runregistry.ErrNoRun — wrapped the three "target not found" errors (pid miss, addr miss, session-id miss) with fmt.Errorf("...: %w", ErrNoRun) so callers using errors.Is(err, ErrNoRun) actually catch them. The docstring now matches the implementation. Tests in TestFind were extended with assert.ErrorIs for each miss path.
runEventHook — replaced cmd.CombinedOutput() with a boundedBuffer capped at 4 KiB (matches the SSE error-body cap). A chatty or long-running hook can no longer push unbounded data into the agent's heap, while we still keep enough output for the failure log. The misleading "tail-f-style" wording in the comment is gone — each invocation receives a single event on stdin and exits, and the comment now says exactly that. Added TestBoundedBuffer_CapsAtMaxHookOutput.

task lint and task test are green.

dgageot requested a review from a team as a code owner May 9, 2026 12:55

docker-agent reviewed May 9, 2026

View reviewed changes

dgageot added 2 commits May 9, 2026 15:35

simplify run control plane: shared SSE helper and runregistry.Find

4366414

tighten Find resolution and SSE error handling

ad7b51c

dgageot force-pushed the board/588d847ac6fc2a26 branch from 90e088b to ad7b51c Compare May 9, 2026 13:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(run-control): unify target resolution and SSE handling#2731

refactor(run-control): unify target resolution and SSE handling#2731
dgageot wants to merge 2 commits intodocker:mainfrom
dgageot:board/588d847ac6fc2a26

dgageot commented May 9, 2026

Uh oh!

docker-agent left a comment

Uh oh!

docker-agent May 9, 2026

Uh oh!

docker-agent May 9, 2026

Uh oh!

dgageot commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dgageot commented May 9, 2026

What this changes

Tests

Uh oh!

docker-agent left a comment

Choose a reason for hiding this comment

Assessment: 🟡 NEEDS ATTENTION

Uh oh!

docker-agent May 9, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent May 9, 2026

Choose a reason for hiding this comment

Uh oh!

dgageot commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants