Skip to content

Commit a4ce95e

Browse files
committed
Merge remote-tracking branch 'upstream/main' into feat/otel-genai-semconv
2 parents 2f686d0 + f2be0bf commit a4ce95e

44 files changed

Lines changed: 4180 additions & 105 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.gitignore

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -23,3 +23,8 @@ vendor
2323
/cagent-*
2424
/docker-mcp-*
2525
docker-agent
26+
27+
# Generated wasm artifacts (built locally; see cmd/wasm/README.md)
28+
/wasm
29+
cmd/wasm/web/docker-agent.wasm
30+
cmd/wasm/web/wasm_exec.js

cmd/wasm/README.md

Lines changed: 215 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,215 @@
1+
# docker-agent in the browser (js/wasm)
2+
3+
`cmd/wasm` is a `GOOS=js GOARCH=wasm` entry point that exposes a thin slice
4+
of docker-agent — config parsing and a single-round streaming chat — to a
5+
JavaScript host (a browser tab or Node).
6+
7+
It is **not** a port of the full agent. It is a proof-of-concept for the
8+
"realistic plan" outlined when we surveyed which parts of docker-agent could
9+
even be cross-compiled to wasm. See the *Limits* section below.
10+
11+
## Build
12+
13+
```sh
14+
# Compile.
15+
GOOS=js GOARCH=wasm go build -o cmd/wasm/web/docker-agent.wasm ./cmd/wasm
16+
17+
# Copy the matching wasm_exec.js shim from the Go toolchain (its API is
18+
# tied to the compiler version and must match the binary).
19+
cp "$(go env GOROOT)/lib/wasm/wasm_exec.js" cmd/wasm/web/wasm_exec.js
20+
```
21+
22+
The output is a ~75 MB `.wasm`. It includes the entire YAML parser, three LLM
23+
provider clients (OpenAI, Anthropic, Google) and their dependencies. With
24+
`tinygo` or `-ldflags="-s -w"` plus `wasm-opt` you can roughly halve it; we
25+
have not optimised the size.
26+
27+
## Run (Node smoke test)
28+
29+
```sh
30+
node cmd/wasm/smoke_test.js
31+
```
32+
33+
Prints the parsed config of a small two-agent YAML and exits 0. This proves
34+
that the Go runtime starts under wasm, that `globalThis.dockerAgent` gets
35+
registered, and that `parseConfig` returns the expected shape.
36+
37+
## Run (browser, with OpenRouter sign-in)
38+
39+
The demo page implements OpenRouter's [PKCE OAuth flow](https://openrouter.ai/docs/use-cases/oauth-pkce):
40+
41+
1. Serve `cmd/wasm/web/` over HTTP (`WebAssembly.instantiateStreaming`
42+
needs the `application/wasm` MIME type, so `file://` won't work):
43+
44+
```sh
45+
cd cmd/wasm/web && python3 -m http.server 8765
46+
```
47+
48+
2. Open <http://localhost:8765/>.
49+
50+
3. Click **Sign in with OpenRouter** — you'll be redirected to
51+
`openrouter.ai`, log in, approve the app, and bounced back. The page
52+
exchanges the `?code=` for a user-controlled API key (PKCE / S256) and
53+
stores it in `localStorage`.
54+
55+
4. Pick a free model from the dropdown (the YAML textarea updates
56+
automatically) and click **Run**. Streaming completion deltas appear in
57+
the output box.
58+
59+
5. To revoke the key: click **Sign out** in the page (clears local copy)
60+
and/or visit
61+
<https://openrouter.ai/settings/keys> to revoke server-side.
62+
63+
### Why this works in a browser
64+
65+
Most LLM providers block direct browser fetches because anyone could read
66+
the API key out of the Network tab. OpenRouter solves this by issuing
67+
*per-app, per-user* keys via PKCE — the user owns the key, can revoke it,
68+
and the app never sees a master credential.
69+
70+
We verified the relevant CORS posture before shipping:
71+
72+
```
73+
$ curl -is -X OPTIONS https://openrouter.ai/api/v1/auth/keys \
74+
-H "Origin: http://localhost:8765" \
75+
-H "Access-Control-Request-Method: POST"
76+
HTTP/2 204
77+
access-control-allow-origin: *
78+
access-control-allow-headers: Authorization,...,Content-Type,...
79+
```
80+
81+
Both `/api/v1/auth/keys` (token exchange) and `/api/v1/chat/completions`
82+
(inference) return `access-control-allow-origin: *` with `Authorization`
83+
in the allowed headers. No proxy needed.
84+
85+
### What the YAML looks like
86+
87+
OpenRouter exposes an OpenAI-compatible API, so it slots in as a custom
88+
provider:
89+
90+
```yaml
91+
providers:
92+
openrouter:
93+
provider: openai
94+
base_url: https://openrouter.ai/api/v1
95+
token_key: OPENROUTER_API_KEY
96+
agents:
97+
root:
98+
model: openrouter/meta-llama/llama-3.3-70b-instruct:free
99+
instruction: |
100+
You are a helpful assistant ...
101+
```
102+
103+
When the user clicks **Run**, the page passes the stored key as
104+
`env.OPENROUTER_API_KEY` to `dockerAgent.chat(...)`, the existing
105+
`pkg/model/provider/openai` reads it via `env.Get(ctx, cfg.TokenKey)`, and
106+
the Go HTTP transport (mapped to `fetch` under js/wasm) sends the
107+
request. **Same code path the CLI uses** — no special browser-only branch.
108+
109+
### Bring-your-own-key fallback
110+
111+
The `<details>` block on the page lets advanced users paste OpenAI /
112+
Anthropic / Gemini keys directly. Anthropic recently added
113+
`anthropic-dangerous-direct-browser-access` so it actually works from a
114+
tab; OpenAI and Gemini still block CORS for browser origins, so those
115+
fields are mostly there for use against a self-hosted proxy.
116+
117+
## JavaScript API
118+
119+
Once the wasm boots, two functions are exported on `globalThis.dockerAgent`:
120+
121+
### `parseConfig(yamlString) -> object`
122+
123+
Synchronous. Loads the YAML through `pkg/config.Load` (so all the version
124+
upgraders run) and returns a small JS-shaped projection:
125+
126+
```js
127+
dockerAgent.parseConfig(`
128+
version: "2"
129+
agents:
130+
root:
131+
model: openai/gpt-4o-mini
132+
instruction: hi
133+
`);
134+
// =>
135+
// {
136+
// version: "2",
137+
// agents: [{ name: "root", model: "openai/gpt-4o-mini", instruction: "hi", ... }],
138+
// models: { "openai/gpt-4o-mini": { provider: "openai", model: "gpt-4o-mini" } }
139+
// }
140+
```
141+
142+
Throws a JS `Error` on invalid YAML / unsupported version / failed validation.
143+
144+
### `chat({yaml, agentName?, env?, messages}, onEvent) -> Promise`
145+
146+
Asynchronous. Loads the config, picks the agent (or the only one), builds
147+
its model provider, and opens one streaming chat completion.
148+
149+
- `yaml` — the YAML document, same as for `parseConfig`.
150+
- `agentName` — required if the config defines more than one agent.
151+
- `env` — `{ OPENAI_API_KEY: "...", ANTHROPIC_API_KEY: "...", GEMINI_API_KEY: "..." }`.
152+
Whatever your model needs.
153+
- `messages` — array of `{role, content}` objects, OpenAI-style. The agent's
154+
`instruction` is automatically prepended as a `system` message if you
155+
haven't already supplied one.
156+
- `onEvent(ev)` — called from Go for every stream event:
157+
- `{type: "delta", content?: string, reasoning?: string}`
158+
- `{type: "finish", reason: string}`
159+
160+
Resolves to `{message: {role, content, reasoning, finish}}` once the stream
161+
ends. Rejects with an `Error` on any failure.
162+
163+
## Limits
164+
165+
These are not bugs to fix; they are direct consequences of `GOOS=js`:
166+
167+
- **No tools, no MCP, no hooks, no sub-agent handoffs.** Anything that needs
168+
`os/exec` or local file I/O is excluded from the build.
169+
- **No sessions.** `pkg/session` and `pkg/memory/database/sqlite` pull in
170+
`modernc.org/libc` which does not have a js port. The browser caller is
171+
responsible for keeping the message history.
172+
- **No fallbacks, no rule-based routing.** The rule-based router uses bleve,
173+
which depends on `mmap` / file-locking primitives that don't exist on
174+
js/wasm. A js-only `factory_js.go` swaps the full provider factory for a
175+
slim variant with only openai / anthropic / google.
176+
- **No Docker Model Runner, no Bedrock, no Vertex AI.** Same reason —
177+
`dmr` shells out, `bedrock` and `vertexai` pull in cloud SDKs that don't
178+
cross-compile to wasm.
179+
- **No Docker Desktop integration.** `pkg/desktop` has stubs for js that
180+
return empty paths and refuse to dial.
181+
- **CORS.** Mentioned above. Real deployment needs a proxy.
182+
183+
## Where the cross-compilation work lives
184+
185+
The shims that make the existing tree compile under `GOOS=js GOARCH=wasm`
186+
are intentionally tiny:
187+
188+
| File | Purpose |
189+
| --- | --- |
190+
| `pkg/cache/lock_js.go` | No-op file-lock stubs (single-threaded js). |
191+
| `pkg/desktop/sockets_js.go` | Returns empty Docker Desktop paths. |
192+
| `pkg/desktop/connection_js.go` | Refuses Unix-socket / named-pipe dials. |
193+
| `pkg/desktop/connection_other.go` | Build tag updated to `!windows && !js`. |
194+
| `pkg/model/provider/factory.go` | Build tag added: `!js`. |
195+
| `pkg/model/provider/factory_js.go` | js-only provider dispatch (openai / anthropic / google). |
196+
197+
Everything else compiles unchanged because docker-agent already had the
198+
`os/exec`, sandbox, sound, audio, server, browser, keyring code well-isolated
199+
behind their own packages — the wasm entry just doesn't import them.
200+
201+
## Sanity check
202+
203+
```sh
204+
# Native build still happy.
205+
go build ./...
206+
207+
# Wasm build still happy.
208+
GOOS=js GOARCH=wasm go build -o /tmp/cagent.wasm ./cmd/wasm
209+
210+
# Existing tests of the touched packages still pass.
211+
go test ./pkg/cache/... ./pkg/desktop/... ./pkg/model/provider/...
212+
213+
# End-to-end runtime smoke test.
214+
node cmd/wasm/smoke_test.js
215+
```

cmd/wasm/bridge.go

Lines changed: 147 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,147 @@
1+
//go:build js && wasm
2+
3+
package main
4+
5+
import (
6+
"encoding/json"
7+
"fmt"
8+
"syscall/js"
9+
10+
"github.com/docker/docker-agent/pkg/chat"
11+
"github.com/docker/docker-agent/pkg/config/latest"
12+
)
13+
14+
// ---------------------------------------------------------------------------
15+
// JS <-> Go conversion helpers
16+
// ---------------------------------------------------------------------------
17+
18+
// throwingError builds a JS object the JS-side wrapper recognises as an
19+
// error to re-throw. We use a sentinel field so plain return values can never
20+
// collide with it (real config output never includes a `__error` key).
21+
func throwingError(msg string) any {
22+
return js.ValueOf(map[string]any{"__error": msg})
23+
}
24+
25+
// jsError wraps a Go error in a JS Error so it can be passed to Promise.reject.
26+
func jsError(err error) js.Value {
27+
return js.Global().Get("Error").New(err.Error())
28+
}
29+
30+
// rejectedPromise returns a Promise that rejects immediately with msg.
31+
// Used when we can detect a bad call before launching a goroutine.
32+
func rejectedPromise(msg string) js.Value {
33+
return newPromise(func(_, reject func(any)) {
34+
reject(jsError(fmt.Errorf("%s", msg)))
35+
})
36+
}
37+
38+
// newPromise builds a JavaScript Promise whose executor is a Go function.
39+
// The returned js.Value is the Promise itself; resolve/reject are passed
40+
// to executor and may be called from any goroutine.
41+
func newPromise(executor func(resolve, reject func(any))) js.Value {
42+
promiseCtor := js.Global().Get("Promise")
43+
handler := js.FuncOf(func(_ js.Value, args []js.Value) any {
44+
resolve := args[0]
45+
reject := args[1]
46+
executor(
47+
func(v any) { resolve.Invoke(v) },
48+
func(v any) { reject.Invoke(v) },
49+
)
50+
return nil
51+
})
52+
// Note: handler is a one-shot Func. The Promise executor is invoked
53+
// synchronously from the constructor, so it is safe to Release the
54+
// handler immediately afterwards. We don't, because the executor
55+
// captures goroutines that may continue to call resolve/reject — which
56+
// only invoke the captured `resolve`/`reject` js.Values, not the handler
57+
// itself, so leaking the handler is OK and avoids a subtle race.
58+
return promiseCtor.New(handler)
59+
}
60+
61+
// emit invokes onEvent(payload) on the JS side, swallowing any error if the
62+
// caller passed something other than a function (or nothing at all).
63+
func emit(onEvent js.Value, payload any) {
64+
if onEvent.Type() != js.TypeFunction {
65+
return
66+
}
67+
onEvent.Invoke(js.ValueOf(payload))
68+
}
69+
70+
// jsObjectToStringMap converts a flat JS object {k: "v", ...} into a Go
71+
// map[string]string. Non-string values are stringified via Object.toString.
72+
// A null/undefined input yields a nil map.
73+
func jsObjectToStringMap(v js.Value) map[string]string {
74+
if v.Type() != js.TypeObject {
75+
return nil
76+
}
77+
keys := js.Global().Get("Object").Call("keys", v)
78+
out := make(map[string]string, keys.Length())
79+
for i := 0; i < keys.Length(); i++ {
80+
k := keys.Index(i).String()
81+
val := v.Get(k)
82+
if val.Type() == js.TypeString {
83+
out[k] = val.String()
84+
} else {
85+
out[k] = val.Call("toString").String()
86+
}
87+
}
88+
return out
89+
}
90+
91+
// jsToMessages decodes a JS array of {role, content} objects into a slice of
92+
// chat.Message. We round-trip through JSON so any extra fields the JS side
93+
// supplies (e.g. tool_calls in a future iteration) flow through naturally.
94+
func jsToMessages(v js.Value) ([]chat.Message, error) {
95+
if v.Type() != js.TypeObject {
96+
return nil, fmt.Errorf("messages must be an array")
97+
}
98+
jsonStr := js.Global().Get("JSON").Call("stringify", v).String()
99+
var msgs []chat.Message
100+
if err := json.Unmarshal([]byte(jsonStr), &msgs); err != nil {
101+
return nil, fmt.Errorf("decoding messages: %w", err)
102+
}
103+
return msgs, nil
104+
}
105+
106+
// configToMap reduces a fully-parsed *latest.Config to the small JS-friendly
107+
// shape returned by parseConfig. We deliberately omit fields that wouldn't
108+
// mean anything in the browser (toolsets, hooks, sandbox).
109+
func configToMap(cfg *latest.Config) map[string]any {
110+
agents := make([]any, 0, len(cfg.Agents))
111+
for _, a := range cfg.Agents {
112+
agents = append(agents, map[string]any{
113+
"name": a.Name,
114+
"description": a.Description,
115+
"model": a.Model,
116+
"instruction": a.Instruction,
117+
"sub_agents": stringsToAny(a.SubAgents),
118+
"handoffs": stringsToAny(a.Handoffs),
119+
"add_date": a.AddDate,
120+
"add_env_info": a.AddEnvironmentInfo,
121+
})
122+
}
123+
124+
models := map[string]any{}
125+
for k, m := range cfg.Models {
126+
models[k] = map[string]any{
127+
"provider": m.Provider,
128+
"model": m.Model,
129+
"base_url": m.BaseURL,
130+
}
131+
}
132+
133+
return map[string]any{
134+
"version": cfg.Version,
135+
"agents": agents,
136+
"models": models,
137+
}
138+
}
139+
140+
// stringsToAny widens a []string to []any so syscall/js.ValueOf accepts it.
141+
func stringsToAny(in []string) []any {
142+
out := make([]any, len(in))
143+
for i, s := range in {
144+
out[i] = s
145+
}
146+
return out
147+
}

0 commit comments

Comments
 (0)