# docker-agent in the browser (js/wasm)

`cmd/wasm` is a `GOOS=js GOARCH=wasm` entry point that exposes a thin slice
of docker-agent — config parsing and a single-round streaming chat — to a
JavaScript host (a browser tab or Node).

It is **not** a port of the full agent. It is a proof-of-concept for the
"realistic plan" outlined when we surveyed which parts of docker-agent could
even be cross-compiled to wasm. See the *Limits* section below.

## Build

```sh
# Compile.
GOOS=js GOARCH=wasm go build -o cmd/wasm/web/docker-agent.wasm ./cmd/wasm

# Copy the matching wasm_exec.js shim from the Go toolchain (its API is
# tied to the compiler version and must match the binary).
cp "$(go env GOROOT)/lib/wasm/wasm_exec.js" cmd/wasm/web/wasm_exec.js
```

The output is a ~75 MB `.wasm`. It includes the entire YAML parser, three LLM
provider clients (OpenAI, Anthropic, Google) and their dependencies. With
`tinygo` or `-ldflags="-s -w"` plus `wasm-opt` you can roughly halve it; we
have not optimised the size.

## Run (Node smoke test)

```sh
node cmd/wasm/smoke_test.js
```

Prints the parsed config of a small two-agent YAML and exits 0. This proves
that the Go runtime starts under wasm, that `globalThis.dockerAgent` gets
registered, and that `parseConfig` returns the expected shape.
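
If you want to poke at it by hand, a minimal Node harness looks roughly like
this (a sketch, not a copy of `smoke_test.js`; it assumes a CommonJS script
sitting in `cmd/wasm/` next to the `web/` directory produced by the Build
step):

```js
require("./web/wasm_exec.js"); // defines globalThis.Go
const fs = require("node:fs");
const path = require("node:path");

(async () => {
  const go = new Go();
  const wasm = fs.readFileSync(path.join(__dirname, "web", "docker-agent.wasm"));
  const { instance } = await WebAssembly.instantiate(wasm, go.importObject);
  go.run(instance); // don't await: it only resolves when the Go program exits

  // By the time go.run() yields, main() has registered the exports.
  const cfg = dockerAgent.parseConfig(`
version: "2"
agents:
  root:
    model: openai/gpt-4o-mini
    instruction: hi
`);
  console.log(JSON.stringify(cfg, null, 2));
})();
```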

## Run (browser, with OpenRouter sign-in)

The demo page implements OpenRouter's [PKCE OAuth flow](https://openrouter.ai/docs/use-cases/oauth-pkce):

1. Serve `cmd/wasm/web/` over HTTP (`WebAssembly.instantiateStreaming`
   needs the `application/wasm` MIME type, so `file://` won't work):

   ```sh
   cd cmd/wasm/web && python3 -m http.server 8765
   ```

2. Open <http://localhost:8765/>.

3. Click **Sign in with OpenRouter** — you'll be redirected to
   `openrouter.ai`; once you log in and approve the app, you're bounced back.
   The page then exchanges the `?code=` for a user-controlled API key
   (PKCE / S256, sketched below after this list) and stores it in
   `localStorage`.

4. Pick a free model from the dropdown (the YAML textarea updates
   automatically) and click **Run**. Streaming completion deltas appear in
   the output box.

5. To revoke the key, click **Sign out** on the page (clears the local copy)
   and/or visit <https://openrouter.ai/settings/keys> to revoke it
   server-side.
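
Step 3's code-for-key exchange boils down to a single `fetch` against the
endpoint whose CORS headers are checked below. Roughly (a sketch based on
OpenRouter's PKCE docs; `codeVerifier` is the random string generated before
the redirect, and the `localStorage` key name is only illustrative):

```js
// Swap the ?code= from the callback URL for a user-scoped API key.
const code = new URLSearchParams(location.search).get("code");
const res = await fetch("https://openrouter.ai/api/v1/auth/keys", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    code,
    code_verifier: codeVerifier,       // generated before the redirect
    code_challenge_method: "S256",
  }),
});
const { key } = await res.json();
localStorage.setItem("openrouter_api_key", key);
```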

### Why this works in a browser

Most LLM providers block direct browser fetches because anyone could read
the API key out of the Network tab. OpenRouter solves this by issuing
*per-app, per-user* keys via PKCE — the user owns the key, can revoke it,
and the app never sees a master credential.

We verified the relevant CORS posture before shipping:

```
$ curl -is -X OPTIONS https://openrouter.ai/api/v1/auth/keys \
  -H "Origin: http://localhost:8765" \
  -H "Access-Control-Request-Method: POST"
HTTP/2 204
access-control-allow-origin: *
access-control-allow-headers: Authorization,...,Content-Type,...
```

Both `/api/v1/auth/keys` (token exchange) and `/api/v1/chat/completions`
(inference) return `access-control-allow-origin: *` with `Authorization`
in the allowed headers. No proxy needed.

### What the YAML looks like

OpenRouter exposes an OpenAI-compatible API, so it slots in as a custom
provider:

```yaml
providers:
  openrouter:
    provider: openai
    base_url: https://openrouter.ai/api/v1
    token_key: OPENROUTER_API_KEY
agents:
  root:
    model: openrouter/meta-llama/llama-3.3-70b-instruct:free
    instruction: |
      You are a helpful assistant ...
```

When the user clicks **Run**, the page passes the stored key as
`env.OPENROUTER_API_KEY` to `dockerAgent.chat(...)`, the existing
`pkg/model/provider/openai` reads it via `env.Get(ctx, cfg.TokenKey)`, and
the Go HTTP transport (mapped to `fetch` under js/wasm) sends the
request. **Same code path the CLI uses** — no special browser-only branch.
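
In page terms, the Run handler is roughly (a sketch; the `localStorage` key
and element names are illustrative):

```js
dockerAgent.chat(
  {
    yaml: yamlTextarea.value, // the YAML shown above
    env: { OPENROUTER_API_KEY: localStorage.getItem("openrouter_api_key") },
    messages: [{ role: "user", content: "Say hello." }],
  },
  (ev) => {
    if (ev.type === "delta" && ev.content) outputBox.textContent += ev.content;
  },
);
```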

### Bring-your-own-key fallback

The `<details>` block on the page lets advanced users paste OpenAI /
Anthropic / Gemini keys directly. Anthropic recently added the
`anthropic-dangerous-direct-browser-access` header, so direct calls actually
work from a tab; OpenAI and Gemini still block CORS for browser origins, so
those fields are mostly there for use against a self-hosted proxy.

## JavaScript API

Once the wasm boots, two functions are exported on `globalThis.dockerAgent`:
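
Booting it in the page follows the stock `wasm_exec.js` pattern (a sketch;
file names as produced by the Build section):

```js
// wasm_exec.js must be loaded first; it defines the Go class used here.
const go = new Go();
const { instance } = await WebAssembly.instantiateStreaming(
  fetch("docker-agent.wasm"),
  go.importObject,
);
go.run(instance); // keeps running; dockerAgent is now available on globalThis
```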

### `parseConfig(yamlString) -> object`

Synchronous. Loads the YAML through `pkg/config.Load` (so all the version
upgraders run) and returns a small JS-shaped projection:

```js
dockerAgent.parseConfig(`
version: "2"
agents:
  root:
    model: openai/gpt-4o-mini
    instruction: hi
`);
// =>
// {
//   version: "2",
//   agents: [{ name: "root", model: "openai/gpt-4o-mini", instruction: "hi", ... }],
//   models: { "openai/gpt-4o-mini": { provider: "openai", model: "gpt-4o-mini" } }
// }
```

Throws a JS `Error` on invalid YAML / unsupported version / failed validation.

### `chat({yaml, agentName?, env?, messages}, onEvent) -> Promise`

Asynchronous. Loads the config, picks the agent (or the only one), builds
its model provider, and opens one streaming chat completion.

- `yaml` — the YAML document, same as for `parseConfig`.
- `agentName` — required if the config defines more than one agent.
- `env` — `{ OPENAI_API_KEY: "...", ANTHROPIC_API_KEY: "...", GEMINI_API_KEY: "..." }`.
  Whatever your model needs.
- `messages` — array of `{role, content}` objects, OpenAI-style. The agent's
  `instruction` is automatically prepended as a `system` message if you
  haven't already supplied one.
- `onEvent(ev)` — called from Go for every stream event:
  - `{type: "delta", content?: string, reasoning?: string}`
  - `{type: "finish", reason: string}`

Resolves to `{message: {role, content, reasoning, finish}}` once the stream
ends. Rejects with an `Error` on any failure.
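
Put together (a sketch; the key and prompt are placeholders):

```js
let streamed = "";
const { message } = await dockerAgent.chat(
  {
    yaml: configYaml, // same YAML you would feed parseConfig
    agentName: "root", // only required when the config defines several agents
    env: { OPENAI_API_KEY: "sk-..." },
    messages: [{ role: "user", content: "Summarise this repo in one line." }],
  },
  (ev) => {
    if (ev.type === "delta") streamed += ev.content ?? "";
    if (ev.type === "finish") console.log("finished:", ev.reason);
  },
);
console.log(message.content); // full text; message.reasoning / message.finish alongside
```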

## Limits

These are not bugs to fix; they are direct consequences of `GOOS=js`:

- **No tools, no MCP, no hooks, no sub-agent handoffs.** Anything that needs
  `os/exec` or local file I/O is excluded from the build.
- **No sessions.** `pkg/session` and `pkg/memory/database/sqlite` pull in
  `modernc.org/libc` which does not have a js port. The browser caller is
  responsible for keeping the message history.
- **No fallbacks, no rule-based routing.** The rule-based router uses bleve,
  which depends on `mmap` / file-locking primitives that don't exist on
  js/wasm. A js-only `factory_js.go` swaps the full provider factory for a
  slim variant with only openai / anthropic / google.
- **No Docker Model Runner, no Bedrock, no Vertex AI.** Same reason —
  `dmr` shells out, `bedrock` and `vertexai` pull in cloud SDKs that don't
  cross-compile to wasm.
- **No Docker Desktop integration.** `pkg/desktop` has stubs for js that
  return empty paths and refuse to dial.
- **CORS.** As noted above, only OpenRouter (and Anthropic, via its opt-in
  header) allow direct browser calls; real deployment against the other
  providers needs a proxy.

## Where the cross-compilation work lives

The shims that make the existing tree compile under `GOOS=js GOARCH=wasm`
are intentionally tiny:

| File | Purpose |
| --- | --- |
| `pkg/cache/lock_js.go` | No-op file-lock stubs (single-threaded js). |
| `pkg/desktop/sockets_js.go` | Returns empty Docker Desktop paths. |
| `pkg/desktop/connection_js.go` | Refuses Unix-socket / named-pipe dials. |
| `pkg/desktop/connection_other.go` | Build tag updated to `!windows && !js`. |
| `pkg/model/provider/factory.go` | Build tag added: `!js`. |
| `pkg/model/provider/factory_js.go` | js-only provider dispatch (openai / anthropic / google). |

Everything else compiles unchanged because docker-agent already kept the
`os/exec`, sandbox, sound, audio, server, browser, and keyring code well
isolated in their own packages — the wasm entry just doesn't import them.

## Sanity check

```sh
# Native build still happy.
go build ./...

# Wasm build still happy.
GOOS=js GOARCH=wasm go build -o /tmp/cagent.wasm ./cmd/wasm

# Existing tests of the touched packages still pass.
go test ./pkg/cache/... ./pkg/desktop/... ./pkg/model/provider/...

# End-to-end runtime smoke test.
node cmd/wasm/smoke_test.js
```