Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
147 changes: 147 additions & 0 deletions examples/servers/mrtr-options/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
# MRTR handler-shape options (SEP-2322)

Python-SDK counterpart to [typescript-sdk#1701]. Seven ways to write the same
weather-lookup tool, so the diff between files is the argument.

Unlike the TS demos, the lowlevel plumbing here is **real** — each option is
an actual `mcp.server.Server` that round-trips `IncompleteResult` through the
wire protocol. The invariant test at the bottom asserts they all produce
identical client-observed behaviour.

[typescript-sdk#1701]: https://github.com/modelcontextprotocol/typescript-sdk/pull/1701

## Start here

If you just want to see what an MRTR lowlevel handler looks like without
the comparison framing, read these first:

- [`basic.py`](mrtr_options/basic.py) — the simple-tool equivalent. One
`IncompleteResult`, one retry, done. ~130 lines, half of which are
comments explaining the two moves every MRTR handler makes.
- [`basic_multiround.py`](mrtr_options/basic_multiround.py) — the
ADO-rules SEP example. Two rounds, with `request_state` carrying
accumulated context across the retry so any server instance can
handle any round.

Both are runnable end-to-end against the in-memory client:

```sh
uv run python -m mrtr_options.basic
uv run python -m mrtr_options.basic_multiround
```

## The quadrant

| Server infra | Pre-MRTR client | MRTR client |
| ------------------------------- | --------------------------------- | ----------- |
| Can hold SSE | E by default; A/C/D if you opt in | MRTR |
| MRTR-only (horizontally scaled) | E by necessity | MRTR |

Both rows *work* for old clients — version negotiation succeeds,
`tools/list` is complete, tools that don't elicit are unaffected. Only
elicitation inside a tool is unavailable. Bottom-left isn't "unresolvable";
it's "E is the only option." Top-left is "E, unless you choose to carry SSE
infra." The rows collapse for E, which is why it's the SDK default.

## Options

| | Author writes | SDK does | Hidden re-entry | Server state | Old client gets |
| ------------------------------ | ------------------------------- | -------------------------------- | --------------- | -------------------- | --------------------------------- |
| [E](mrtr_options/option_e_degrade.py) | MRTR-native only | Nothing | No | None | Result w/ default, or error |
| [A](mrtr_options/option_a_sse_shim.py) | MRTR-native only | Retry-loop over SSE | Yes, safe | SSE connection | Full elicitation |
| [B](mrtr_options/option_b_await_shim.py) | `await elicit()` | Exception → `IncompleteResult` | **Yes, unsafe** | None | Full elicitation |
| [C](mrtr_options/option_c_version_branch.py) | One handler, `if version` branch | Version accessor | No | SSE (old-client arm) | Full elicitation |
| [D](mrtr_options/option_d_dual_handler.py) | Two handlers | Picks by version | No | SSE (old-client arm) | Full elicitation |
| [F](mrtr_options/option_f_ctx_once.py) | MRTR-native + `ctx.once` wraps | `once()` guard in request_state | No | None | (same as E) |
| [G](mrtr_options/option_g_tool_builder.py) | Step functions + `.build()` | Step-tracking in request_state | No | None | (same as E) |
| [H](mrtr_options/option_h_linear.py) | `await ctx.elicit()` (linear) | Holds coroutine frame in memory | No | Coroutine frame | (same as E) |

"Hidden re-entry" = the handler function is invoked more than once for a
single logical tool call, and the author can't tell from the source text.

**A is safe** because MRTR-native code has the re-entry guard (`if not
prefs: return IncompleteResult(...)`) visible in source even though the
*loop* is hidden.

**B is unsafe** because `await elicit()` looks like a suspension point but
is actually a re-entry point on MRTR sessions — see the `audit_log`
landmine in that file.

## Footgun prevention (F, G)

A–E are about the dual-path axis (old client vs new). F and G address a
different axis: even in a pure-MRTR world, the naive handler shape has a
footgun. Code above the `if not prefs` guard runs on every retry. If that
code is a DB write or HTTP POST, it executes N times for N-round
elicitation. Nothing *enforces* putting side-effects below the guard —
safety depends on the developer knowing the convention. The analogy from
SDK-WG review: the naive MRTR handler is de-facto GOTO.

**F (`MrtrCtx.once`)** keeps the monolithic handler but wraps side-effects
in an idempotency guard. `ctx.once("audit", lambda: audit_log(...))` checks
`request_state` — if the key is marked executed, skip. Opt-in: an unwrapped
mutation still fires twice. The footgun is made *visually distinct*, which
is reviewable.

**G (`ToolBuilder`)** decomposes the handler into named step functions.
`incomplete_step` may return `IncompleteResult` or data; `end_step` receives
everything and runs exactly once. There is no "above the guard" zone because
there is no guard — the SDK's step-tracking is the guard. Side-effects go in
`end_step`, structurally unreachable until all elicitations complete.

Both depend on `request_state` integrity. The demos use plain base64-JSON;
a real SDK MUST HMAC-sign the blob, or the client can forge step-done
markers and skip the guards. Per-session key derived from `initialize` keeps
it stateless. Without signing, the safety story is advisory.

## Trade-offs

**E is the SDK default.** A horizontally-scaled server gets E for free —
it's the only thing that works on that infra. A server that can hold SSE
also gets E by default, and opts into A/C/D only if serving old-client
elicitation is worth the extra infra dependency.

**A vs E** is the core tension. Same author-facing code (MRTR-native), the
only difference is whether old clients get elicitation. A requires shipping
`sse_retry_shim`; E requires nothing. A also carries a deployment-time
hazard E doesn't: the shim calls real SSE under the hood, so on MRTR-only
infra it fails at runtime when an old client connects — a constraint that
lives nowhere near the tool code.

**B** is zero-migration but breaks silently for anything non-idempotent
above the await. Not a ship target.

**C vs D** is factoring: one function with a branch vs two functions with a
dispatcher. Both put the dual-path burden on the tool author.

**F vs G** is the footgun-prevention trade. F is minimal — one line per
side-effect, composes with any handler shape. G is structural —
double-execution impossible for `end_step`, but costs two function defs
per tool. Likely SDK answer: ship F as a primitive on the context, ship G
as an opt-in builder, recommend G for multi-round tools and F for
single-question tools.

**H (linear continuation)** is the Option B footgun, *fixed*. Handler code
reads exactly like the SSE era — `await ctx.elicit()` is a genuine
suspension point, side-effects above it fire once — because the coroutine
frame is held in memory across rounds. The trade: server is stateful
*within* a single tool call (frame keyed by `request_state`), so
horizontally-scaled deployments need sticky routing on the token. Same
operational shape as A's SSE hold but without the long-lived connection.
Use for migrating existing SSE-era tools without rewriting, or when the
linear style is genuinely clearer than guard-first. Don't use if you need
true statelessness — E/F/G encode everything in `request_state` itself.

## The invariant test

`tests/server/experimental/test_mrtr_options.py` parametrises all seven
servers against the same `Client` + `elicitation_callback`, asserting
identical output. The footgun test measures `audit_count` to prove F and G
hold the side-effect to one.

## Not in scope

- Persistent/Tasks workflow — `ServerTaskContext` already does
`input_required`; MRTR integration is a separate PR
- `mrtrOnly` client flag — trivial to add, not demoed
- requestState HMAC signing — called out in code comments
7 changes: 7 additions & 0 deletions examples/servers/mrtr-options/mrtr_options/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
"""MRTR handler-shape comparison — seven options on the same weather tool.
See README.md for the trade-off matrix. Every option here is a real lowlevel
``mcp.server.Server`` that produces identical wire behaviour to each client
version — the server's internal choice doesn't leak. That's the argument
against per-feature ``-mrtr`` capability flags.
"""
58 changes: 58 additions & 0 deletions examples/servers/mrtr-options/mrtr_options/_shared.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
"""Domain logic shared across all options — *not* SDK machinery.

The weather tool: given a location, asks which units, returns a temperature
string. Same tool throughout so the diff between option files is the
argument.

``audit_log`` is the side-effect that makes the MRTR footgun concrete: under
naive re-entry it fires once per round. Options F and G tame it.
"""

from __future__ import annotations

from mcp import types
from mcp.server import Server, ServerRequestContext

UNITS_SCHEMA: types.ElicitRequestedSchema = {
"type": "object",
"properties": {"units": {"type": "string", "enum": ["metric", "imperial"], "title": "Units"}},
"required": ["units"],
}

UNITS_REQUEST = types.ElicitRequest(
params=types.ElicitRequestFormParams(message="Which units?", requested_schema=UNITS_SCHEMA)
)


def lookup_weather(location: str, units: str) -> str:
temp = "22°C" if units == "metric" else "72°F"
return f"Weather in {location}: {temp}, partly cloudy."


_audit_count = 0


def audit_log(location: str) -> None:
"""The footgun. Under naive re-entry this fires N times for N-round MRTR."""
global _audit_count
_audit_count += 1
print(f"[audit] lookup requested for {location} (count={_audit_count})")


def audit_count() -> int:
return _audit_count


def reset_audit() -> None:
global _audit_count
_audit_count = 0


async def no_tools(ctx: ServerRequestContext, params: types.PaginatedRequestParams | None) -> types.ListToolsResult:
"""Minimal tools/list handler so Client validation has something to call."""
return types.ListToolsResult(tools=[])


def build_server(name: str, on_call_tool: object, **kwargs: object) -> Server:
"""Consistent Server construction across option files."""
return Server(name, on_call_tool=on_call_tool, on_list_tools=no_tools, **kwargs) # type: ignore[arg-type]
128 changes: 128 additions & 0 deletions examples/servers/mrtr-options/mrtr_options/basic.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
"""The minimal MRTR lowlevel server — the simple-tool equivalent.

No version checks, no comparison framing. Just the two moves every MRTR
handler makes:

1. Check ``params.input_responses`` for the answer to a prior ask.
2. If it's not there, return ``IncompleteResult`` with the ask embedded.

The client SDK (``mcp.client.Client.call_tool``) drives the retry loop —
this handler is invoked once per round with whatever the client collected.

Run against the in-memory client:

uv run python -m mrtr_options.basic
"""

from __future__ import annotations

import anyio

from mcp import types
from mcp.client import Client
from mcp.client.context import ClientRequestContext
from mcp.server import Server, ServerRequestContext


async def on_list_tools(
ctx: ServerRequestContext, params: types.PaginatedRequestParams | None
) -> types.ListToolsResult:
return types.ListToolsResult(
tools=[
types.Tool(
name="get_weather",
description="Look up weather for a location. Asks which units you want.",
input_schema={
"type": "object",
"properties": {"location": {"type": "string"}},
"required": ["location"],
},
)
]
)


async def on_call_tool(
ctx: ServerRequestContext, params: types.CallToolRequestParams
) -> types.CallToolResult | types.IncompleteResult:
"""The MRTR tool handler. Called once per round."""
location = (params.arguments or {}).get("location", "?")

# ───────────────────────────────────────────────────────────────────────
# Step 1: check if the client has already answered our question.
#
# ``input_responses`` is a dict keyed by the same keys we used in
# ``input_requests`` on the prior round. Each value is the raw result
# the client produced (ElicitResult, CreateMessageResult, ListRootsResult
# — serialized to dict form over the wire).
#
# On the first round, ``input_responses`` is None. On subsequent rounds,
# it contains ONLY the answers to the most recent round's asks — not
# accumulated across rounds. If you need to accumulate, encode it in
# ``request_state`` (see option_f_ctx_once.py / option_g_tool_builder.py).
# ───────────────────────────────────────────────────────────────────────
responses = params.input_responses or {}
prefs = responses.get("unit_prefs")

if prefs is None or prefs.get("action") != "accept":
# ───────────────────────────────────────────────────────────────────
# Step 2: ask. Return IncompleteResult with the embedded request.
#
# The client SDK receives this, dispatches the embedded ElicitRequest
# to its elicitation_callback, and re-invokes this handler with the
# answer in input_responses["unit_prefs"].
#
# Keys are server-assigned and opaque to the client. Pick whatever
# makes the code readable — they just need to be consistent between
# the ask and the check above.
# ───────────────────────────────────────────────────────────────────
return types.IncompleteResult(
input_requests={
"unit_prefs": types.ElicitRequest(
params=types.ElicitRequestFormParams(
message="Which units for the temperature?",
requested_schema={
"type": "object",
"properties": {"units": {"type": "string", "enum": ["metric", "imperial"]}},
"required": ["units"],
},
)
)
},
# request_state is optional. Use it for anything that must
# survive across rounds without server-side storage — e.g.
# partially-computed results, progress markers, or (in F/G)
# idempotency guards. The client echoes it verbatim.
request_state=None,
)

# ───────────────────────────────────────────────────────────────────────
# Step 3: we have the answer. Compute and return a normal result.
# ───────────────────────────────────────────────────────────────────────
units = prefs["content"]["units"]
temp = "22°C" if units == "metric" else "72°F"
return types.CallToolResult(content=[types.TextContent(text=f"Weather in {location}: {temp}, partly cloudy.")])


server = Server("mrtr-basic", on_list_tools=on_list_tools, on_call_tool=on_call_tool)


# ─── Demo driver ─────────────────────────────────────────────────────────────


async def elicitation_callback(context: ClientRequestContext, params: types.ElicitRequestParams) -> types.ElicitResult:
"""What the app developer writes. Same signature as SSE-era callbacks."""
assert isinstance(params, types.ElicitRequestFormParams)
print(f"[client] server asks: {params.message}")
# A real client presents params.requested_schema as a form. We hard-code.
return types.ElicitResult(action="accept", content={"units": "metric"})


async def main() -> None:
async with Client(server, elicitation_callback=elicitation_callback) as client:
result = await client.call_tool("get_weather", {"location": "Tokyo"})
print(f"[client] result: {result.content[0].text}") # type: ignore[union-attr]


if __name__ == "__main__":
anyio.run(main)
Loading
Loading