
fix(sse): prevent memory leak from zombie SSE connections#22552

Open
zhumengzhu wants to merge 2 commits into anomalyco:dev from zhumengzhu:fix/sse-memory-leak

Conversation

@zhumengzhu zhumengzhu commented Apr 15, 2026

Issue for this PR

Closes #22198, closes #22422.

Type of change

  • Bug fix

What does this PR do?

Root cause: adapter.bun.ts sets idleTimeout: 0, disabling Bun's idle timeout
entirely. When an SSE connection enters TCP CLOSE_WAIT (Electron shell reconnecting on
minimize/restore, or a reverse proxy swallowing the client FIN), the server has no
signal that the connection is dead. Three resources leak per zombie connection: the
heartbeat setInterval, the Bus.subscribeAll subscription, and the AsyncQueue.
The queue keeps receiving every Bus event with no consumer draining it — at 66+ zombie
connections this was ~14 MB/sec, peaking at 24.5 GB before OOM.

The finally { stop() } block does not help — for await is suspended on q.next()
which returns a Promise that never resolves when the queue has no consumer.
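The drain-less buffering half of this can be illustrated with a minimal stand-in queue (a hypothetical simplification for illustration, not the repo's actual util/queue.ts):

```typescript
// Minimal unbounded AsyncQueue sketch: push() hands a value straight to a
// waiting consumer if one exists, otherwise buffers it. With a zombie
// connection there is no consumer, so the buffer grows on every Bus event.
class AsyncQueue<T> {
  private values: T[] = [];
  private resolvers: ((value: T) => void)[] = [];

  push(value: T) {
    const resolve = this.resolvers.shift();
    if (resolve) resolve(value);
    else this.values.push(value); // no waiter: buffer grows without bound
  }

  next(): Promise<T> {
    if (this.values.length > 0) return Promise.resolve(this.values.shift()!);
    // No buffered value: return a promise that resolves on a future push —
    // or never, if pushes stop while the consumer is suspended here.
    return new Promise((resolve) => this.resolvers.push(resolve));
  }

  get size() {
    return this.values.length;
  }
}

// Zombie scenario: events keep arriving, nothing drains.
const q = new AsyncQueue<string>();
for (let i = 0; i < 10_000; i++) q.push(`bus event ${i}`);
console.log(q.size); // 10000 — every event buffered, none consumed
```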

Five changes:

  1. adapter.bun.ts: idleTimeout: 0 → 120 — the root fix. Bun's idle timeout
    fires after 120s of no successful writes, aborting Request.signal and triggering
    cleanup. The 10s heartbeat resets the timer on every beat for active connections, so
    live connections are never affected.

  2. c.req.raw.signal.addEventListener("abort", stop) in event.ts and global.ts
    — adds an independent cleanup path that fires before stream.onAbort on direct
    connections (~2ms earlier). Hono gates this listener behind isOldBunVersion() and
    skips it on Bun 1.2+; we add it unconditionally.

  3. writeSSE wrapped in try/catch in the same files — kept for forward
    compatibility if a future Hono version propagates write errors. Hono 4.x
    StreamingApi.write() has an empty catch block so this does not fire on the current
    version (noted in code comments).

  4. AsyncQueue capacity limit (default 1000, drop oldest on overflow) — defensive
    backstop so growth is bounded even if a cleanup signal is missed.

  5. GlobalBus.setMaxListeners(100) — each SSE connection adds one listener; the
    default of 10 causes MaxListenersExceededWarning under normal usage.
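Changes 1 and 2 can be sketched roughly as follows — this is an illustrative outline assuming Bun.serve and Hono's streamSSE shapes, with `app`, `stop`, and the route path standing in for the repo's actual code:

```typescript
import { Hono } from "hono";
import { streamSSE } from "hono/streaming";

const app = new Hono();

app.get("/event", (c) =>
  streamSSE(c, async (stream) => {
    const stop = () => {
      // clear the heartbeat interval, cancel the Bus subscription,
      // and terminate the AsyncQueue (elided here)
    };
    // Change 2: independent cleanup path, registered unconditionally.
    // Hono gates its own equivalent behind isOldBunVersion() and skips
    // it on Bun 1.2+, so this PR adds it explicitly.
    c.req.raw.signal.addEventListener("abort", stop);
    // ... 10s heartbeat + for-await event loop elided ...
  }),
);

Bun.serve({
  fetch: app.fetch,
  // Change 1: was `idleTimeout: 0` (disabled). With 120 (seconds), a
  // connection with no successful writes for 120s is aborted, which
  // fires Request.signal and triggers `stop`. The 10s heartbeat keeps
  // live connections well inside the window.
  idleTimeout: 120,
});
```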

Reproduction attempt and detailed analysis: https://gist.github.com/zhumengzhu/091076a04491762659c3cd210d02cee7

Inspired by #18395 and #15646, both closed without merging.

How did you verify your code works?

  • Full test suite: 1890 pass, 1 fail (the failure is pre-existing on upstream/dev
    before any changes — same result on unmodified upstream)
  • Local reproduction attempt with nginx proxy_ignore_client_abort on (20 SSE
    connections, kill all clients, monitor VmRSS for 120s) — detailed in the gist above
  • Signal timing instrumentation to verify req.raw.signal vs stream.onAbort firing
    order in both direct and proxied scenarios

Screenshots / recordings

N/A — server-side memory fix

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

@github-actions github-actions bot added needs:compliance This means the issue will auto-close after 2 hours. and removed needs:compliance This means the issue will auto-close after 2 hours. labels Apr 15, 2026
@github-actions
Contributor

Thanks for updating your PR! It now meets our contributing guidelines. 👍

Root cause: adapter.bun.ts explicitly set idleTimeout: 0, disabling
Bun's idle timeout entirely. When connections enter TCP CLOSE_WAIT
(reverse proxy or Electron shell swallowing client FIN), the server
has no signal that the connection is dead — three resource classes
leak permanently: AsyncQueue grows without bound (14 MB/s, peak
24.5 GB), setInterval heartbeats keep ticking, Bus subscriptions
are never cancelled.

Five changes:

- server/adapter.bun.ts: idleTimeout: 0 → 120. Heartbeat interval
  is 10s so active connections reset the timer on every beat and are
  never killed; dead connections are cleaned up within 120s
- server/instance/event.ts: register c.req.raw.signal abort listener
  as a second cleanup path independent of responseReadable.cancel().
  Hono skips this listener on Bun 1.2+ (isOldBunVersion gate); we
  add it unconditionally. writeSSE try/catch kept for forward
  compatibility — Hono 4.x write() has an empty catch so this block
  does not fire on the current version
- server/instance/global.ts: same two changes applied to streamEvents()
- util/queue.ts: add capacity limit (default 1000), drop oldest entry
  on overflow — defensive backstop so growth is bounded even if a
  cleanup signal is missed
- bus/global.ts: setMaxListeners(100) — fixes anomalyco#22422
  MaxListenersExceededWarning; each SSE connection adds one listener,
  default limit of 10 is too low for normal concurrent usage

Fixes anomalyco#22198, fixes anomalyco#22422

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@zhumengzhu zhumengzhu force-pushed the fix/sse-memory-leak branch from fe23f0f to 4a3ffb7 Compare April 15, 2026 13:28
@zhumengzhu
Author

On not adding an application-layer idle timeout check

A few comments in #22198 (AlexZander85) suggested tracking lastConsumed and calling
stop() if no heartbeat has been consumed in N seconds. We considered this but did not
implement it for two reasons:

  1. idleTimeout: 120 already provides this guarantee at the right layer. Bun's
    idle timeout fires when no data has been successfully written for 120s. With a 10s
    heartbeat, active connections reset the timer on every beat and are never affected.
    Dead connections stop resetting the timer once the downstream TCP buffer fills, and
    are cleaned up within 120s. This is the same signal, handled at the transport layer
    where it belongs.

  2. Application-layer detection has no reliable signal. writeSSE appears to
    succeed even on dead connections — Hono 4.x StreamingApi.write() has an empty
    catch block, so write failures are invisible to the caller. The only application-
    layer proxy for "was data consumed" is WritableStreamDefaultWriter.desiredSize,
    which only turns negative after the downstream TCP buffer is full — by which point
    idleTimeout would fire anyway.

@zhumengzhu
Author

Relationship with #20642

PR #20642 (by @jwcrystal) redesigns
AsyncQueue with a capacity parameter, a close() method, and a sentinel value —
a more complete queue interface that also caps memory growth.

The two PRs are complementary and do not overlap.

If #20642 is merged first, the AsyncQueue changes here would need rebasing against
its interface, but the rest of the changes in this PR are independent.

The new AsyncQueue tests cover: basic FIFO order, oldest-entry eviction on
overflow, queue length never exceeding the limit, direct-resolver bypass,
null-sentinel termination, and the zombie scenario (10,000 pushes, no consumer).
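The bounded-queue behavior those tests describe can be sketched as follows — an illustrative stand-in, not the repo's actual util/queue.ts interface:

```typescript
// Capacity-bounded AsyncQueue: drops the oldest buffered entry on overflow
// so an unconsumed (zombie) queue stays bounded at `capacity`.
class BoundedAsyncQueue<T> {
  private buffer: T[] = [];
  private resolvers: ((value: T) => void)[] = [];

  constructor(private capacity = 1000) {}

  push(value: T) {
    const resolve = this.resolvers.shift();
    if (resolve) return resolve(value); // direct-resolver bypass: a waiter gets it immediately
    if (this.buffer.length >= this.capacity) this.buffer.shift(); // drop oldest
    this.buffer.push(value);
  }

  next(): Promise<T> {
    if (this.buffer.length > 0) return Promise.resolve(this.buffer.shift()!);
    return new Promise((resolve) => this.resolvers.push(resolve));
  }

  get length() {
    return this.buffer.length;
  }
}

// Zombie scenario: 10000 pushes with no consumer stay capped at capacity.
const q = new BoundedAsyncQueue<number>(1000);
for (let i = 0; i < 10_000; i++) q.push(i);
console.log(q.length); // 1000 — the oldest 9000 entries were evicted
```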

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues:

  • Memory leak: SSE connections stuck in CLOSE_WAIT cause unbounded AsyncQueue growth (~14 MB/sec)
  • Memory Leak Warning

1 participant