fix(sse): prevent memory leak from zombie SSE connections #22552
zhumengzhu wants to merge 2 commits into anomalyco:dev
Conversation
Thanks for updating your PR! It now meets our contributing guidelines. 👍
Root cause: adapter.bun.ts explicitly set `idleTimeout: 0`, disabling Bun's idle timeout entirely. When connections enter TCP CLOSE_WAIT (reverse proxy or Electron shell swallowing the client FIN), the server has no signal that the connection is dead, and three resource classes leak permanently: the AsyncQueue grows without bound (14 MB/s, peak 24.5 GB), setInterval heartbeats keep ticking, and Bus subscriptions are never cancelled.

Five changes:

- server/adapter.bun.ts: `idleTimeout: 0` → `120`. The heartbeat interval is 10s, so active connections reset the timer on every beat and are never killed; dead connections are cleaned up within 120s.
- server/instance/event.ts: register a `c.req.raw.signal` abort listener as a second cleanup path independent of `responseReadable.cancel()`. Hono skips this listener on Bun 1.2+ (isOldBunVersion gate); we add it unconditionally. The writeSSE try/catch is kept for forward compatibility — Hono 4.x `write()` has an empty catch, so this block does not fire on the current version.
- server/instance/global.ts: the same two changes applied to `streamEvents()`.
- util/queue.ts: add a capacity limit (default 1000), dropping the oldest entry on overflow — a defensive backstop so growth is bounded even if a cleanup signal is missed.
- bus/global.ts: `setMaxListeners(100)` — fixes anomalyco#22422 MaxListenersExceededWarning; each SSE connection adds one listener, and the default limit of 10 is too low for normal concurrent usage.

Fixes anomalyco#22198, fixes anomalyco#22422

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
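The abort-listener cleanup path described in the commit message can be sketched with a plain `AbortController` — a simplified stand-in, not the real handler; the actual code lives in `server/instance/event.ts` and also tears down the Bus subscription and queue:

```typescript
// Simplified sketch of the second cleanup path (assumption: the real stop()
// also cancels the Bus subscription and terminates the AsyncQueue).
let cleanups = 0

function subscribe(signal: AbortSignal): () => void {
  // 10s heartbeat that would otherwise tick forever on a zombie connection.
  const heartbeat = setInterval(() => {
    /* writeSSE(": heartbeat") */
  }, 10_000)

  let stopped = false
  const stop = () => {
    if (stopped) return // idempotent: may fire from several cleanup paths
    stopped = true
    clearInterval(heartbeat)
    cleanups++
  }

  // Fires when Bun's idle timeout (or the client) aborts the request,
  // independently of responseReadable.cancel().
  signal.addEventListener("abort", stop)
  return stop
}

const controller = new AbortController()
subscribe(controller.signal)
controller.abort() // simulate the idle timeout killing the connection
console.log(cleanups) // 1
```

Because the `abort` event dispatches synchronously, the heartbeat interval is already cleared by the time `abort()` returns.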
zhumengzhu force-pushed from fe23f0f to 4a3ffb7
On not adding an application-layer idle timeout check

A few comments in #22198 (AlexZander85) suggested tracking
Relationship with #20642

PR #20642 (by @jwcrystal) redesigns
The two PRs are complementary and do not overlap:

If #20642 is merged first, the
Covers: basic FIFO order, oldest-entry eviction on overflow, queue length never exceeding the limit, direct-resolver bypass, null-sentinel termination, and the zombie scenario (10000 pushes, no consumer).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
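The null-sentinel termination those tests cover can be sketched like this (a simplified stand-in for the queue consumer, not the real `util/queue.ts`):

```typescript
// A consumer stops when it sees the null sentinel: pushing null is how
// cleanup terminates the handler's consume loop.
async function collect(items: (string | null)[]): Promise<string[]> {
  const seen: string[] = []
  for (const item of items) {
    if (item === null) break // sentinel: end of stream, stop consuming
    seen.push(item)
    await Promise.resolve() // stand-in for awaiting q.next()
  }
  return seen
}

const seen = await collect(["a", "b", null, "c"])
console.log(seen) // ["a", "b"] — nothing after the sentinel is delivered
```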
Issue for this PR
Closes #22198, closes #22422.
Type of change
What does this PR do?
Root cause: `adapter.bun.ts` sets `idleTimeout: 0`, disabling Bun's idle timeout entirely. When an SSE connection enters TCP CLOSE_WAIT (Electron shell reconnecting on minimize/restore, or a reverse proxy swallowing the client FIN), the server has no signal that the connection is dead. Three resources leak per zombie connection: the heartbeat `setInterval`, the `Bus.subscribeAll` subscription, and the `AsyncQueue`. The queue keeps receiving every Bus event with no consumer draining it — at 66+ zombie connections this was ~14 MB/sec, peaking at 24.5 GB before OOM.

The `finally { stop() }` block does not help — `for await` is suspended on `q.next()`, which returns a Promise that never resolves when the queue has no consumer.
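Why the `finally` block is skipped can be illustrated with a hanging async iterator (a hypothetical shape mirroring the handler loop described above, not the project's actual code):

```typescript
// If the awaited promise never settles, the loop body — and therefore the
// finally block — is never reached again.
let cleanedUp = false

async function handler(events: AsyncIterable<string>) {
  try {
    for await (const _event of events) {
      // writeSSE(_event) would go here
    }
  } finally {
    cleanedUp = true // never runs: next() below never resolves
  }
}

// Yields one event, then hangs forever — like q.next() on a zombie connection.
const hanging: AsyncIterable<string> = {
  [Symbol.asyncIterator]: () => {
    let first = true
    return {
      next: () =>
        first
          ? ((first = false), Promise.resolve({ value: "hello", done: false }))
          : new Promise<IteratorResult<string>>(() => {}), // pending forever
    }
  },
}

void handler(hanging)
```

The loop is parked on a promise that is never resolved and never rejected, so neither the loop exit nor the exception path can reach `finally`.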
Five changes:

- `adapter.bun.ts`: `idleTimeout: 0` → `120` — the root fix. Bun's idle timeout fires after 120s of no successful writes, aborting `Request.signal` and triggering cleanup. The 10s heartbeat resets the timer on every beat for active connections, so live connections are never affected.
- `c.req.raw.signal.addEventListener("abort", stop)` in `event.ts` and `global.ts` — adds an independent cleanup path that fires before `stream.onAbort` on direct connections (~2ms earlier). Hono gates this listener behind `isOldBunVersion()` and skips it on Bun 1.2+; we add it unconditionally.
- `writeSSE` wrapped in try/catch in the same files — kept for forward compatibility if a future Hono version propagates write errors. Hono 4.x `StreamingApi.write()` has an empty catch block, so this does not fire on the current version (noted in code comments).
- `AsyncQueue` capacity limit (default 1000, drop oldest on overflow) — a defensive backstop so growth is bounded even if a cleanup signal is missed.
- `GlobalBus.setMaxListeners(100)` — each SSE connection adds one listener; the default of 10 causes `MaxListenersExceededWarning` under normal usage.

Reproduction attempt and detailed analysis: https://gist.github.com/zhumengzhu/091076a04491762659c3cd210d02cee7
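A minimal sketch of the capacity backstop (the name and the default of 1000 come from this PR's description; the real `util/queue.ts` may differ in detail):

```typescript
class AsyncQueue<T> {
  private items: T[] = []
  private resolver?: (value: T) => void

  constructor(private capacity = 1000) {}

  push(item: T) {
    // Direct-resolver bypass: a waiting consumer gets the item immediately,
    // without touching the buffer.
    if (this.resolver) {
      const resolve = this.resolver
      this.resolver = undefined
      resolve(item)
      return
    }
    this.items.push(item)
    // Backstop: if a cleanup signal was missed and nothing drains the
    // queue, evict the oldest entry so memory stays bounded.
    if (this.items.length > this.capacity) this.items.shift()
  }

  next(): Promise<T> {
    if (this.items.length > 0) return Promise.resolve(this.items.shift()!)
    return new Promise<T>((resolve) => (this.resolver = resolve))
  }

  get length() {
    return this.items.length
  }
}

// Zombie scenario: 10000 pushes with no consumer stay capped at 1000,
// keeping the newest entries (9000..9999).
const q = new AsyncQueue<number>(1000)
for (let i = 0; i < 10_000; i++) q.push(i)
console.log(q.length) // 1000
```

Dropping the oldest entry (rather than rejecting the push) keeps the producer side non-blocking, which matters when the producer is a shared event bus.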
Inspired by #18395 and #15646, both closed without merging.
How did you verify your code works?
- Reproduced the leak on `upstream/dev` before any changes (same result on unmodified upstream).
- Zombie reproduction behind nginx with `proxy_ignore_client_abort on` (20 SSE connections, kill all clients, monitor VmRSS for 120s) — detailed in the gist above.
- Verified `req.raw.signal` vs `stream.onAbort` firing order in both direct and proxied scenarios.
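The VmRSS check from the soak test can be done with a small helper like this (Linux-only sketch; the PID argument is an assumption — pass the Bun server's PID and run it under `watch -n 5` for the 120s window):

```shell
#!/bin/sh
# Print a process's resident set size in kB (reads Linux /proc).
# Usage: ./vmrss.sh <pid>; defaults to this shell's own PID.
PID="${1:-$$}"
awk '/^VmRSS/ {print $2, $3}' "/proc/$PID/status"
```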
Screenshots / recordings
N/A — server-side memory fix
Checklist