Skip to content

Conversation

@dkorittki
Copy link
Contributor

@dkorittki dkorittki commented Nov 28, 2025

Summary by CodeRabbit

  • Bug Fixes

    • Improved error handling and propagation in GraphQL subscriptions over Server-Sent Events (SSE).
    • Enhanced HTTP response flushing to ensure error responses are reliably transmitted.
    • Optimized response formatting to handle edge cases correctly.
  • Tests

    • Added test coverage for SSE error handling when subgraph requests fail.

✏️ Tip: You can customize this high-level summary in your review settings.

Checklist

  • I have discussed my proposed changes in an issue and have received approval to proceed.
  • I have followed the coding standards of the project.
  • Tests or benchmarks have been added or updated.
  • Documentation has been updated on https://github.com/wundergraph/cosmo-docs.
  • I have read the Contributors Guide.

Context

When we write errors to a connection backed by HttpFlushWriter (used on subscriptions via SSE/Multipart) we do not flush the connection as oposed to a websocket connection, where this actually happens. I fixed it by simply checking if we are dealing with a flush writer and flush if so.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 28, 2025

Walkthrough

This pull request adds SSE error handling tests and enhances HTTP error flushing behavior. Changes include a new subtest verifying error propagation when a subgraph returns 403 Forbidden over SSE, an additional HttpFlushWriter flush operation in error paths, and trailing newline trimming in response normalization.

Changes

Cohort / File(s) Summary
Test Enhancements
router-tests/http_subscriptions_test.go
Added subtest verifying SSE error handling for 403 Forbidden responses; reorganized response body cleanup with earlier defer placement
HTTP Error Flushing
router/core/graphql_handler.go
Added HttpFlushWriter flush operation after WebSocket response flushing in WriteError method
Response Normalization
router/core/flushwriter.go
Added trailing newline trimming in Flush() method before response concatenation

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(router): flush errors on sse streams' directly and clearly summarizes the main change: ensuring errors are flushed on SSE (Server-Sent Events) streams.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between efb126e and 9a5019e.

📒 Files selected for processing (2)
  • router-tests/http_subscriptions_test.go
  • router/core/graphql_handler.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • router-tests/http_subscriptions_test.go
  • router/core/graphql_handler.go

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Nov 28, 2025

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-41cc9f12a415895faaaa2ebc3e591e6129a44736

@codecov
Copy link

codecov bot commented Nov 28, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.73%. Comparing base (8721174) to head (9a5019e).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2377      +/-   ##
==========================================
+ Coverage   60.84%   61.73%   +0.88%     
==========================================
  Files         229      229              
  Lines       23839    23844       +5     
==========================================
+ Hits        14504    14719     +215     
+ Misses       8089     7869     -220     
- Partials     1246     1256      +10     
Files with missing lines Coverage Δ
router/core/flushwriter.go 81.81% <100.00%> (-1.08%) ⬇️
router/core/graphql_handler.go 68.65% <100.00%> (-1.27%) ⬇️

... and 19 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@dkorittki
Copy link
Contributor Author

Does Codecov recognize that tests have been added to the router-tests package? Seems like not.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router-tests/http_subscriptions_test.go (1)

202-268: Consider verifying stream completion after the error.

The successful SSE test explicitly checks for an "event: complete" message (line 186), but this error test ends after asserting the two blank lines without verifying whether a completion event is sent or whether the stream closes cleanly with no further messages. Consider adding an assertion to confirm the expected stream termination behavior.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2c8275f and 4510955.

📒 Files selected for processing (1)
  • router-tests/http_subscriptions_test.go (2 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:17-55
Timestamp: 2025-08-28T09:17:49.477Z
Learning: The Cosmo router uses a custom, intentionally rigid multipart implementation for GraphQL subscriptions. The multipart parsing in test files should remain strict and not be made more tolerant, as this rigidity is by design.
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.
📚 Learning: 2025-08-28T09:18:10.121Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.

Applied to files:

  • router-tests/http_subscriptions_test.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/http_subscriptions_test.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router-tests/http_subscriptions_test.go
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router-tests/http_subscriptions_test.go
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.

Applied to files:

  • router-tests/http_subscriptions_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build-router
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image
  • GitHub Check: image_scan
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: build_test
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (3)
router-tests/http_subscriptions_test.go (3)

139-139: Good resource management practice.

Moving the defer immediately after acquiring the response ensures proper cleanup even if subsequent assertions fail.


202-268: Test effectively validates error propagation over SSE streams.

The test structure mirrors the successful SSE test appropriately and correctly simulates a subgraph error scenario. The middleware configuration and SSE parsing logic are sound.


260-266: SSE error responses end with two blank lines instead of one, creating an asymmetry with successful messages.

The test at lines 260-266 expects two consecutive blank lines after the error message. In contrast, successful SSE messages (lines 171, 181, 197), heartbeat events, and complete events all terminate with a single blank line. The error path writes event: next\ndata: followed by requestErrors.WriteResponse(w), then Flush(), while successful responses go through HttpFlushWriter with explicit \n\n separation. Verify whether this two-blank-line termination for errors is intentional per the GraphQL-core SSE specification or if the error response path should align with the single-blank-line pattern.

Copy link
Contributor

@alepane21 alepane21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@dkorittki
Copy link
Contributor Author

Note: Did some local tests to see if it works, but only with one simultaneous client. Gonna be sure and test with the Apollo client with some simultaneous connections.

@dkorittki
Copy link
Contributor Author

dkorittki commented Dec 8, 2025

I did some tests with a custom script using apollo client with both graphql-ws and graphql-sse as connection clients.

The error seems to be propagated correctly to clients on sse, now matching what ws returns. Here are log outputs:

ws-client

Starting subscription...
Received: {
  data: undefined,
  error: CombinedGraphQLErrors: My Test Error
      at Object.error (file:///[...]/apollo-client-demo/node_modules/@apollo/client/link/subscriptions/index.js:88:43)
      at Object.e12e92bd-23ca-497c-a384-e19a5651940e (file:///[...]/apollo-client-demo/node_modules/graphql-ws/dist/client.js:338:22)
      at emit (file:///[...]/apollo-client-demo/node_modules/graphql-ws/dist/client.js:75:58)
      at Object.emit (file:///[...]/apollo-client-demo/node_modules/graphql-ws/dist/client.js:100:11)
      at socket2.onmessage (file:///[...]/apollo-client-demo/node_modules/graphql-ws/dist/client.js:203:21)
      at [nodejs.internal.kHybridDispatch] (node:internal/event_target:827:20)
      at WebSocket.dispatchEvent (node:internal/event_target:762:26)
      at fireEvent (node:internal/deps/undici/undici:11663:14)
      at websocketMessageReceived (node:internal/deps/undici/undici:11685:7)
      at ByteParser.run (node:internal/deps/undici/undici:12307:19) {
    errors: [ [Object] ],
    data: undefined,
    extensions: undefined
  }
}

sse-client

Starting subscription...
Received: {
  data: undefined,
  error: CombinedGraphQLErrors: My Test Error
      at file:///[...]/apollo-client-demo/node_modules/@apollo/client/core/QueryManager.js:475:36
      at /[...]/apollo-client-demo/node_modules/rxjs/dist/cjs/internal/operators/map.js:10:37
      at OperatorSubscriber._this._next (/[...]/apollo-client-demo/node_modules/rxjs/dist/cjs/internal/operators/OperatorSubscriber.js:33:21)
      at Subscriber.next (/[...]/apollo-client-demo/node_modules/rxjs/dist/cjs/internal/Subscriber.js:51:18)
      at Subscriber._next (/[...]/apollo-client-demo/node_modules/rxjs/dist/cjs/internal/Subscriber.js:80:26)
      at Subscriber.next (/[...]/apollo-client-demo/node_modules/rxjs/dist/cjs/internal/Subscriber.js:51:18)
      at /[...]/apollo-client-demo/node_modules/rxjs/dist/cjs/internal/Subject.js:69:34
      at Object.errorContext (/[...]/apollo-client-demo/node_modules/rxjs/dist/cjs/internal/util/errorContext.js:22:9)
      at Subject.next (/[...]/apollo-client-demo/node_modules/rxjs/dist/cjs/internal/Subject.js:59:24)
      at Object.next (/[...]/apollo-client-demo/node_modules/rxjs/dist/cjs/internal/operators/share.js:69:58) {
    errors: [ [Object] ],
    data: null,
    extensions: undefined
  }
}

The only difference is that sse-client keeps reconnecting after the error. This seems to stem from the sse-client itself and has been the case before this change. It was just silent, no log output in that case, because the router never returned data on the connection prior to closing it.

I suspect the sse-client retries because we never sent a complete event after the error. According to the SSE specification, a complete event needs to be sent whenever we want to close the connection, which in this case is true. I'll see how we can add this to the code and if it really makes sense.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router/core/websocket.go (1)

1050-1067: Consider also cleaning up subscriptions state when subscription setup fails

In the SubscriptionResponsePlan error path you now emit the error, flush, and send a complete, which is good for client semantics. However, this path doesn’t remove the entry from h.subscriptions, so operations that fail during AsyncResolveGraphQLSubscription will leave behind a mapping from msg.ID to subscriptionID.

By contrast, WebSocketConnectionHandler.Complete deletes the subscription ID before sending complete. To avoid accumulating stale entries on long‑lived connections, consider also deleting the subscription here (e.g., h.subscriptions.Delete(registration.msg.ID)), or refactoring through a helper that both sends error/complete and cleans up the state while preserving the desired flush/complete ordering.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d2dd99 and 0b323e0.

📒 Files selected for processing (1)
  • router/core/websocket.go (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:17-55
Timestamp: 2025-08-28T09:17:49.477Z
Learning: The Cosmo router uses a custom, intentionally rigid multipart implementation for GraphQL subscriptions. The multipart parsing in test files should remain strict and not be made more tolerant, as this rigidity is by design.
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.

Applied to files:

  • router/core/websocket.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build-router
  • GitHub Check: build_test
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: integration_test (./events)
  • GitHub Check: build_push_image
  • GitHub Check: Analyze (go)
  • GitHub Check: build_test
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router/core/graphql_handler.go (1)

435-441: Consider logging flush errors instead of discarding them.

Both the websocket and HTTP flush error results are currently discarded. While this is consistent with the existing pattern, logging these errors could help diagnose issues where clients don't receive error responses due to flush failures (e.g., broken connections, write timeouts).

Apply this diff to log flush errors for both paths:

 	if wsRw, ok := w.(*websocketResponseWriter); ok {
-		_ = wsRw.Flush()
+		if err := wsRw.Flush(); err != nil {
+			requestLogger.Debug("Failed to flush websocket error response", zap.Error(err))
+		}
 	}

 	if httpRw, ok := w.(*HttpFlushWriter); ok {
-		_ = httpRw.Flush()
+		if err := httpRw.Flush(); err != nil {
+			requestLogger.Debug("Failed to flush HTTP error response", zap.Error(err))
+		}
 	}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b323e0 and 1973c6c.

📒 Files selected for processing (2)
  • router/core/graphql_handler.go (1 hunks)
  • router/core/websocket.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • router/core/websocket.go
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:17-55
Timestamp: 2025-08-28T09:17:49.477Z
Learning: The Cosmo router uses a custom, intentionally rigid multipart implementation for GraphQL subscriptions. The multipart parsing in test files should remain strict and not be made more tolerant, as this rigidity is by design.
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.

Applied to files:

  • router/core/graphql_handler.go
🧬 Code graph analysis (1)
router/core/graphql_handler.go (1)
router/core/flushwriter.go (1)
  • HttpFlushWriter (34-47)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: build_test
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: build_test
  • GitHub Check: build_push_image
  • GitHub Check: image_scan
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🔇 Additional comments (1)
router/core/graphql_handler.go (1)

438-441: LGTM! Flush implementation correctly mirrors websocket pattern.

The added flush for HttpFlushWriter ensures that error responses are immediately sent to SSE and multipart subscription clients, fixing the issue where errors were buffered and not delivered. The implementation is consistent with the existing websocket flush pattern.

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added Stale and removed Stale labels Dec 27, 2025
@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jan 11, 2026
After an SSE error is flushed on the connection the router will
sent an complete event to tell the client the error is terminal.
Also a bug got fixed alongside this where too many newlines are inserted
between the error and complete event. This happened because
the payload already contained a newline. This is now correctly
trimmed before the seperators are added to the response payload
during flushing.
@dkorittki dkorittki removed the Stale label Jan 12, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
router/core/flushwriter.go (1)

146-148: Verify behavior with multiple trailing newlines.

The bytes.HasSuffix check detects a single trailing '\n', but bytes.TrimRight(resp, "\n") removes all trailing newlines. If resp ends with "\n\n", this will strip both.

If the intent is to remove only a single trailing newline, consider using bytes.TrimSuffix instead:

Suggested alternative if only one newline should be trimmed
 	if bytes.HasSuffix(resp, []byte{'\n'}) {
-		resp = bytes.TrimRight(resp, "\n")
+		resp = bytes.TrimSuffix(resp, []byte{'\n'})
 	}

If stripping all trailing newlines is intentional for normalization, the current code is correct—just note the behavior difference.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1973c6c and efb126e.

📒 Files selected for processing (4)
  • router-tests/http_subscriptions_test.go
  • router/core/flushwriter.go
  • router/core/graphql_handler.go
  • router/core/websocket.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • router/core/graphql_handler.go
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:17-55
Timestamp: 2025-08-28T09:17:49.477Z
Learning: The Cosmo router uses a custom, intentionally rigid multipart implementation for GraphQL subscriptions. The multipart parsing in test files should remain strict and not be made more tolerant, as this rigidity is by design.
📚 Learning: 2025-08-28T09:18:10.121Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2141
File: router-tests/http_subscriptions_test.go:100-108
Timestamp: 2025-08-28T09:18:10.121Z
Learning: In router-tests/http_subscriptions_test.go heartbeat tests, the message ordering should remain strict with data messages followed by heartbeat messages, as the timing is deterministic and known by design in the Cosmo router implementation.

Applied to files:

  • router-tests/http_subscriptions_test.go
📚 Learning: 2025-10-01T20:39:16.113Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2252
File: router-tests/telemetry/telemetry_test.go:9684-9693
Timestamp: 2025-10-01T20:39:16.113Z
Learning: Repo preference: In router-tests/telemetry/telemetry_test.go, keep strict > 0 assertions for request.operation.*Time (parsingTime, normalizationTime, validationTime, planningTime) in telemetry-related tests; do not relax to >= 0 unless CI flakiness is observed.

Applied to files:

  • router-tests/http_subscriptions_test.go
📚 Learning: 2025-08-20T10:08:17.857Z
Learnt from: endigma
Repo: wundergraph/cosmo PR: 2155
File: router/core/router.go:1857-1866
Timestamp: 2025-08-20T10:08:17.857Z
Learning: router/pkg/config/config.schema.json forbids null values for traffic_shaping.subgraphs: additionalProperties references $defs.traffic_shaping_subgraph_request_rule with type "object". Therefore, in core.NewSubgraphTransportOptions, dereferencing each subgraph rule pointer is safe under schema-validated configs, and a nil-check is unnecessary.

Applied to files:

  • router-tests/http_subscriptions_test.go
📚 Learning: 2025-09-17T20:55:39.456Z
Learnt from: SkArchon
Repo: wundergraph/cosmo PR: 2172
File: router/core/graph_server.go:0-0
Timestamp: 2025-09-17T20:55:39.456Z
Learning: The Initialize method in router/internal/retrytransport/manager.go has been updated to properly handle feature-flag-only subgraphs by collecting subgraphs from both routerConfig.GetSubgraphs() and routerConfig.FeatureFlagConfigs.ConfigByFeatureFlagName, ensuring all subgraphs receive retry configuration.

Applied to files:

  • router-tests/http_subscriptions_test.go
📚 Learning: 2025-11-19T15:13:57.821Z
Learnt from: dkorittki
Repo: wundergraph/cosmo PR: 2273
File: router/core/graphql_handler.go:0-0
Timestamp: 2025-11-19T15:13:57.821Z
Learning: In the Cosmo router (wundergraph/cosmo), error handling follows a two-phase pattern: (1) Prehandler phase handles request parsing, validation, and setup errors using `httpGraphqlError` and `writeOperationError` (in files like graphql_prehandler.go, operation_processor.go, parse_multipart.go, batch.go); (2) Execution phase handles resolver execution errors using `WriteError` in GraphQLHandler.ServeHTTP. Because all `httpGraphqlError` instances are caught in the prehandler before ServeHTTP is invoked, any error type checks for `httpGraphqlError` in the execution-phase WriteError method are unreachable code.

Applied to files:

  • router-tests/http_subscriptions_test.go
  • router/core/websocket.go
📚 Learning: 2026-01-06T12:37:21.521Z
Learnt from: asoorm
Repo: wundergraph/cosmo PR: 2379
File: router/pkg/connectrpc/operation_registry_test.go:381-399
Timestamp: 2026-01-06T12:37:21.521Z
Learning: In Go code (Go 1.25+), prefer using sync.WaitGroup.Go(func()) to run a function in a new goroutine, letting the WaitGroup manage Add/Done automatically. Avoid manual wg.Add(1) followed by go func() { defer wg.Done(); ... }() patterns. Apply this guidance across all Go files in the wundergraph/cosmo repository where concurrency is used.

Applied to files:

  • router-tests/http_subscriptions_test.go
  • router/core/websocket.go
  • router/core/flushwriter.go
🧬 Code graph analysis (1)
router-tests/http_subscriptions_test.go (2)
router-tests/testenv/testenv.go (2)
  • SubgraphsConfig (374-387)
  • Environment (1735-1771)
router-tests/testenv/utils.go (1)
  • AwaitChannelWithT (10-19)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: image_scan (nonroot)
  • GitHub Check: build_test
  • GitHub Check: build_test
  • GitHub Check: integration_test (./events)
  • GitHub Check: integration_test (./. ./fuzzquery ./lifecycle ./modules)
  • GitHub Check: integration_test (./telemetry)
  • GitHub Check: build_push_image
  • GitHub Check: image_scan
  • GitHub Check: build_push_image (nonroot)
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (3)
router/core/websocket.go (1)

1061-1070: LGTM! Correctly flushes and completes subscription on error.

This change ensures that when a subscription resolution error occurs, the error is flushed to the client and the subscription is properly marked as complete. This aligns the error path with the success path (lines 1059-1060) and fixes the issue where SSE/multipart clients were not receiving error payloads.

router-tests/http_subscriptions_test.go (2)

139-139: LGTM! Better resource cleanup placement.

Moving defer resp.Body.Close() immediately after client.Do ensures the body is closed even if subsequent operations (like require.Equal) fail.


202-276: LGTM! Comprehensive test for SSE error propagation.

This test effectively validates the PR's core fix by:

  1. Configuring the Employees subgraph to return a 403 Forbidden response
  2. Sending an SSE subscription request
  3. Asserting the complete SSE event sequence including the error payload and completion event

The test verifies that errors are now properly flushed to SSE clients, matching the behavior previously only available for WebSocket connections.

@dkorittki dkorittki marked this pull request as draft January 13, 2026 08:39
@dkorittki
Copy link
Contributor Author

dkorittki commented Jan 13, 2026

Added commits to make the router send a complete event after sending an error event. I also updated the pull request description with details about that. Please have a look at that for more context.

EDIT: Removed it again. Sending complete events is not the task of the WriteError method. That method is responsible for sending errors on the stream, not closing/completing them. I believe the router should sent them at various places where we also write an error to the stream, but we are better off fixing this in a separate pull request. It might involve a bit more work and it's an unrelated problem, really.

When writing errors no complete events should be
sent. This is not the task of the WriteError
method. The callee of WriteError might want to
keep the connection open. It's the responsibility
of the callee wether to send complete events or
not. This change makes the writeError behave the same
for sse as it does for websockets, where no
connections are marked as terminal either.
@dkorittki dkorittki marked this pull request as ready for review January 14, 2026 14:40
Comment on lines 433 to +439
if wsRw, ok := w.(*websocketResponseWriter); ok {
_ = wsRw.Flush()
}

if httpRw, ok := w.(*HttpFlushWriter); ok {
_ = httpRw.Flush()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be checked against the subscription writer interface or like an inline "Flusher" interface to avoid the duplication?

Comment on lines +146 to +148
if bytes.HasSuffix(resp, []byte{'\n'}) {
resp = bytes.TrimRight(resp, "\n")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment justifying this behaviour?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants