Skip to content

fix(sentinel): preserve seed nodes in sentinelRootNodes after topology update (#3237)#3306

Open
aartisonigra wants to merge 8 commits into
redis:masterfrom
aartisonigra:fix/sentinel-seed-nodes-3237
Open

fix(sentinel): preserve seed nodes in sentinelRootNodes after topology update (#3237)#3306
aartisonigra wants to merge 8 commits into
redis:masterfrom
aartisonigra:fix/sentinel-seed-nodes-3237

Conversation

@aartisonigra
Copy link
Copy Markdown
Contributor

@aartisonigra aartisonigra commented Jun 2, 2026

Problem

When sentinel reported its peer list (IP-based nodes), the client
replaced sentinelRootNodes entirely with those IPs, dropping the
original hostname-based seed nodes.

After a full outage where all sentinels restarted with new IPs, the
client had no working address to reconnect to — it kept retrying
stale IPs forever.

Fixes #3237

Root Cause

In transform():
this.#sentinelRootNodes = analyzed.sentinelList;
This dropped the original hostname seed nodes completely.

Fix

Added #mergeSentinelNodes() that:

  1. Always keeps seed nodes (hostnames) first
  2. Appends newly discovered nodes (IPs) without duplicates

So DNS-based hostnames are always available for reconnection
even after a full outage with new IPs.

Files Changed

  • packages/client/lib/sentinel/index.ts — core fix
  • packages/client/lib/sentinel/index.spec.ts — 2 new tests

Tests Added

  • seed nodes are preserved in sentinelRootNodes after sentinel list update
  • mergeSentinelNodes: seed hostnames always come first, IPs appended without duplicates

Note

Medium Risk
Changes Sentinel failover/reconnect address bookkeeping; wrong merging could affect discovery after outages, but scope is narrow and covered by new regression tests.

Overview
Fixes Sentinel clients losing reconnect targets after topology updates when discovery returns IP-only peers (#3237).

transform() no longer assigns sentinelRootNodes from analyzed.sentinelList alone. It uses new #mergeSentinelNodes(): configured seed nodes stay first (deduped by host:port), then newly discovered nodes are appended. SENTINE_LIST_CHANGE reports the merged list size. Initial sentinelRootNodes is seeded directly from options (same as seeds).

Tests drive internal transform() with IP-only discovery stubs and assert hostname seeds remain, merged size is 5, and discovered hostnames lead the list when seeds were IP literals.

Reviewed by Cursor Bugbot for commit e971220. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread packages/client/lib/sentinel/index.ts Outdated
Comment thread packages/client/lib/sentinel/index.spec.ts
Comment thread packages/client/lib/sentinel/index.spec.ts
Comment thread packages/client/lib/sentinel/index.ts Outdated
@aartisonigra aartisonigra force-pushed the fix/sentinel-seed-nodes-3237 branch from 789da3a to c46c390 Compare June 2, 2026 11:57
@aartisonigra
Copy link
Copy Markdown
Contributor Author

Hi, I would like to work on this issue. Could you please assign it to me?

@nkaradzhov
Copy link
Copy Markdown
Collaborator

@aartisonigra thanks, i will try to have a look today!

@nkaradzhov
Copy link
Copy Markdown
Collaborator

IP-literal seeds get stuck:
The merge unconditionally prepends #sentinelSeedNodes. Right for hostnames — DNS re-resolution is the whole point. But if someone configures IP seeds and the cluster gets reprovisioned with new IPs, the old seed IPs stay in #sentinelRootNodes forever. They never recover, and observe() hits them first on every reconnect — dead-IP timeouts before reaching the live ones.

Could we filter at construction so only hostnames count as seeds?

this.#sentinelSeedNodes = Array.from(options.sentinelRootNodes)
  .filter(n => net.isIP(n.host) === 0);

Plus a test for IP-only seeds with disjoint discovery, to lock the behavior in.

Tests don't actually hit the new code

The first test reflects into _self by symbol-name fishing and has a fallback that asserts seedNodes.length === 2 — that passes no matter what. The second one reimplements the merge logic inline and checks the duplicate, never calls #mergeSentinelNodes. Can we drive this through transform() end-to-end? The full-outage integration test from #3188 is a good starting point — just have the sentinels come back with different IPs and assert recovery via the hostname seed.

Unrelated type change

#createClient went from AnyRedisClientOptions to the explicit generic. Not in the description and unrelated to the seed fix — split it or mention why.

SENTINE_LIST_CHANGE.size meaning changed

It's now the merged length, not the discovered count. Behavior change for any consumer of the event — worth a note in the description at least.

@aartisonigra
Copy link
Copy Markdown
Contributor Author

Thanks for the feedback.

I've updated the regression test to validate the behavior through the real transform() flow and removed the previous merge-logic style assertions. The test now verifies the emitted SENTINE_LIST_CHANGE.size from the implementation itself.

Please let me know if you'd like any additional coverage or changes.

Comment thread packages/client/lib/sentinel/index.ts
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit fd9ed95. Configure here.

Comment thread packages/client/lib/sentinel/index.ts
@aartisonigra
Copy link
Copy Markdown
Contributor Author

I’ve addressed the review comments and updated the PR with the latest fixes.

Please let me know if any further changes are needed. Looking forward to your review.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sentinel client not able to recover after all sentinels went down

2 participants