Skip to content

Change connection pool idle expiration logic (#19004)#216

Merged
andyedison merged 1 commit intorelease-21.0-githubfrom
backport-19004
Mar 11, 2026
Merged

Change connection pool idle expiration logic (#19004)#216
andyedison merged 1 commit intorelease-21.0-githubfrom
backport-19004

Conversation

@andyedison
Copy link

Description

This PR is a backport of upstream PR vitessio#19004, which is a follow up to another backport (vitessio#18967) that was applied to this branch and caused memory leak/pressure causing an incident. This PR is expected to resolve that issue

Related Issue(s)

fixes https://github.com/github/vitess/issues/1601

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
Copilot AI review requested due to automatic review settings March 11, 2026 16:28
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Backport adjusting smartconnpool idle-connection expiration to avoid lingering/accumulating expired connections that can cause memory pressure, plus test/benchmark coverage for the regression.

Changes:

  • Rework idle-closer logic to pop connections from stacks, close/reopen a bounded subset, and return the rest.
  • Update the existing leak regression test to run concurrent Get calls and to use non-fatal assertions inside Eventually.
  • Add a new regression test for lingering connections after idle-timeout cycling, plus a benchmark for idle cleanup when no connections are idle-expired.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
go/pools/smartconnpool/pool.go Changes idle timeout cleanup strategy to pop/partition valid vs expired connections, close/reopen expired ones, and return connections back to the pool.
go/pools/smartconnpool/pool_test.go Adjusts an existing regression test, adds a new regression test around lingering conns, and adds a benchmark for idle cleanup.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

//
// We will expire any additional connections in the next iteration of the idle closer.
expiredConnections := make([]*Pooled[C], 0, max(activeConnections/2, 1))
validConnections := make([]*Pooled[C], 0, activeConnections)
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validConnections := make(..., 0, activeConnections) preallocates space for all active conns for every non-empty stack on each idle-closer tick (runs at IdleTimeout/10). For large pools this can allocate substantial memory repeatedly even when only a few conns are popped. Consider capping the preallocation (or not preallocating) so the idle closer doesn’t introduce avoidable memory pressure.

Suggested change
validConnections := make([]*Pooled[C], 0, activeConnections)
validConnections := make([]*Pooled[C], 0)

Copilot uses AI. Check for mistakes.

// Wait a bit for the idle timeout worker to refresh connections
assert.Eventually(t, func() bool {
return p.Metrics.IdleClosed() > 10
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion says "Expected at least 10 connections" but the predicate is p.Metrics.IdleClosed() > 10, which actually requires 11+. This can make the test flaky if exactly 10 idle closes occur within the window. Use >= 10 (or adjust the message/threshold to match the intended behavior).

Suggested change
return p.Metrics.IdleClosed() > 10
return p.Metrics.IdleClosed() >= 10

Copilot uses AI. Check for mistakes.
Comment on lines +1447 to +1450
b.ResetTimer()

for b.Loop() {
p.closeIdleResources(time.Now())
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the benchmark loop, calling time.Now() each iteration can dominate the measured cost and make it harder to attribute time to closeIdleResources itself. Consider computing now := time.Now() once (or using a fixed time) outside the loop and reusing it inside.

Suggested change
b.ResetTimer()
for b.Loop() {
p.closeIdleResources(time.Now())
// Use a fixed time value to avoid measuring time.Now() cost in the benchmark loop.
now := time.Now()
b.ResetTimer()
for b.Loop() {
p.closeIdleResources(now)

Copilot uses AI. Check for mistakes.
Comment on lines +757 to +759
expiredConnections := make([]*Pooled[C], 0, max(activeConnections/2, 1))
validConnections := make([]*Pooled[C], 0, activeConnections)

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pool.Active() returns an int64, but make([]*Pooled[C], 0, max(activeConnections/2, 1)) and make(..., 0, activeConnections) require an int capacity. As written, this won’t compile; cast after bounding to a safe int size (and consider using a smaller cap as needed).

Copilot uses AI. Check for mistakes.
@andyedison andyedison merged commit 865f789 into release-21.0-github Mar 11, 2026
93 of 104 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants