Change connection pool idle expiration logic (#19004) #216
andyedison merged 1 commit into release-21.0-github from
Conversation
Signed-off-by: Arthur Schreiber <arthur@planetscale.com>
Pull request overview
Backport adjusting smartconnpool idle-connection expiration to avoid lingering/accumulating expired connections that can cause memory pressure, plus test/benchmark coverage for the regression.
Changes:
- Rework idle-closer logic to pop connections from stacks, close/reopen a bounded subset, and return the rest.
- Update the existing leak regression test to run concurrent `Get` calls and to use non-fatal assertions inside `Eventually`.
- Add a new regression test for lingering connections after idle-timeout cycling, plus a benchmark for idle cleanup when no connections are idle-expired.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| go/pools/smartconnpool/pool.go | Changes idle timeout cleanup strategy to pop/partition valid vs expired connections, close/reopen expired ones, and return connections back to the pool. |
| go/pools/smartconnpool/pool_test.go | Adjusts an existing regression test, adds a new regression test around lingering conns, and adds a benchmark for idle cleanup. |
```go
// We will expire any additional connections in the next iteration of the idle closer.
expiredConnections := make([]*Pooled[C], 0, max(activeConnections/2, 1))
validConnections := make([]*Pooled[C], 0, activeConnections)
```
validConnections := make(..., 0, activeConnections) preallocates space for all active conns for every non-empty stack on each idle-closer tick (runs at IdleTimeout/10). For large pools this can allocate substantial memory repeatedly even when only a few conns are popped. Consider capping the preallocation (or not preallocating) so the idle closer doesn’t introduce avoidable memory pressure.
Suggested change:

```diff
-validConnections := make([]*Pooled[C], 0, activeConnections)
+validConnections := make([]*Pooled[C], 0)
```
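The pop/partition pattern the reviewer is discussing can be sketched without any preallocation at all, letting `append` grow the slices only as far as the batch actually requires. This is a minimal illustration with a stand-in `conn` type; it is not the PR's real `Pooled[C]` code.

```go
package main

import "fmt"

// conn is a hypothetical stand-in for the pool's connection type.
type conn struct{ expired bool }

// partitionExpired splits a popped batch into valid and expired
// connections without preallocating capacity for the whole pool;
// append grows each slice only as needed.
func partitionExpired(conns []conn) (valid, expired []conn) {
	for _, c := range conns {
		if c.expired {
			expired = append(expired, c)
		} else {
			valid = append(valid, c)
		}
	}
	return valid, expired
}

func main() {
	conns := []conn{{false}, {true}, {false}, {true}, {true}}
	valid, expired := partitionExpired(conns)
	fmt.Println(len(valid), len(expired)) // 2 3
}
```

With this shape the valid connections can be pushed straight back onto the stack while only the expired subset is closed and reopened.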
```go
// Wait a bit for the idle timeout worker to refresh connections
assert.Eventually(t, func() bool {
	return p.Metrics.IdleClosed() > 10
```
The assertion says "Expected at least 10 connections" but the predicate is p.Metrics.IdleClosed() > 10, which actually requires 11+. This can make the test flaky if exactly 10 idle closes occur within the window. Use >= 10 (or adjust the message/threshold to match the intended behavior).
Suggested change:

```diff
-return p.Metrics.IdleClosed() > 10
+return p.Metrics.IdleClosed() >= 10
```
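The off-by-one is easy to see in isolation. A minimal sketch, using a hypothetical `atLeast` helper rather than the test's real predicate:

```go
package main

import "fmt"

// atLeast expresses "at least threshold" with >=, matching the test's
// assertion message; the buggy form used > and silently required
// threshold+1 idle closes.
func atLeast(idleClosed, threshold int64) bool {
	return idleClosed >= threshold
}

func main() {
	fmt.Println(atLeast(10, 10)) // true: exactly 10 counts as "at least 10"
	fmt.Println(int64(10) > 10)  // false: the original predicate rejects it
}
```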
```go
b.ResetTimer()

for b.Loop() {
	p.closeIdleResources(time.Now())
```
In the benchmark loop, calling time.Now() each iteration can dominate the measured cost and make it harder to attribute time to closeIdleResources itself. Consider computing now := time.Now() once (or using a fixed time) outside the loop and reusing it inside.
Suggested change:

```diff
-b.ResetTimer()
-for b.Loop() {
-	p.closeIdleResources(time.Now())
+// Use a fixed time value to avoid measuring time.Now() cost in the benchmark loop.
+now := time.Now()
+b.ResetTimer()
+for b.Loop() {
+	p.closeIdleResources(now)
```
```go
expiredConnections := make([]*Pooled[C], 0, max(activeConnections/2, 1))
validConnections := make([]*Pooled[C], 0, activeConnections)
```
pool.Active() returns an int64, but make([]*Pooled[C], 0, max(activeConnections/2, 1)) and make(..., 0, activeConnections) require an int capacity. As written, this won’t compile; cast after bounding to a safe int size (and consider using a smaller cap as needed).
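One way to bound-then-cast, sketched with a hypothetical `preallocCap` helper and an illustrative ceiling (neither is from the PR):

```go
package main

import "fmt"

// preallocCap bounds an int64 connection count before converting it to
// an int slice capacity, so the conversion cannot overflow on 32-bit
// platforms. The ceiling is a hypothetical value, not taken from the PR.
func preallocCap(active int64) int {
	const ceiling = 1 << 10
	switch {
	case active < 0:
		return 0
	case active > ceiling:
		return ceiling
	default:
		return int(active)
	}
}

func main() {
	fmt.Println(preallocCap(3))       // 3
	fmt.Println(preallocCap(1 << 40)) // 1024 (capped before the cast)
	// The bounded value is now a valid make() capacity.
	fmt.Println(cap(make([]*struct{}, 0, preallocCap(3)))) // 3
}
```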
Description
This PR is a backport of upstream PR vitessio#19004, a follow-up to another backport (vitessio#18967) that was applied to this branch and caused memory leak/pressure, leading to an incident. This PR is expected to resolve that issue.
Related Issue(s)
fixes https://github.com/github/vitess/issues/1601
Checklist
Deployment Notes