@analytically (Contributor)
Replace container/list with custom typed doubly-linked list and add node pooling to eliminate allocations in steady-state eviction cycles.

Optimizations:

  • Custom typed linked list eliminates interface{} boxing and type assertions
  • Sentinel head/tail nodes remove nil checks in list operations
  • Node freelist reuses evicted nodes (zero allocations in steady-state)
  • clear() instead of map reallocation reduces GC pressure
  • Slice length reset preserves capacity for reuse

Benchmark improvements (capacity=512, matching pgx default):

  • High hit rate (99% cache hits): 9% faster
  • Steady-state churn: 59% faster, zero allocations (was 4 allocs/op)
  • DeallocateAll cycle: 29% faster
  • Put with eviction: 22% faster, 50% fewer allocations

The largest gains are in churn scenarios where the cache is at capacity and statements are constantly evicted and added, which is exactly what happens in production with a fixed-size prepared statement cache.
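To make the core data-structure change concrete, here is a minimal sketch of a typed doubly-linked list with sentinel head/tail nodes. The type and function names (`stmtDesc`, `lruList`, `pushFront`) are invented for illustration and do not match the pgx source; the point is that a typed `sd` field avoids `interface{}` boxing, and sentinels guarantee `prev`/`next` are never nil:

```go
package main

import "fmt"

// stmtDesc stands in for a prepared-statement description (hypothetical).
type stmtDesc struct{ sql string }

// node is a typed list node: sd is accessed directly, with no
// interface{} boxing or type assertion.
type node struct {
	key        string
	sd         *stmtDesc
	prev, next *node
}

// lruList uses sentinel head/tail nodes, so every real node always has
// valid prev and next pointers and list operations need no nil checks.
type lruList struct {
	head, tail *node
}

func newLRUList() *lruList {
	l := &lruList{head: &node{}, tail: &node{}}
	l.head.next = l.tail
	l.tail.prev = l.head
	return l
}

// pushFront links n immediately after the head sentinel.
func (l *lruList) pushFront(n *node) {
	n.prev = l.head
	n.next = l.head.next
	l.head.next.prev = n
	l.head.next = n
}

// remove unlinks n; the sentinels guarantee n.prev and n.next exist.
func (l *lruList) remove(n *node) {
	n.prev.next = n.next
	n.next.prev = n.prev
	n.prev, n.next = nil, nil
}

func main() {
	l := newLRUList()
	a, b := &node{key: "a"}, &node{key: "b"}
	l.pushFront(a)
	l.pushFront(b)
	fmt.Println(l.head.next.key) // b
	l.remove(b)
	fmt.Println(l.head.next.key) // a
}
```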

@analytically (Contributor, Author)
LRU Cache Optimization: 59% faster, zero allocations in steady-state

Get() — 9% faster

| Aspect | Original | V2 |
|---|---|---|
| Type assertion | Yes (`interface{}` to `*SD`) | No (typed field) |
| Validation checks | `e.list != l`, `l.root.next == e` | `node.prev == c.head` |
| Nil checks in list ops | Multiple | None (sentinels guarantee valid pointers) |
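A sketch of how such a `Get` path can work (hypothetical names, not the actual pgx code): the map stores typed `*node` values, so the value is read by direct field access, and the `node.prev == c.head` comparison replaces the original's element validation before a move-to-front:

```go
package main

import "fmt"

type node struct {
	key        string
	val        int
	prev, next *node
}

type cache struct {
	head, tail *node // sentinels
	m          map[string]*node
}

func newCache() *cache {
	c := &cache{head: &node{}, tail: &node{}, m: map[string]*node{}}
	c.head.next = c.tail
	c.tail.prev = c.head
	return c
}

// pushFront links n right after the head sentinel.
func (c *cache) pushFront(n *node) {
	n.prev, n.next = c.head, c.head.next
	c.head.next.prev = n
	c.head.next = n
}

// Get returns the typed value directly (no type assertion) and
// promotes the node unless it is already at the front.
func (c *cache) Get(key string) (int, bool) {
	n, ok := c.m[key]
	if !ok {
		return 0, false
	}
	if n.prev != c.head { // not already most recently used
		n.prev.next = n.next
		n.next.prev = n.prev
		c.pushFront(n)
	}
	return n.val, true
}

func main() {
	c := newCache()
	n := &node{key: "x", val: 42}
	c.pushFront(n)
	c.m["x"] = n
	v, ok := c.Get("x")
	fmt.Println(v, ok) // 42 true
}
```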

Put() — 22% faster, 50% fewer allocs

| Aspect | Original | V2 |
|---|---|---|
| Node allocation | Every `Put` | Only when freelist empty |
| Steady-state allocs | 1 per `Put` | 0 per `Put` |
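The freelist idea can be sketched as follows (again with invented names; `allocNode`/`freeNode` mirror the helpers mentioned in the summary table but this is not the pgx implementation). At capacity, every `Put` first evicts a node onto the freelist and then pops it right back off, so no heap allocation occurs:

```go
package main

import "fmt"

type node struct {
	key        string
	val        int
	prev, next *node
}

type cache struct {
	head, tail *node // sentinels
	m          map[string]*node
	free       *node // singly linked freelist of recycled nodes (via next)
	cap, len   int
}

func newCache(capacity int) *cache {
	c := &cache{head: &node{}, tail: &node{}, m: map[string]*node{}, cap: capacity}
	c.head.next = c.tail
	c.tail.prev = c.head
	return c
}

// allocNode pops from the freelist; it allocates only when empty.
func (c *cache) allocNode() *node {
	if n := c.free; n != nil {
		c.free, n.next = n.next, nil
		return n
	}
	return &node{}
}

// freeNode zeroes the node and pushes it onto the freelist.
func (c *cache) freeNode(n *node) {
	*n = node{next: c.free}
	c.free = n
}

func (c *cache) Put(key string, val int) {
	if c.len == c.cap { // evict the least recently used (back of list)
		lru := c.tail.prev
		lru.prev.next = c.tail
		c.tail.prev = lru.prev
		delete(c.m, lru.key)
		c.freeNode(lru)
		c.len--
	}
	n := c.allocNode() // reuses the node just evicted, if any
	n.key, n.val = key, val
	n.prev, n.next = c.head, c.head.next
	c.head.next.prev = n
	c.head.next = n
	c.m[key] = n
	c.len++
}

func main() {
	c := newCache(1)
	c.Put("a", 1)
	first := c.m["a"]
	c.Put("b", 2) // evicts "a"; allocNode hands back the same node
	fmt.Println(c.m["b"] == first) // true
}
```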

RemoveInvalidated() — 50% faster

| Aspect | Original | V2 |
|---|---|---|
| Slice | Discards backing array | Reuses existing capacity |
| Map | New allocation | Cleared in place |
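Both reuse idioms are plain Go and can be shown in isolation: the `clear` builtin (Go 1.21+) empties a map without dropping its buckets, and re-slicing to `[:0]` resets a slice's length while keeping its backing array:

```go
package main

import "fmt"

func main() {
	// clear() empties the map in place, keeping its allocated buckets,
	// instead of replacing it with a freshly allocated map.
	m := map[string]int{"a": 1, "b": 2}
	clear(m)
	fmt.Println(len(m)) // 0

	// s[:0] resets length but keeps the backing array, so appends up
	// to the old capacity allocate nothing; `s = nil` would discard it.
	s := make([]int, 0, 8)
	s = append(s, 1, 2, 3)
	s = s[:0]
	fmt.Println(len(s), cap(s)) // 0 8
}
```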

InvalidateAll() — 29% faster

| Aspect | Original | V2 |
|---|---|---|
| Per-node type assertion | Yes | No |
| Node memory | Lost to GC | Recycled to freelist |
| Map | New allocation | Cleared in place |
| List | New allocation | Reset pointers only |
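A hedged sketch of such an `InvalidateAll` (hypothetical names, not the pgx source): one pass recycles every node into the freelist, then the sentinel pointers are reset and the map is cleared in place, so neither the list nor the map is reallocated:

```go
package main

import "fmt"

type node struct {
	key        string
	prev, next *node
}

type cache struct {
	head, tail *node // sentinels
	m          map[string]*node
	free       *node // freelist, linked via next
	len        int
}

func newCache() *cache {
	c := &cache{head: &node{}, tail: &node{}, m: map[string]*node{}}
	c.head.next = c.tail
	c.tail.prev = c.head
	return c
}

func (c *cache) pushFront(n *node) {
	n.prev, n.next = c.head, c.head.next
	c.head.next.prev = n
	c.head.next = n
}

func (c *cache) InvalidateAll() {
	for n := c.head.next; n != c.tail; {
		next := n.next
		*n = node{next: c.free} // recycle to freelist instead of losing to GC
		c.free = n
		n = next
	}
	c.head.next = c.tail // reset pointers only; no new list
	c.tail.prev = c.head
	clear(c.m) // cleared in place; no new map
	c.len = 0
}

func main() {
	c := newCache()
	for _, k := range []string{"a", "b"} {
		n := &node{key: k}
		c.pushFront(n)
		c.m[k] = n
		c.len++
	}
	c.InvalidateAll()
	fmt.Println(len(c.m), c.len, c.free != nil) // 0 0 true
}
```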

Assembly: Get() — 13% smaller

| Metric | Original | V2 |
|---|---|---|
| Code size | 380 bytes | 329 bytes (13% smaller) |

Summary of Optimizations

| Optimization | Location | Impact |
|---|---|---|
| Typed node field | `lruNodeV2.sd` vs `Element.Value` | No type assertion; direct field access |
| Sentinel nodes | head/tail dummy nodes | Eliminates nil checks in all list operations |
| Node pooling | `freelist`, `allocNode()`, `freeNode()` | Zero allocations in steady-state churn |
| Map reuse | `clear(c.m)` vs `make()` | Reduces GC pressure |
| Slice capacity reuse | `[:0]` vs `= nil` | Preserves backing array |
| Explicit length | `c.len` field vs `c.l.Len()` | Avoids method-call overhead |

@analytically (Contributor, Author)
Net memory result: ~40% less memory for node storage, and the freelist retains evicted nodes instead of creating GC pressure. In theory the freelist can grow without bound, but in practice it is capped by the cache capacity, since nodes only cycle between active use and the freelist.

Signed-off-by: Mathias Bogaert <[email protected]>
@jackc jackc merged commit 777dec0 into jackc:master Jan 4, 2026
14 checks passed
@jackc (Owner) commented Jan 4, 2026:

👍

@analytically analytically deleted the perf/lru branch January 4, 2026 19:16