Skip to content

Reduce contention in attribute metadata caches#912

Closed
Symmetricity wants to merge 2 commits into
systemed:masterfrom
Symmetricity:perf/attribute-metadata-contention
Closed

Reduce contention in attribute metadata caches#912
Symmetricity wants to merge 2 commits into
systemed:masterfrom
Symmetricity:perf/attribute-metadata-contention

Conversation

@Symmetricity
Copy link
Copy Markdown
Contributor

This PR is AI generated.

Summary

This reduces lock contention and repeated metadata work in two hot paths used
while writing OpenMapTiles attributes:

  • increase the small thread-local AttributeStore lookup caches from 64 to 256
    entries
  • cache vector-layer metadata updates per worker thread so repeated writes of
    the same layer/key/type do not repeatedly take vectorLayerMetadataMutex

On a warmed Austria OpenMapTiles run, measured with the same changes stacked on
#902, this reduced wall time by about 4.9%, system CPU by about 43%,
voluntary context switches by about 44%, and peak RSS by about 0.9%.

The patch also applies cleanly to current upstream master independently of
#902, which is how this PR is submitted.

Background

Large OpenMapTiles runs spend a lot of time repeatedly assigning attributes and
recording vector-layer metadata.

AttributeStore already has small thread-local direct-mapped caches to avoid
falling through to the shared sharded stores. This PR keeps that existing
approach, but makes the cache slightly larger so common attribute pairs and
sets survive more collision patterns.

OsmLuaProcessing::setVectorLayerMetadata() records the type of each
layer/key pair for the final vector_layers metadata table. The shared write
still needs a mutex, but most calls repeat the same layer/key/type combination
many times from the same worker thread. This PR skips the mutex path for those
same-thread exact repeats.

Implementation

The AttributeStore change is deliberately small:

  • cachedAttributePairPointers and cachedAttributePairIndexes grow from 64
    to 256 entries per thread
  • cachedAttributeSetPointers and cachedAttributeSetIndexes grow from 64 to
    256 entries per thread
  • the lookup behavior, shard selection, and locking behavior are otherwise
    unchanged

The vector-layer metadata change adds a thread-local cache keyed by layer and
attribute name:

  • the cache is reset if the worker starts using a different LayerDefinition
    instance
  • exact same-thread repeats of the same layer/key/type return before taking
    vectorLayerMetadataMutex
  • first sightings and type changes still update the shared
    layers.layers[layer].attributeMap under the existing mutex

This keeps the shared metadata map as the source of truth and does not remove
the synchronization that protects actual shared writes.

Performance

The performance numbers below were measured with these cache changes stacked on
#902, so they show the cost of these changes after the allocation-churn work in
#902. No separate upstream-master timing run was made for this PR.

Runtime fixture:

fixture: Austria Geofabrik extract
profile: OpenMapTiles profile
output: PMTiles
store: no --store
threads: 8
warmup: PBF plus coastline/landcover sidecar files
method: six alternating warmed runs with /usr/bin/time -v

Combined result:

metric #902 mean #902 + this PR mean delta delta %
wall time 32.795 s 31.177 s -1.618 s -4.93%
user CPU 189.337 s 198.053 s +8.717 s +4.60%
system CPU 41.465 s 23.575 s -17.890 s -43.14%
peak RSS 4,523,051 KiB 4,480,713 KiB -42,337 KiB -0.94%
minor faults 1,223,322 1,222,723 -599 -0.05%
filesystem outputs 1,111,761 1,111,955 +193 +0.02%
voluntary context switches 333,560 185,866 -147,695 -44.28%
involuntary context switches 2,438 2,132 -306 -12.54%

The user-CPU total increased, but the system-time and context-switch reductions
more than offset that in wall-clock time on this fixture.

Possible Regressions

The intended output behavior is unchanged.

Potential costs:

  • the AttributeStore thread-local caches retain a few more pointer/index
    entries per worker thread
  • the vector-layer metadata cache keeps a small per-thread map of layer/key
    types already seen by that worker
  • if a profile intentionally writes the same layer/key with different types,
    type changes still go through the existing mutex and update the shared map

This PR does not make the final metadata type-resolution semantics stricter or
more deterministic. It only avoids repeated same-thread writes that would have
stored the same type again.

Testing

Build:

cmake -S . -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build --parallel 8

Code style:

git diff --check

Semantic checks were run during candidate acceptance on the submitted/open PR
stack used for output comparisons:

fixture: Liechtenstein Geofabrik extract
profile: OpenMapTiles profile

AttributeStore TLS cache size:
  baseline repeat changed_tiles 0
  candidate repeat changed_tiles 0
  baseline -> candidate changed_tiles 0

Vector layer metadata cache:
  baseline repeat changed_tiles 0
  candidate repeat changed_tiles 0
  baseline -> candidate changed_tiles 0

Symmetricity and others added 2 commits June 4, 2026 12:18
Attribute pair and set lookups are very frequent during profile processing. The existing 64-entry thread-local caches leave many repeated lookups to fall back to the shared stores on larger fixtures.

Use a still-small 256-entry cache so more hot lookups are served locally without changing generated tile semantics. On the Austria fixture this was neutral to slightly faster and reduced native RSS in warmed alternating runs.

Co-authored-by: Codex <noreply@openai.com>
Layer metadata is shared across Lua processing threads and still needs the mutex added by the thread-safety fix in PR systemed#761. On larger fixtures, repeated Attribute() calls make that coarse lock a contention point.

Remember metadata types already recorded by the current thread for the active LayerDefinition, and skip the lock only for exact same-type repeats. This preserves the protected shared map update for first sightings and type changes while avoiding redundant lock traffic.

Measured with these cache changes stacked on PR systemed#902, the combined Austria fixture result reduced wall time by about 4.9%, system CPU by about 43%, and voluntary context switches by about 44% without changing decoded Liechtenstein output in the candidate semantic checks.

Co-authored-by: Codex <noreply@openai.com>
@Symmetricity Symmetricity force-pushed the perf/attribute-metadata-contention branch from c910c03 to 05f5e43 Compare June 4, 2026 10:23
@systemed
Copy link
Copy Markdown
Owner

systemed commented Jun 4, 2026

Thank you!

As it happens the repeated writes issue is being discussed over in #877. I have extracted the core of this PR (with some reference to the discussion in #877) and created a single, focused PR at #913.

We can do the 64->256 cache size change in a separate commit.

@Symmetricity
Copy link
Copy Markdown
Contributor Author

Symmetricity commented Jun 4, 2026

No worries! Sorry for not checking for related issues. I normally do, but just skipped it in this case for some reason.

Want me to close this PR and submit another one for the cache size change?

@systemed
Copy link
Copy Markdown
Owner

systemed commented Jun 4, 2026

Yes please!

@Symmetricity
Copy link
Copy Markdown
Contributor Author

Closed as requested. Replacement PR #914 contains only the AttributeStore 64->256 lookup-cache size change.

@Symmetricity Symmetricity deleted the perf/attribute-metadata-contention branch June 4, 2026 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants