Reduce contention in attribute metadata caches#912
Closed
Symmetricity wants to merge 2 commits into
Closed
Conversation
Attribute pair and set lookups are very frequent during profile processing. The existing 64-entry thread-local caches leave many repeated lookups to fall back to the shared stores on larger fixtures. Use a still-small 256-entry cache so more hot lookups are served locally without changing generated tile semantics. On the Austria fixture this was neutral to slightly faster and reduced native RSS in warmed alternating runs. Co-authored-by: Codex <noreply@openai.com>
Layer metadata is shared across Lua processing threads and still needs the mutex added by the thread-safety fix in PR systemed#761. On larger fixtures, repeated Attribute() calls make that coarse lock a contention point. Remember metadata types already recorded by the current thread for the active LayerDefinition, and skip the lock only for exact same-type repeats. This preserves the protected shared map update for first sightings and type changes while avoiding redundant lock traffic. Measured with these cache changes stacked on PR systemed#902, the combined Austria fixture result reduced wall time by about 4.9%, system CPU by about 43%, and voluntary context switches by about 44% without changing decoded Liechtenstein output in the candidate semantic checks. Co-authored-by: Codex <noreply@openai.com>
c910c03 to
05f5e43
Compare
Owner
Contributor
Author
|
No worries! Sorry for not checking for related issues. I normally do, but just skipped it in this case for some reason. Want me to close this PR and submit another one for the cache size change? |
Owner
|
Yes please! |
Contributor
Author
|
Closed as requested. Replacement PR #914 contains only the AttributeStore 64->256 lookup-cache size change. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR is AI generated.
Summary
This reduces lock contention and repeated metadata work in two hot paths used
while writing OpenMapTiles attributes:
entries
the same layer/key/type do not repeatedly take
vectorLayerMetadataMutexOn a warmed Austria OpenMapTiles run, measured with the same changes stacked on
#902, this reduced wall time by about 4.9%, system CPU by about 43%,
voluntary context switches by about 44%, and peak RSS by about 0.9%.
The patch also applies cleanly to current upstream master independently of
#902, which is how this PR is submitted.
Background
Large OpenMapTiles runs spend a lot of time repeatedly assigning attributes and
recording vector-layer metadata.
AttributeStorealready has small thread-local direct-mapped caches to avoidfalling through to the shared sharded stores. This PR keeps that existing
approach, but makes the cache slightly larger so common attribute pairs and
sets survive more collision patterns.
OsmLuaProcessing::setVectorLayerMetadata()records the type of eachlayer/key pair for the final
vector_layersmetadata table. The shared writestill needs a mutex, but most calls repeat the same layer/key/type combination
many times from the same worker thread. This PR skips the mutex path for those
same-thread exact repeats.
Implementation
The AttributeStore change is deliberately small:
cachedAttributePairPointersandcachedAttributePairIndexesgrow from 64to 256 entries per thread
cachedAttributeSetPointersandcachedAttributeSetIndexesgrow from 64 to256 entries per thread
unchanged
The vector-layer metadata change adds a thread-local cache keyed by layer and
attribute name:
LayerDefinitioninstance
vectorLayerMetadataMutexlayers.layers[layer].attributeMapunder the existing mutexThis keeps the shared metadata map as the source of truth and does not remove
the synchronization that protects actual shared writes.
Performance
The performance numbers below were measured with these cache changes stacked on
#902, so they show the cost of these changes after the allocation-churn work in
#902. No separate upstream-master timing run was made for this PR.
Runtime fixture:
Combined result:
The user-CPU total increased, but the system-time and context-switch reductions
more than offset that in wall-clock time on this fixture.
Possible Regressions
The intended output behavior is unchanged.
Potential costs:
entries per worker thread
types already seen by that worker
type changes still go through the existing mutex and update the shared map
This PR does not make the final metadata type-resolution semantics stricter or
more deterministic. It only avoids repeated same-thread writes that would have
stored the same type again.
Testing
Build:
cmake -S . -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo cmake --build build --parallel 8Code style:
Semantic checks were run during candidate acceptance on the submitted/open PR
stack used for output comparisons: