
perf(util): memoize parseUrl to avoid redundant REGEX_URL runs #246

Open
caffodian wants to merge 1 commit into cloudinary-community:main from caffodian:perf/memoize-parseurl

Conversation

@caffodian

Description

parseUrl runs REGEX_URL — a large URL-parsing regex — and the same src is commonly parsed many times:

  • Image loaders call the loader once per srcset candidate width (typically 6–8×) on an identical src, on every render — e.g. next-cloudinary's <CldImage> rendering through next/image. getPublicId / getTransformations each call parseUrl again on the same URL too.
  • The same image URLs recur across renders and across requests.
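
To make the duplication concrete, here is a minimal sketch of a loader being invoked once per srcset candidate width on the same src. Everything in it is hypothetical and for illustration only: `candidateWidths`, `parseSrc`, `loader`, and `buildSrcSet` are stand-ins, not the actual next/image or next-cloudinary code, and the widths are example values.

```typescript
// Illustrative widths only; the real srcset candidates depend on framework config.
const candidateWidths = [640, 750, 828, 1080, 1200, 1920];

// Counts how often the (stand-in) expensive parse runs.
let parseCalls = 0;

// Hypothetical stand-in for an expensive parse of `src`.
function parseSrc(src: string): string {
  parseCalls += 1;
  return src.slice(src.lastIndexOf("/") + 1);
}

// Hypothetical loader: called once per candidate width with the same `src`,
// so the parse is repeated for every width.
function loader(src: string, width: number): string {
  const publicId = parseSrc(src);
  return `${publicId}?w=${width}`;
}

// Builds a srcset string by calling the loader for each candidate width.
function buildSrcSet(src: string): string {
  return candidateWidths.map((w) => `${loader(src, w)} ${w}w`).join(", ");
}
```

In this sketch a single `buildSrcSet` call parses the identical string once per width, which is the pattern the memoization targets.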

Under server-side-render load this made REGEX_URL the single largest JS CPU consumer we observed in production profiling of an image-heavy page — it was being re-run on identical strings with no caching.

parseUrl is pure (src → parts), so this memoizes it with a bounded LRU (Map keyed by src, 5000-entry cap, least-recently-used eviction). On a representative image-heavy SSR workload this collapsed the per-render srcset duplicates to a single regex execution per distinct URL — ~97% of parseUrl CPU eliminated — with no change to output or error behavior.
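
As a rough sketch of the bounded-LRU idea described above (not the code in this PR), a `Map` can act as an LRU because it iterates keys in insertion order: re-inserting a key on every hit moves it to the "most recent" end, and the first key in iteration order is the eviction candidate. The class name `LruCache` and its API are assumptions made for this example.

```typescript
// Hypothetical helper: a small LRU cache built on Map insertion order.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      // Refresh position for an existing key.
      this.entries.delete(key);
    } else if (this.entries.size >= this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }

  get size(): number {
    return this.entries.size;
  }
}
```

With a cap of 2, setting `a` and `b`, reading `a`, then setting `c` evicts `b`, because `a` was touched more recently. The PR's 5000-entry cap would correspond to `new LruCache<string, unknown>(5000)` in this sketch.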

Behavior is preserved

  • Each call returns a fresh object, including freshly-copied transformations and queryParams, so callers can mutate the result without affecting the cache (matches the previous "new object per call" contract).
  • Invalid input throws exactly as before, and failures are never cached.
  • A guarded undefined branch is never cached or copied.
  • Only parseUrl changes internally; the existing logic is preserved verbatim as a private parseUrlUncached. Public API is unchanged.
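
The guarantees above could look roughly like the following. This is a simplified sketch under stated assumptions, not the PR's implementation: `ParsedUrl` is a cut-down result shape, `parseUrlUncached` here is a toy stand-in for the real regex-based parser, and `uncachedCalls` exists only so the cache behavior can be observed.

```typescript
// Assumed, simplified result shape (the real one has more fields).
interface ParsedUrl {
  publicId: string;
  transformations: string[];
  queryParams: Record<string, string>;
}

let uncachedCalls = 0; // instrumentation for this example only

// Toy stand-in for the real parser: throws on invalid input,
// returns undefined for an empty string to mimic a guarded branch.
function parseUrlUncached(src: string): ParsedUrl | undefined {
  uncachedCalls += 1;
  if (src === "") return undefined;
  const marker = "/upload/";
  const start = src.indexOf(marker);
  if (start === -1) throw new Error(`Invalid src: ${src}`);
  const [path = "", query = ""] = src.slice(start + marker.length).split("?");
  const segments = path.split("/");
  const publicId = segments.pop() ?? "";
  const queryParams: Record<string, string> = {};
  for (const pair of query.split("&")) {
    if (pair === "") continue;
    const [key = "", value = ""] = pair.split("=");
    queryParams[key] = value;
  }
  return { publicId, transformations: segments, queryParams };
}

const MAX_ENTRIES = 5000;
const cache = new Map<string, ParsedUrl>();

// Copy the nested containers so callers never share state with the cache.
function copyParsed(parts: ParsedUrl): ParsedUrl {
  return {
    ...parts,
    transformations: [...parts.transformations],
    queryParams: { ...parts.queryParams },
  };
}

function parseUrl(src: string): ParsedUrl | undefined {
  const hit = cache.get(src);
  if (hit !== undefined) {
    // Refresh recency, then hand out a fresh copy.
    cache.delete(src);
    cache.set(src, hit);
    return copyParsed(hit);
  }
  // A throw here propagates before any cache write, so failures are not cached.
  const parts = parseUrlUncached(src);
  if (parts === undefined) return undefined; // never cached or copied
  if (cache.size >= MAX_ENTRIES) {
    const oldest = cache.keys().next().value;
    if (oldest !== undefined) cache.delete(oldest);
  }
  cache.set(src, parts);
  return copyParsed(parts);
}
```

The cache keeps the original parsed object and only ever returns copies, so a caller that mutates `transformations` or `queryParams` cannot change what a later call sees.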

Tests: added a parseUrl memoization block (cache hit serves repeats without re-parsing — verified via a decodeURIComponent spy seam; deep-copy independence so a mutated result can't poison the cache; failures re-throw and aren't cached). All existing parseUrl tests pass unchanged. tsc, eslint, tsup build, and vitest all green locally.

Issue Ticket Number

No existing issue — happy to open a tracking issue if you'd prefer. Filed as a performance fix surfaced by production profiling.

Type of change

  • Bug fix (non-breaking change which fixes an issue) — specifically a performance fix; no API or behavior change
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Fix or improve the documentation
  • This change requires a documentation update

Checklist

  • I have followed the contributing guidelines of this project (conventional-commit message for semantic-release)
  • I have created an issue ticket for this PR (see note above — happy to file one)
  • I have checked to ensure there aren't other open Pull Requests for the same update/change
  • I have performed a self-review of my own code
  • I have run tests locally to ensure they all pass
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes needed to the documentation (none needed — no API change)

🤖 This PR was written by Claude Code on behalf of @caffodian.

parseUrl runs REGEX_URL, a large URL-parsing regex, and the same `src` is
commonly parsed many times: image loaders (e.g. next-cloudinary's <CldImage>
via next/image) call the loader once per srcset candidate width — typically
6–8x — on an identical src, on every render, and the same URLs recur across
renders/requests. Under server-side-render load this made REGEX_URL the single
largest JS CPU consumer we observed in production profiling.

parseUrl is pure (src -> parts), so memoize it with a bounded (LRU, 5000-entry)
Map keyed by src. Output and error behavior are unchanged:

- each call returns a fresh object, including freshly-copied `transformations`
  and `queryParams`, so callers can mutate the result without affecting the
  cache (matches the previous "new object per call" contract);
- invalid input throws exactly as before and failures are never cached;
- a guarded undefined branch is never cached or copied.

Collapses the per-render srcset duplicates to a single regex execution per
distinct URL.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vercel

vercel Bot commented Jun 5, 2026

@caffodian is attempting to deploy a commit to the Cloudinary DevX Team on Vercel.

A member of the Team first needs to authorize it.

@caffodian caffodian marked this pull request as draft June 5, 2026 17:33
@caffodian
Author

Disclosure: the PR description above is an AI-generated summary of an investigation, so please treat the specific figures as indicative rather than a rigorous upstream benchmark.

Long story short: we applied a local patch for this and it seems promising. On pages with a lot of Cloudinary React components — especially under Next.js SSR — this one regex was responsible for a massive amount of CPU time. Memoizing it helped a lot in our testing.

That said, I'm genuinely not sure whether this belongs here in @cloudinary-util/util or higher up, in the React/Next component (or loader) layer, where the per-srcset-width duplication actually originates. Happy to move it if you'd prefer that shape; I wanted to open the discussion with a working, tested version of the in-library approach.

🤖 This comment was written by Claude Code on behalf of @caffodian.

@caffodian caffodian marked this pull request as ready for review June 5, 2026 17:40
