Skip to content

RFC: [x2cpg] Added ForkJoinParallelCpgPassWithAccumulator#5877

Closed
max-leuthaeuser wants to merge 1 commit intomasterfrom
max/forkJoinParallelWithAccumulator
Closed

RFC: [x2cpg] Added ForkJoinParallelCpgPassWithAccumulator#5877
max-leuthaeuser wants to merge 1 commit intomasterfrom
max/forkJoinParallelWithAccumulator

Conversation

@max-leuthaeuser
Copy link
Copy Markdown
Contributor

@max-leuthaeuser max-leuthaeuser commented Mar 11, 2026

This allows to get rid of Global typesSeen contention. I adapted jssrc2cpg to demo this.

If thats something we want, this can go into https://github.com/ShiftLeftSecurity/codepropertygraph/blob/master/codepropertygraph/src/main/scala/io/shiftleft/passes/CpgPass.scala

@bbrehm This also revealed a double call of finish() both in ForkJoinParallelCpgPass.createAndApply() and ForkJoinParallelCpgPass.runWithBuilder()

This allows to get rid of Global typesSeen contention.
I adapted jssrc2cpg to demo this.

@bbrehm This also revealed a double call of finish() both in ForkJoinParallelCpgPass.createAndApply() and ForkJoinParallelCpgPass.runWithBuilder()
@max-leuthaeuser max-leuthaeuser marked this pull request as draft March 11, 2026 10:34
@max-leuthaeuser max-leuthaeuser requested review from bbrehm and ml86 March 11, 2026 10:35
@max-leuthaeuser
Copy link
Copy Markdown
Contributor Author

Waiting for ShiftLeftSecurity/codepropertygraph#1841, then I can remove the "finish() only once" hack here.

Copy link
Copy Markdown
Contributor

@ml86 ml86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use of a thread local variable seems sub-optimal. Why not have runOnPart return a type X and accumulate this along side the diff graph in the stream.collect consumer function?

@max-leuthaeuser
Copy link
Copy Markdown
Contributor Author

max-leuthaeuser commented Mar 11, 2026

Because runOnPart runs on a single part of work (i.e., in our case a single, parsed file).
You would end up with thousands of HashSet instances carrying the result of the runOnPart invocation on that particular part of work. We want map-reduce style of merging at thread level.
Also, this approach works without touching any of the ForkJoinParallelCpgPass API so other frontends can adapt/switch at a later point in time.

@ml86
Copy link
Copy Markdown
Contributor

ml86 commented Mar 11, 2026

It seems to depend on the accumulator use case. If one would end up writing lots of duplicate data in the accumulator like for our seen types than I would also expect the thread local implementation to be better.
Anyway I think we can go ahead with this but please put the new CPG pass along side the others.

@max-leuthaeuser
Copy link
Copy Markdown
Contributor Author

Waiting for ShiftLeftSecurity/codepropertygraph#1844 to adapt this PR here.

@bbrehm
Copy link
Copy Markdown
Contributor

bbrehm commented Mar 11, 2026

I think the appropriate way would be for CpgPasses to be able to bring their own subclass of DiffGraphBuilder. That can then carry the hashset along.

This is exacly what the stream collector interface is made for, so we should use that.

@max-leuthaeuser
Copy link
Copy Markdown
Contributor Author

But that would be per part again right? Anyway, feel free to open a PR in codepropertygraph. I guess the pass API will change too.

@max-leuthaeuser
Copy link
Copy Markdown
Contributor Author

Will do a new PR for jssrc2pg using ForkJoinParallelCpgPassWithAccumulator from ShiftLeftSecurity/codepropertygraph#1847 once that release is available.

@max-leuthaeuser max-leuthaeuser deleted the max/forkJoinParallelWithAccumulator branch March 12, 2026 13:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants