ffi: add V8 fast-call path #63140
Conversation
With this, how does ffi compare to napi? Do we have any guidelines on when to use which?
@ronag On my PR (which yields similar results as Bryan's) we're beating it by like 100x. |
So ffi is 100x faster than napi?
Sorry, I thought you meant https://github.com/node-ffi-napi/node-ffi-napi. We probably have to measure that once we agree on the direction. |
node-ffi-napi isn't a good benchmark for performance, but koffi is. Version 3 performs even better than version 2; it's still in beta (but can already be used for testing). |
ff4f871 to 50fcbb2
Add a parallel dispatch path that uses V8 fast API calls instead of libffi for eligible native calls. At DynamicLibrary.getFunction time, generate a per-function JIT'd trampoline that strips V8's receiver argument and tail-calls the target. Signatures with callbacks, unsupported argument types, or more register-passed args than the platform ABI permits transparently fall back to libffi.

Stub emitters cover Linux/macOS/FreeBSD on x86_64 and AArch64, Windows on x86_64 and AArch64, and Linux on AArch32. JIT memory is allocated per isolate via direct mmap with MAP_JIT on macOS and W^X enforcement elsewhere. The JS wrapper validates each argument per declared type, mirroring the libffi slow callback so the contract is identical across both paths and across V8 optimization tiers.

The path is gated behind --experimental-ffi and can be disabled at build time with --without-ffi-fastcall. The previous shared-buffer JS fast path is removed, replaced by this fast-call path.

Signed-off-by: Bryan English <bryan@bryanenglish.com>
Adds CFunctionInfo::HasReceiver = kNo. When set, V8's TurboFan and Turboshaft fast-call lowering omits the JS receiver from the C call — the C function pointer is invoked with user args only, no receiver in the first parameter register.

For Node's FFI fast-call path this means the receiver-strip JIT stub is no longer needed: dlsym'd target functions can be registered directly with V8 as the C address. Eliminates ~7 instructions per call (AArch64) plus V8's own receiver-into-arg0 setup.

Yields +3-24% over the prior validators+stub path on FFI microbenchmarks (largest gains on many-args and pointer-bigint), and beats the pre-fix silent-truncation thin wrapper on every numeric and pointer benchmark while preserving strict validation.

The change is gated by an enum on CFunctionInfo (default kYes, backward-compatible). Existing fast-call users (DOM bindings, V8 internals) are unaffected.

Patches in deps/v8 cover the API header, the constructor, the GetFastApiCallTarget overload-matching, and the js-call-reducer input-layout setup; the simplified-lowering and turboshaft graph-builder loops already iterate from input 0 over ArgumentCount() inputs and pick up the new layout automatically.

Signed-off-by: Bryan English <bryan@bryanenglish.com>
With v8 fast-call patched to support HasReceiver=kNo (previous commit),
the dlsym'd target function pointer is registered directly with v8 as
the C function address — no JIT'd receiver-strip trampoline is needed.
This removes the entire stub-emitter and JIT-memory infrastructure,
along with the platform-specific argument-cap logic (v8's own 8-arg
fast-call cap takes over) and the boot-time self-test that verified
JIT pages were executable.
Deleted: src/ffi/fastcall/jit_memory.{h,cc}, stub_emitter.h, the four
per-platform stub_emitter_*.cc files (aarch64, arm, x64_sysv, x64_win),
test/cctest/test_ffi_fastcall_{emitter,jit}.cc, and the related node.gyp
/ config.gypi entries. The CFunctionInfoBundle drops its arg_classes
and result_class members (they only existed to feed the stub emitters).
IsFastCallEligible loses the per-platform GP/FP/Win64 caps and the
AArch32 i64-arg rejection — v8 now handles those uniformly. FastCallState
drops stub_entry/stub_alloc_size; its destructor no longer needs
to free JIT pages. Net change: ~2000 lines deleted across src/ and
test/cctest/.
Functional behavior is unchanged: all 15 FFI tests and the FFI cctests
still pass. The benchmark gain over the prior shipping wrapper is
+2-24% on AArch64 macOS (largest on many-args at +24%, from the
argument-register shifts the deleted stub no longer performs).
Signed-off-by: Bryan English <bryan@bryanenglish.com>
Okay, so, new benchmark: The main reasons this improved are:
The net result is that there's not a whole lot extra going on that V8 wouldn't have to do regardless, so I'd guess we're approaching the limits here. I'll hold off on making the V8 upstream patch until we also see @ShogunPanda's latest approach and decide which way we want to go. I also want to put together a benchmark repo that tests this against existing userland FFI implementations, against NAPI (with and without the C++ wrapper lib), and against non-NAPI addons.
HEADS UP: This likely isn't done yet. It's mainly here to compare approaches with @ShogunPanda.
Some benchmarks: