Host collectable GC references #55
Replace GcRefStore integer-ID indirection with direct Java Object references in the interpreter path. Java's GC now handles liveness of Wasm GC structs, arrays, and i31 values naturally.

Core design: MStack gains a lazy Object[] refs array (null until first pushRef). push()/pop() are unchanged, so non-GC workloads see zero overhead. GC refs use pushRef()/popRef() to store actual WasmStruct/WasmArray/WasmI31Ref objects.

Key changes:
- MStack: lazy Object[] with pushRef/popRef/peekRef/clearRefsTo
- StackFrame: Object[] localRefs parallel to long[] locals
- WasmStruct/WasmArray: dual long[]+Object[] for fields/elements
- GlobalInstance: Object refValue for ref-typed globals
- TableInstance: Object[] objRefs for GC-typed tables
- ValType.isGcReference(): distinguishes any-hierarchy from func/extern
- StorageType.isReference()/isGcReference(): field type helpers
- Instance.heapTypeMatchRef(Object,...): type matching on Objects
- ConstantEvaluators: ConstantResult carries both long[] and Object
- InterpreterMachine: ~30 GC instructions updated
- Machine.call(int,long[],Object[]): overload for ref args
- WasmI31Ref: equals/hashCode for ref.eq value equality

GcRefStore is NOT yet removed: the compiler still uses it (Phase 2). Compiler GC tests are expected to fail until Phase 2.
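The lazy refs array described in this commit can be pictured with a minimal sketch. `LazyRefStackSketch` and its helper methods are hypothetical names invented for illustration; the real MStack is not shown in this PR text and almost certainly differs in detail (for example in how it grows and in peekRef/clearRefsTo).

```java
// Hypothetical simplification of the MStack idea; not the project's actual class.
final class LazyRefStackSketch {
    private long[] values = new long[8];
    private Object[] refs; // stays null until the first pushRef
    private int count;

    // Numeric path: never touches refs, so non-GC workloads pay nothing extra.
    void push(long v) {
        ensureCapacity();
        values[count++] = v;
    }

    long pop() {
        return values[--count];
    }

    // Reference path: allocates the parallel array on first use.
    void pushRef(Object ref) {
        ensureCapacity();
        if (refs == null) {
            refs = new Object[values.length];
        }
        values[count] = 0L; // placeholder keeps both arrays index-aligned
        refs[count++] = ref;
    }

    Object popRef() {
        count--;
        if (refs == null) {
            return null;
        }
        Object ref = refs[count];
        refs[count] = null; // drop the slot so the JVM GC can reclaim the object
        return ref;
    }

    boolean refsAllocated() {
        return refs != null;
    }

    private void ensureCapacity() {
        if (count == values.length) {
            values = java.util.Arrays.copyOf(values, count * 2);
            if (refs != null) {
                refs = java.util.Arrays.copyOf(refs, count * 2);
            }
        }
    }
}
```

Because the stack holds the object itself rather than an integer ID, reachability is decided by the JVM: once a slot is cleared and nothing else points at the object, it becomes collectable without any separate mark-sweep pass.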

Add callGc/applyGc methods to Machine, ExportFunction, and WasmFunctionHandle for passing real Java Objects as Wasm GC refs. Users can now receive WasmStruct/WasmArray/WasmI31Ref directly from exported functions, and Java GC manages their lifecycle.

Key changes:
- Machine.callGc(int, Object[]) / ExportFunction.applyGc(Object...)
- InterpreterMachine overrides callGc with native Object path
- WasmFunctionHandle.applyGc for GC-aware host functions
- WasmExternRef can wrap either long or Object (for extern.convert_any)
- ValType.isGcReference() cached during resolve(): zero-cost check
- isGcReference correctly classifies concrete func types as non-GC
- Table init populates objRefs for GC-typed tables
- Instance.registerGcRef/gcRef/array deprecated
- OpcodeImpl.boxForTable/unboxFromTable deprecated
- Compiler: GC refs use Object.class, call bridge threads Object[]

Compiler Phase 2: GC refs flow as Objects in generated bytecode. CompilerUtil maps GC ref types to Object.class/OBJECT_TYPE. Shaded helpers take/return Object for GC refs. Generated call_N bridges accept Object[] refArgs. BR_ON_NULL/NON_NULL checks distinguish GC (ifnull) from funcref (if_icmpeq). Non-GC code paths are completely unchanged, with zero overhead.

Delete GcRefStore and its epoch-based mark-sweep collector. Wasm GC references (structs, arrays, i31) are now managed entirely by Java's garbage collector through the Object[] refs arrays in MStack, StackFrame, WasmStruct, WasmArray, GlobalInstance, and TableInstance.

Changes:
- Delete GcRefStore.java and GcRefStoreTest.java
- Instance: remove gcRefs field, gcSafePoint()
- Instance.registerGcRef/gcRef/array: throw UnsupportedOperationException
- ExportFunction.apply(long...) throws on functions with GC params/returns, directing users to applyGc(Object...) instead
- Remove gcSafePoint() calls from Instance.Exports and initialization

Users must migrate from apply(long...) to applyGc(Object...) for functions that use GC reference types (structs, arrays, i31, anyref). Non-GC functions (funcref, externref, numeric types) continue to work with apply(long...) unchanged.

Fix all remaining interpreter GC reference bugs:
- MStack.popRef/peekRef: handle null refs array gracefully
- REF_TEST/CAST_TEST/BR_ON_CAST: dispatch based on source type (popRef for GC refs, pop for funcref/externref)
- ARRAY_GET/SET: use isGcReference() not isReference() for field type checks; funcref/externref elements stay in long[] path
- ARRAY_NEW_DEFAULT: fill with REF_NULL_VALUE for non-GC ref types
- ConstantEvaluators ARRAY_NEW_DEFAULT: same fix for global init
- callGc: convert REF_NULL_VALUE to null for all ref return types
- apply(long...): throw UnsupportedOperationException only after execution succeeds (traps propagate correctly)
- WasmExternRef: can wrap Object for extern.convert_any round-trips

Test generator (JavaTestGen):
- Emit applyGc() for functions with GC params/returns
- Null ref assertions use assertNull() for all ref types
- WasmValue: toGcArgsValue, toGcResultValue, toGcAssertion methods
- WasmValueType.isGcReference() helper

Fix all remaining compiler and interpreter GC reference bugs:

Compiler:
- Generate callGc override in compiled Machine class
- Add int-based variants for refTest/castTest/heapTypeMatch (non-GC refs)
- Add GC table operations: tableGetRef, tableSetRef, tableGrowRef, tableFillRef
- Add extern conversion helpers in Shaded
- Fix THROW to handle GC ref tag params (createWasmExceptionGc)
- Fix CATCH_UNBOX_PARAMS to load GC refs from refArgs
- Fix arrayNewElem/arrayInitElem to use computeConstant for GC elements
- Fix WasmAnalyzer to track source types for REF_TEST dispatch

Interpreter:
- BR_ON_NULL/BR_ON_NON_NULL: check Object side for GC refs, long side for funcref
- MStack.push(): clear stale refs (if refs != null) for correctness
- ConstantEvaluators: use isGcReference() not isReference() for field type checks
- STRUCT_NEW_DEFAULT/ARRAY_NEW_DEFAULT: fill REF_NULL_VALUE for non-GC ref fields
- WasmException: carry Object[] refArgs for GC ref exception payloads

Bridge:
- CompilerInterpreterMachine.CALL: use callGc for functions with GC returns
- Pass refArgs through compiled-to-interpreted boundary

Tests:
- BrOnNullTest: use applyGc for GC functions
- All approval snapshots updated

Compiler fixes:
- Shaded.structNewDefault/arrayNewDefault: fill REF_NULL_VALUE for funcref fields (non-GC refs need -1 for null, not 0)
- emitBoxValuesOnStack: handle Object refs (store 0L placeholder)
- Remove dead code: CompilerUtil.isGcRef, hasGcRefReturns, Context.isGcTable

Infrastructure:
- ValType.isObjectRef(): cached flag for "uses Object on JVM stack" (GC refs + externref, NOT funcref). Ready for future externref-as-Object.
- StorageType.isObjectRef() delegates to ValType

Tests:
- GcEdgeCasesTest: struct.new_default funcref null, extern round-trip (interpreter only; compiler externref-as-Object deferred), applyGc
- GcStressTest: allocate 100K struct chains 100 times, verifies Java GC collects unreachable Wasm GC refs (no OOM with -Xmx64m)
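The default-fill fix for funcref fields can be illustrated with a small sketch. `DefaultFieldsSketch` and its `Kind` enum are hypothetical; only the constant name REF_NULL_VALUE and the "-1 for null, not 0" rule come from the commit text.

```java
// Illustrative only: DefaultFieldsSketch and Kind are hypothetical names.
final class DefaultFieldsSketch {
    // Per the commit text, a null non-GC ref is encoded as -1, not 0.
    static final long REF_NULL_VALUE = -1L;

    enum Kind { NUMERIC, FUNCREF, GC_REF }

    // Builds the long[] side of a default-initialised struct.
    static long[] defaultLongFields(Kind[] kinds) {
        long[] fields = new long[kinds.length]; // Java zero-fills: correct for numerics
        for (int i = 0; i < kinds.length; i++) {
            if (kinds[i] == Kind.FUNCREF) {
                // A zero here would not read back as a null funcref.
                fields[i] = REF_NULL_VALUE;
            }
            // GC_REF slots keep the 0L placeholder; their null lives in the Object[] side.
        }
        return fields;
    }
}
```

The point of the sketch is that Java's implicit zero-initialisation is only the right default for numeric fields and for slots whose real value lives in the parallel Object[] array.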

Make externref use Object representation (same as GC refs). Only funcref stays as int. extern.convert_any / any.convert_extern are now identity operations, satisfying the Wasm spec requirement that composing them yields the original value.

API:
- CallResult record: zero-boxing return type with long[] + Object[]
- Machine.callWithRefs(int, long[], Object[]) returns CallResult
- ExportFunction.applyWithRefs(long[], Object[]) returns CallResult
- applyGc(Object...) stays as convenience wrapper
- apply(long...) backward compatible for non-ref functions

Internal:
- isObjectRef() (GC refs + externref) replaces isGcReference() at all representation-deciding sites (compiler + interpreter)
- isGcReference() kept only for Wasm type system checks
- Tables: externref tables use Object[] objRefs (isObjectRef)
- Shaded: extern conversions are identity (Object → Object)
- structNewDefault/arrayNewDefault: only funcref needs REF_NULL_VALUE

Zero overhead for non-GC non-externref modules.
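The three predicates now have distinct jobs, which a toy classification makes easier to see. `RefKindSketch` is a hypothetical enum with one representative per category; the real ValType resolves heap types and caches these flags, which is not modelled here.

```java
// Hypothetical stand-in for ValType, with one representative value per category.
enum RefKindSketch {
    I32, FUNCREF, EXTERNREF, STRUCTREF;

    // Any reference type at all (Wasm-level notion).
    boolean isReference() {
        return this != I32;
    }

    // Member of the any-hierarchy (structs, arrays, i31): a Wasm type system question.
    boolean isGcReference() {
        return this == STRUCTREF;
    }

    // "Travels as a Java Object": GC refs plus externref, but NOT funcref.
    boolean isObjectRef() {
        return isGcReference() || this == EXTERNREF;
    }
}
```

Under this reading, isObjectRef() answers "which array does the value live in?" while isGcReference() answers "is this in the any-hierarchy?", so a representation decision made with isReference() would wrongly route funcref through Object[].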

- apply(long...) throws for isObjectRef (GC + externref), not just isGcReference
- Delete callGc from Machine and InterpreterMachine
- Delete boxReturnValue, all instanceof Number/Long boxing
- Delete Long.valueOf externref wrapping
- InterpreterMachine: zero boxing in Machine layer
- Host functions: applyWithRefs(Instance, long[], Object[]) returns CallResult
- WasmModuleTest: externref test uses applyWithRefs with real Objects
- Remaining boxing only in Instance.applyGc (caller convenience edge)

Breaking: annotation processor externref integration test fails (generated code uses apply(long...) for externref; needs migration to applyWithRefs in the annotation processor module).

Delete applyGc from ExportFunction, Instance, WasmFunctionHandle. Zero boxing anywhere in the runtime.

API (final):
- apply(long...): numeric + funcref only, throws for GC/externref
- applyWithRefs(long[], Object[]) → CallResult: everything, zero boxing

ArgsAdapter extended with addRef() + applyWithRefs() for clean test code generation. Test generator migrated from applyGc to ArgsAdapter.builder().add(n).addRef(obj).applyWithRefs(func).

Remaining: annotation processor needs migration to applyWithRefs for externref functions (annotations-it test).
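The shape of the final API can be sketched as follows. Everything here is an assumption made for illustration: `CallResultSketch`, `RefAwareFunctionSketch`, and `ApplyWithRefsDemo` are hypothetical, the PR describes CallResult as a record (a plain final class is used here so the sketch compiles on older JDKs), and the layout in which both arrays are indexed by parameter or result position is inferred from the phrase "positional long[] + Object[] arrays" rather than confirmed.

```java
// Hypothetical stand-in for the PR's CallResult: numeric and ref results side by side.
final class CallResultSketch {
    private final long[] values;
    private final Object[] refs;

    CallResultSketch(long[] values, Object[] refs) {
        this.values = values;
        this.refs = refs;
    }

    long[] values() { return values; }
    Object[] refs() { return refs; }
}

// Hypothetical stand-in for the applyWithRefs entry point on an exported function.
interface RefAwareFunctionSketch {
    CallResultSketch applyWithRefs(long[] args, Object[] refArgs);
}

final class ApplyWithRefsDemo {
    // Pretend Wasm function of type (i32, externref) -> (externref, i32).
    // Assumption: slot i of each array corresponds to parameter/result i,
    // and the unused side of a slot holds 0L or null.
    static final RefAwareFunctionSketch SWAP = (args, refArgs) ->
            new CallResultSketch(
                    new long[] {0L, args[0]},
                    new Object[] {refArgs[1], null});

    public static void main(String[] unused) {
        CallResultSketch result = SWAP.applyWithRefs(
                new long[] {7L, 0L},
                new Object[] {null, "host object"});
        // No Long boxing happens: numbers stay in long[], objects stay in Object[].
        System.out.println(result.refs()[0] + " " + result.values()[1]);
    }
}
```

Keeping numerics in a primitive array is what lets the commit claim "zero boxing": an Object[]-only signature such as the deleted applyGc(Object...) has to wrap every i32/i64 in a Long.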

Fix 5 confirmed bugs from code review, document 3 known limitations:

MStack: NULL_REF sentinel distinguishes "null GC ref" from "no ref". pushRef(null) stores NULL_REF in refs[], popRef/peekRef convert back. topIsRef() checks the raw refs[] value. Fixes null GC refs lost in doControlTransfer, REF_IS_NULL, BR_ON_NULL, and exception catch.

Instance: apply() guard uses isObjectRef(), so externref functions now correctly throw, directing users to applyWithRefs().

Shaded: tableFillRef uses long arithmetic for overflow-safe bounds.

Compiler: null-check refArgs before loading in compileCallFunction.

InterpreterMachine: pushExceptionArgs uses tag type to determine ref positions instead of ambiguous null check.

Known limitations (documented with TODO comments):
- Cross-module call_indirect GC ref returns discarded (Bug 4)
- Multi-value returns with Object[] vs LALOAD mismatch (Bug 5)
- Host/cross-module calls lose GC ref params in long[] (Bug 6)

Tests: GcReviewFixesTest with WAT module exercising null refs through blocks, exceptions, table.fill bounds, and externref via applyWithRefs.
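The NULL_REF sentinel is easiest to understand from the ambiguity it removes: with a plain Object[] side array, a slot holding null could mean either "this slot is numeric" or "this slot is a null reference". A minimal sketch, assuming fixed capacity and eagerly allocated arrays for brevity (`NullRefSentinelSketch` is a hypothetical class, not the project's MStack):

```java
// Hypothetical fixed-size stack showing only the sentinel technique.
final class NullRefSentinelSketch {
    // Private marker object: never escapes, so identity comparison is safe.
    private static final Object NULL_REF = new Object();

    private final long[] values = new long[16];
    private final Object[] refs = new Object[16];
    private int count;

    void push(long v) {
        values[count] = v;
        refs[count] = null; // raw null means "numeric slot"
        count++;
    }

    void pushRef(Object ref) {
        values[count] = 0L;
        refs[count] = (ref == null) ? NULL_REF : ref; // a null ref is still a ref
        count++;
    }

    // Inspects the raw slot, so a null GC ref is still reported as a ref.
    boolean topIsRef() {
        return count > 0 && refs[count - 1] != null;
    }

    Object popRef() {
        Object raw = refs[--count];
        refs[count] = null;
        return (raw == NULL_REF) ? null : raw; // convert back at the API boundary
    }

    long pop() {
        count--;
        refs[count] = null;
        return values[count];
    }
}
```

Code that moves values between frames without knowing their static type, as doControlTransfer is described as doing, can then ask topIsRef() and pick the right pop variant even when the reference happens to be null.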

Fix 6 bugs where GC ref Objects were lost at long[] serialization boundaries in the compiler.

Root cause: the compiler used long[] to pass args/returns across module boundaries, host calls, tail calls, and too-many-params. Object refs can't fit in long[], so they were discarded (POP + 0L).

Fix: dual long[] + Object[] at every boundary, matching the Machine.callWithRefs(int, long[], Object[]) → CallResult pattern.

Shaded: add WithRefs overloads for callHostFunction, callIndirect, setTailCall, setTailCallIndirect, resolveTailCall. Old long[]-only methods kept for backward compat with existing compiled code.

Instance.TailCallPending: add Object[] refArgs field.

Emitters: emitBoxValuesOnStackWithRefs creates dual arrays. emitUnboxResult uses AALOAD for Object[] multi-value returns. emitTailCallCheck uses resolveTailCallWithRefs → CallResult. RETURN_CALL variants use WithRefs when params have Object refs.

Compiler: emitBoxArgumentsWithRefs, emitUnboxCallResult for host/cross-module paths. compileMachineCallInvoke host path uses callHostFunctionWithRefs.

Zero overhead for non-GC: all WithRefs paths gated on isObjectRef() at emit time. Null Object[] when no refs.
Update ModuleInterfaceCodegen to generate applyWithRefs calls for functions with externref params/returns. externref maps to Object.class (not long.class). Export codegen: builds positional long[] + Object[] arrays, calls applyWithRefs, reads from CallResult. Non-ref functions use the original apply() path unchanged. Import codegen: generates WasmFunctionHandle with applyWithRefs override for functions with Object ref types. ExternRefExampleTest: host functions use Object for externref.

Performance:
- FunctionType.hasObjectRefParams/Returns(): replaces all stream().anyMatch() calls with method on FunctionType
- CompilerInterpreterMachine only allocated when module has interpreted functions or Object ref types

Correctness:
- Multi-value returns with GC refs: compileCallFunction converts Object[] to long[] for the call_xxx bridge method type
- Interpreted fallback: uses WithRefs path for Object ref functions
- Dead bytecodes removed from emitUnboxResult

Tests:
- GcMultiValueTest: 5 multi-value return interpolations (int+ref, ref+int, two refs, two ints+ref) on both engines

Javadoc: CallResult, applyWithRefs, callWithRefs

…ak, perf

BUG-1: RETURN_CALL/RETURN_CALL_INDIRECT on imported functions now dispatch to applyWithRefs when function type has Object refs.

BUG-2: CALL_INDIRECT cross-instance dispatch uses callWithRefs when function type has Object refs. Linear memory path unchanged.

BUG-3: MStack.pop() now clears refs[count] to prevent stale Object refs from leaking. doControlTransfer reads ref BEFORE pop to avoid the clearing race.

BUG-4: GcStressTest uses assertEquals instead of assert.

CONCERN-1: FunctionType.hasObjectRefParams/Returns() cached lazily (computed on first call, no stream allocation on hot paths).

CONCERN-2: WasmStruct/WasmArray skip Object[] allocation for numeric-only types. fieldRefs/elementRefs is null until first setFieldRef/setRef call.
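The lazy caching in CONCERN-1 might look roughly like the sketch below. `FunctionTypeSketch` is hypothetical: it takes pre-computed per-parameter flags instead of real ValType values, and the `scans` counter exists only so the caching is observable.

```java
// Hypothetical reduction of FunctionType to the one cached query.
final class FunctionTypeSketch {
    private final boolean[] paramIsObjectRef;
    private int cached; // 0 = not computed yet, 1 = no Object refs, 2 = has Object refs
    int scans;          // instrumentation for this sketch only

    FunctionTypeSketch(boolean... paramIsObjectRef) {
        this.paramIsObjectRef = paramIsObjectRef;
    }

    boolean hasObjectRefParams() {
        int c = cached;
        if (c == 0) {
            scans++;
            c = 1;
            // Plain loop: no stream or lambda allocation on the call path.
            for (boolean isRef : paramIsObjectRef) {
                if (isRef) {
                    c = 2;
                    break;
                }
            }
            // Concurrent callers may both compute this, but they compute the
            // same value, so the unsynchronised write is harmless.
            cached = c;
        }
        return c == 2;
    }
}
```

A three-state int is used rather than a nullable Boolean so that the cache itself introduces no boxing.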

Fix 5 bugs found during code review of PR #55 (GC references). All are in the interpreter path.

BUG-1: RETURN_CALL to imported host function drops GC refs
BUG-2: RETURN_CALL_INDIRECT to imported host function drops GC refs
BUG-3: CALL_INDIRECT cross-instance call drops GC refs
BUG-4: ARRAY_COPY uses isReference() instead of isObjectRef()
BUG-5: MStack.pop() doesn't clear refs[], causing a memory leak

For BUGs 1-3, use the regular CALL import path (lines 131-158 in the same file) as the reference implementation: it correctly handles the hasObjectRefs check and applyWithRefs dispatch.

… alloc

- ARRAY_COPY: use isObjectRef() not isReference(); funcref arrays stay in long[] path, not incorrectly routed through Object[]
- Builder.isGcReference: remove typeIdx>=0 fallthrough that misclassified concrete func types as GC refs without TypeSection
- CallResult V128 slot indexing: use composite slot counter (not per-return index) in emitUnboxCallResult and emitTailCallCheck
- emitBoxFieldsForStruct: skip Object[] allocation when no ref fields (null instead of empty array)

Generate callWithRefs_N bridges for compiled functions with Object
ref params/returns. The compiled Machine's callWithRefs now dispatches
directly to these bridges instead of going through the interpreter.
Before: Host → callWithRefs → compilerInterpreterMachine (interpreter)
→ StackFrame → eval → CALL → compiled func_N
After: Host → callWithRefs → MachineCall.callWithRefs → callWithRefs_N
→ compiled func_N (direct, no interpreter overhead)
callWithRefs_N bridges are only generated when the function type has
Object refs — zero overhead for linear memory modules.
Interpreted functions still fall back to compilerInterpreterMachine.
WIP to fix #36