Skip to content

Fix solve!(cache) type instability + JET QA test group#984

Open
ChrisRackauckas-Claude wants to merge 4 commits into
SciML:mainfrom
ChrisRackauckas-Claude:fix-default-solver-type-stability
Open

Fix solve!(cache) type instability + JET QA test group#984
ChrisRackauckas-Claude wants to merge 4 commits into
SciML:mainfrom
ChrisRackauckas-Claude:fix-default-solver-type-stability

Conversation

@ChrisRackauckas-Claude
Copy link
Copy Markdown
Contributor

Please ignore until reviewed by @ChrisRackauckas.

Summary

  • Fixes solve!(cache) returning LinearSolution{_A, _B, _C, _D, DefaultLinearSolver, _E, _F} where {...} (a UnionAll over 6 free type parameters) instead of a concrete LinearSolution.
  • Adds a new JET test group (mirroring the layout Sundials.jl uses for its JET QA) that pins the inferred return type of solve!(cache) for the default solver and many algorithms.

Reproduction (current main, v3.78.0)

julia> A = rand(4, 4); b = rand(4); cache = init(LinearProblem(A, b));
julia> @code_typed solve!(cache)
CodeInfo(

└──      return %2
) => LinearSolution{_A, _B, _C, _D, LinearSolve.DefaultLinearSolver} where {_A, _B, _C, _D}

After this PR:

julia> @code_typed solve!(cache)
 => LinearSolution{Float64, 1, Vector{Float64}, Nothing, LinearSolve.DefaultLinearSolver, LinearSolve.LinearCache{}, Nothing}

Root cause

Two stacked causes:

  1. do_factorization(::QRFactorization, A, b, u) returned a Union. For QRFactorization{ColumnNorm}, the non-inplace branch fell through to qr(A) (dropping alg.pivot, returning QRCompactWY) while the inplace branch called qr!(A, alg.pivot) (returning QRPivoted). Inference gave Union{QRCompactWY, QRPivoted} — type-unstable.

  2. _default_lu_solve_with_fallback / _do_qr_fallback / _reuse_qr_fallback read sol.{u,resid,cache,stats} from an Any-typed inner sol. During precompile, when inference processes the @generated solve!(cache, ::DefaultLinearSolver), the inner solve!(cache, $specific_alg) calls hit Julia's inference complexity cap and are cached with rettype = Any (you can verify by reading Method.specializations for the cached MethodInstances — every non-LU algorithm shows rettype: Any). The helpers then read sol.u/sol.resid/sol.cache/sol.stats and pass them to build_linear_solution, which uses typeof(u)/typeof(resid)/typeof(cache)/typeof(stats) as type parameters — so those four parameters of the constructed LinearSolution become abstract.

Fix

src/factorization.jl

Route sparse-CSC / GPU / cusparse to qr(A) (no pivot — SPQR and CUDA's qr don't accept a pivoting strategy), and use qr(A, alg.pivot) for all other CPU paths. This makes do_factorization return a single concrete type determined by QRFactorization{P}.

src/default.jl

In _default_lu_solve_with_fallback, _do_qr_fallback, _reuse_qr_fallback, and the non-LU branches of the @generated solve!(cache, ::DefaultLinearSolver), build the wrapping LinearSolution from the statically-typed cache:

  • cache.u instead of sol.u
  • nothing instead of sol.resid
  • cache instead of sol.cache
  • nothing instead of sol.stats

retcode and iters are fixed-type fields of LinearSolution (ReturnCode.T / Int, not type parameters), so we still take sol.retcode and sol.iters to preserve that runtime info even when sol::Any.

Also drops unused args...; kwargs... forwarding through the helpers — the inner solve! methods ignore those anyway.

Trade-off

stats is now nothing instead of sol.stats when wrapping through the default solver. For all current LinearSolve factorization solvers stats is already nothing, so nothing is lost in practice. If a future Krylov method populated stats and were selected by DefaultLinearSolver, that field would be dropped — flagging so we can decide if/when that becomes a real issue.

New JET test group

Mirrors the layout Sundials.jl uses for its JET QA tests:

  • test/jet/Project.tomlJET pinned at 0.9, 0.10, 0.11 (same as Sundials).
  • test/jet/runtests.jl — self-contained group:
    • Core.Compiler.return_type(solve!, Tuple{cacheT}) for the default solver must be concrete and a subtype of LinearSolution{Float64, 1, Vector{Float64}}.
    • solve!(init(prob, alg)) infers concretely for LUFactorization, GenericLUFactorization, QRFactorization{ColumnNorm}, QRFactorization{NoPivot}, DiagonalFactorization, SVDFactorization, CholeskyFactorization, NormalCholeskyFactorization.
    • BunchKaufmanFactorization / LDLtFactorization are @test_broken (unrelated inference issues).
    • JET.@test_opt on the default solver is broken = true for now — the @generated umbrella dispatches to every algorithm branch at inference time and several of those still have unrelated runtime dispatch sites inside Krylov/LinearAlgebra. The concrete-rettype checks are the load-bearing tests in this group.

Wired into both test/runtests.jl (GROUP=JET) and .github/workflows/Tests.yml.

Test plan

  • GROUP=Core Pkg.test("LinearSolve") passes locally on Julia 1.10.11
  • GROUP=Core2 Pkg.test("LinearSolve") passes locally on Julia 1.10.11
  • GROUP=JET Pkg.test("LinearSolve") passes locally on Julia 1.10.11 (10 passed, 3 broken as documented)
  • @code_typed solve!(cache) returns concrete on Julia 1.10.11 and 1.12.4
  • Existing nonsquare sparse-QR path verified still works (caught a regression in the QR fix during iteration; subsequent test run is green)
  • CI green

`solve!(cache)` where `cache = init(LinearProblem(A, b))` was returning
`LinearSolution{_A, _B, _C, _D, DefaultLinearSolver, _E, _F} where {...}`
— a UnionAll over 6 free type parameters instead of a concrete
LinearSolution.

Two stacked causes:

1. `do_factorization(::QRFactorization, A, b, u)` returned a `Union` for
   `QRFactorization{ColumnNorm}` because the non-inplace branch called
   `qr(A)` (dropping the pivot, returning `QRCompactWY`) while the inplace
   branch called `qr!(A, alg.pivot)` (returning `QRPivoted`). Routes
   sparse-CSC/GPU/cusparse to `qr(A)` explicitly and uses
   `qr(A, alg.pivot)` for all other CPU paths so all branches agree
   on the return type for a given pivot type.

2. `_default_lu_solve_with_fallback`, `_do_qr_fallback`, `_reuse_qr_fallback`
   and the non-LU branches of the `solve!(cache, ::DefaultLinearSolver)`
   @generated function read `sol.u`/`sol.resid`/`sol.cache`/`sol.stats`
   from an inner `sol`. During precompile, inference of those inner
   `solve!(cache, $specific_alg)` calls hits the inference complexity
   cap and gets cached with `rettype = Any` — so `sol` is `Any`,
   `typeof(sol.u)`/`typeof(sol.resid)`/`typeof(sol.cache)`/`typeof(sol.stats)`
   all become abstract type parameters in the constructed LinearSolution.

   Fix: build the wrapping LinearSolution from the statically-typed
   `cache` (`cache.u`, `nothing` for resid, `cache` itself, `nothing` for
   stats). `retcode` and `iters` are fixed-type fields of LinearSolution
   (not type parameters), so we keep `sol.retcode`/`sol.iters` to preserve
   that runtime info. Also drops unused `args...; kwargs...` forwarding
   through the helpers — the inner `solve!` methods ignore those.

Trade-off: `stats` is now `nothing` instead of `sol.stats` when wrapping
through the default solver. For all current LinearSolve factorization
solvers `stats` is already `nothing`, so nothing is lost in practice.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Adds a dedicated test/jet/ group (mirroring the layout Sundials.jl uses
for its JET QA tests) that pins the type-inference contract for
`solve!(cache)` across the default solver and many algorithms.

Layout follows Sundials.jl test/jet/: separate Project.toml with `JET`
under `compat = "0.9, 0.10, 0.11"`, and a self-contained runtests.jl
that the GROUP=JET branch of the top-level runtests.jl activates and
includes.

What the tests pin:

- `Core.Compiler.return_type(solve!, Tuple{cacheT})` for the default
  solver must be concrete and a subtype of
  `LinearSolution{Float64, 1, Vector{Float64}}`. This is the
  user-visible regression these tests guard against.
- `solve!(init(prob, alg))` infers concretely for each of:
  LUFactorization, GenericLUFactorization, QRFactorization
  (ColumnNorm and NoPivot), DiagonalFactorization, SVDFactorization,
  CholeskyFactorization, NormalCholeskyFactorization.
- BunchKaufmanFactorization and LDLtFactorization have unrelated
  inference issues; tracked as `@test_broken` so this group ratchets
  forward if they're fixed.
- `JET.@test_opt` on the default solver is `broken = true` for now —
  the @generated default solver dispatches to every algorithm branch
  at inference time and several of those still have unrelated runtime
  dispatch sites inside Krylov/LinearAlgebra. The concrete-rettype
  checks are the load-bearing test in this group.

Wires GROUP=JET into both test/runtests.jl and .github/workflows/Tests.yml.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas-Claude ChrisRackauckas-Claude force-pushed the fix-default-solver-type-stability branch from 1744b34 to 032a18b Compare May 14, 2026 12:42
- Add spaces around `=` in `JET.@test_opt` keyword args (Runic).
- Exclude `group: JET, version: pre` in the CI matrix — JET 0.9–0.11
  (the versions pinned to match Sundials.jl) don't support
  Julia 1.13.0-rc1 yet, mirroring the existing NoPre group exclusion.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
@ChrisRackauckas-Claude ChrisRackauckas-Claude marked this pull request as ready for review May 14, 2026 13:45
@ChrisRackauckas-Claude
Copy link
Copy Markdown
Contributor Author

Marking ready for review. CI summary:

197 pass, 14 fail, 10 pending. The 14 failures are all pre-existing on main and independent of this PR:

  • 12× Tests (*, NoPre, *)test/nopre/static_arrays.jl:25 @test_nowarn flags an LLVM lookup deprecation warning. Root cause: LLVM.jl v9.8.1 (released 2026-05-12) added @deprecate for the 2-arg lookup(jljit, name) form, and AllocCheck.jl v0.2.3 at src/compiler.jl:114 still uses it. The depwarn fires during @check_allocs-generated JIT lookup, @test_nowarn correctly captures it as non-empty stderr. Same failure reproduces on main run 25858637942. Right fix is upstream in AllocCheck.
  • LinearSolveCUDAUndefVarError: CUDA not defined in CUDSS in the CUDSS extension precompile. Pre-existing.
  • test (Core, alldeps, 1.10)MKL_jll.is_available() UndefVarError when downgraded to the minimum-compat MKL_jll = 2019. Same workflow passed on the earlier commit 032a18be 15 min before but failed on the runic-fixup daa2565f; the diff between those two commits is whitespace-only in test/jet/runtests.jl + a single CI matrix-exclude entry. The same Downgrade workflow has been passing on main, so this is a transient registry-resolution flake.

The new JET test group I added passes on every matrix entry (1 × lts × x64/x86 × ubuntu/macos/windows).

Drop the separate test/jet/ group introduced earlier in this PR. The
existing NoPre group already has test/nopre/jet.jl with JET tests for
LinearSolve, so the new concrete-rettype checks belong there alongside
the existing @test_opt checks rather than in a parallel group with its
own Project.toml and CI matrix entry.

Moves the contents of test/jet/runtests.jl into the bottom of
test/nopre/jet.jl as two new testsets:

- "solve!(cache) returns concrete LinearSolution — default solver":
  `Core.Compiler.return_type(_solve_default, Tuple{Matrix{Float64},
  Vector{Float64}})` must be concrete and a subtype of
  `LinearSolution{Float64, 1, Vector{Float64}}`.
- "solve!(cache) is concrete for each algorithm": same check for
  LUFactorization, GenericLUFactorization, QRFactorization (ColumnNorm
  and NoPivot), DiagonalFactorization, SVDFactorization,
  CholeskyFactorization, NormalCholeskyFactorization. BunchKaufman and
  LDLt remain `@test_broken` for unrelated inference issues.

Removes the test/jet/ directory, the GROUP == "JET" branch in
test/runtests.jl, and the JET entries (matrix value + pre-release
exclude) from .github/workflows/Tests.yml. JET is already a dep of the
NoPre Project.toml, so no Project.toml changes are needed.

Verified locally on Julia 1.12.4: GROUP=NoPre passes with 27 passing /
8 broken in the JET Tests safetestset (the 8 broken match the pre-PR
broken set plus BunchKaufman/LDLt).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants