Tally 32-bit Overflow Fix#3960
Open
jtramm wants to merge 11 commits into
Open
Conversation
A combined mesh+energy tally can exceed 2^31 filter bins (e.g. a 97.8M-cell mesh times 70 energy groups = 6.85e9 bins). With int32_t storage the bin count overflowed to a negative value that init_results() then cast to size_t, requesting a ~1.8e19-element vector and throwing std::length_error before transport even started. Widen n_filter_bins_, the per-filter strides_, and their accessors to int64_t. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
set_strides() multiplied the per-filter bin counts into a plain int accumulator, so the running product overflowed before being stored into the now-int64_t n_filter_bins_. Promote the accumulator to int64_t so the product is formed in 64-bit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
index(sr, g) computes sr * negroups_ + g in 64-bit (sr is int64_t) but truncated the result back to int on return. With more than ~30.7M source regions at 70 groups the flattened element index exceeds 2^31, yielding a wrapped/negative offset and an out-of-bounds access into the correctly-sized flux arrays during the solve. Return int64_t. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
FilterBinIter accumulates the flat results_ index across the tally filters in index_. For tallies with more than 2^31 filter bins this overflowed. Widen to int64_t so the random ray tally-task setup, which copies filter_iter.index_ into each TallyTask, records correct indices. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three scoring helpers accumulated filter_index in a 32-bit int before indexing results_, which overflows for tallies exceeding 2^31 filter bins. Widen the locals to int64_t. Note: the continuous-energy and multigroup score_general_ce/score_general_mg paths still take filter_index as an int parameter; those are not exercised by the random ray solver and are left for a separate change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
TallyTask stores the flat tally results_ index for each source-region/energy-group element. With more than 2^31 filter bins the int member truncated the now-64-bit filter index taken from FilterBinIter::index_, sending random ray scores to the wrong (or out-of-bounds) results_ bins. Widen the member and the constructor parameter to int64_t. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The flux-tally volume-normalization loop in random_ray_tally iterated filter bins with an int counter compared against n_filter_bins(), which is now int64_t. For tallies with more than 2^31 bins the int counter cannot reach the end (it overflows and the comparison wraps), so the upper bins would never be volume-normalized. Use an int64_t loop variable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
score_general_ce_nonanalog, score_general_ce_analog, and score_general_mg took the flat results_ filter index as an int parameter. Their callers compute it from FilterBinIter::index_ (now int64_t), so the value was narrowed back to 32 bits at the call boundary and could index the wrong or out-of-bounds bin for a regular Monte Carlo tally exceeding 2^31 filter bins. This path is not used by the random ray solver; widen the parameter so large continuous-energy and multigroup MC tallies are correct too. Each function uses filter_index only for its final tally.results_(filter_index, ...) write, so widening the parameter is sufficient. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds a Catch2 case to the existing tests/cpp_unit_tests/test_tally.cpp (already in TEST_NAMES, so no CMakeLists change). It builds a tally with two ~50,000-bin energy filters whose bin-count product (2.5e9) exceeds INT32_MAX, calls set_strides(), and asserts n_filter_bins() equals the exact 64-bit product and is positive. This pins the arithmetic fixed on this branch; it runs in milliseconds on ~1 MB and does not allocate the results tensor. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
StructuredMesh::n_bins() and n_surface_bins() formed their bin counts in 32-bit int and silently overflowed for very large meshes (n_bins near ~2.1B cells; n_surface_bins ~12x sooner, ~179M cells in 3D). The resulting negative count surfaced downstream as the same cryptic std::length_error this branch fixes, but originating from the mesh layer. Compute both counts in int64_t and raise a clear fatal_error naming the mesh and count when they exceed the 2^31 tally-indexing limit. This does not lift the limit (the per-event bin index and Filter::n_bins_ are still 32-bit) -- it converts a silent overflow into an actionable message. The unstructured (MOAB/LibMesh) counts are left as-is, since a >2^31-element unstructured mesh is not realistic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix 32-bit integer overflow in tally filter-bin indexing
Problem
A tally with more than 2³¹ (~2.15 billion) filter bins aborts during initialization:
(
Tally::init_results→tensor::Tensor<double>constructor →std::vector(__n = 18446744068490538100).)This is reachable from ordinary random-ray FW-CADIS runs. The weight-window flux tally
carries one bin per mesh element per energy group, so a 97,860,906-element mesh at 70
groups produces 6,850,263,420 filter bins. The same model at 4 groups (391M bins) runs
fine; the crash appears only once the bin count crosses 2³¹.
I hit this bug from a real world use case with the JET model, where I was creating 70 group weight windows on a (513, 638, 299) resolution mesh. To be clear: the error stems from the normal OpenMC tallying machinery on a mesh of that size with that many energy bins, the error is not coming from the random ray machinery. As such, the error could be hit from normal Monte Carlo usage as well if a large mesh + significant number of energy filter bins are used.
Root cause
Tally::n_filter_bins_andstrides_areint32_t, andTally::set_strides()accumulates the product of the per-filter bin counts in a plain
int. At 6.85e9 binsthis overflows and wraps to
-1,739,671,172.Tally::init_results()then formsstatic_cast<size_t>(n_filter_bins_) * n_scores * 3; the negative value sign-extends toa huge
size_t(× 3 =18446744068490538100), which exceedsvector::max_size()andthrows.
The same 32-bit width is used at every step that later consumes the flat filter index,
so widening only the allocation would still mis-score above 2³¹ bins:
FilterBinIter::index_— the accumulated flat filter index;TallyTask::filter_idx— the random-ray per-element stored index;filter_indexaccumulators and thefilter_indexparameters ofscore_general_ce_nonanalog/score_general_ce_analog/score_general_mg.Separately, random ray's
SourceRegionContainer::index()computessr * negroups_ + gin 64-bit (since
srisint64_t) but returnedint, truncating the flux-array offsetpast ~30.7M source regions at 70 groups.
The bin count overflows one layer lower too:
StructuredMesh::n_bins()andn_surface_bins()compute their counts inint(the latter is4 * n_dim * n_bins, i.e.~12x the element count in 3D), so a single very large mesh produces a negative count that
reaches
init_results()as the samelength_error. A surface-current tally hits thisfirst, near ~179M 3D cells.
Fix
Widen the bin count and the flat result index to
int64_talong the whole chain:Tally::n_filter_bins_,strides_, their accessors, and theset_strides()accumulator.FilterBinIter::index_;TallyTask::filter_idx(member +constructor); the CE/MG
filter_indexaccumulators andscore_general_*parameters;the random-ray flux-tally volume-normalization loop counter.
SourceRegionContainer::index()return type.tensor::Tensoralready computes its size and offsets insize_t, so it is unchanged.The result index is now 64-bit end-to-end on both the random-ray and the CE/MG scoring
paths.
Mesh bin-count guard. The widening above does not lift the limit on a single mesh:
Filter::n_bins_and the per-eventFilterMatch::bins_remainint, so one mesh or filterpast 2³¹ bins is still unsupported.
StructuredMesh::n_bins()/n_surface_bins()nowcompute their counts in
int64_tand raise a clearfatal_error(naming the mesh andcount) when the value exceeds
int, instead of silently overflowing into the samelength_error. This makes the 32-bit boundary safe and explicit rather than lifting it.(Unstructured MOAB/LibMesh counts are left as-is.)
Why a guard instead of widening everything?
We widened the indexing chain, so the obvious question is why not also widen
Filter::n_bins_andFilterMatch::bins_and support a single mesh or filter past 2³¹ binsoutright, instead of leaving a guarded limit.
Not because that case is unrealistic — a multi-billion-element structured mesh is perfectly
reasonable (e.g. ITER-scale shielding), the mesh itself is cheap to store, and the tally
memory (tens of GB) sits comfortably on a compute node. The reason is that lifting the limit
is a substantially larger, separable change than the two remaining type flips it looks like:
(
openmc_mesh_*), and the Python bindings all assume 32-bit bin indices, so it carriesformat and compatibility surface, not just internal types.
FilterMatch::bins_is written and readon every scored event; widening it is a (likely small, but unmeasured) cost on every
simulation, so it warrants a benchmark rather than an assumption.
end-to-end needs a big-memory integration test, not the kilobyte unit test that covers the
arithmetic in this PR.
So this PR fixes the crash and makes indexing correct up to the 2³¹-per-filter boundary, and
the guard ensures nothing past that boundary is left as a silent bug: an over-limit tally
now stops immediately with a clear message instead of corrupting indices or dying with a
cryptic
length_error. Lifting the boundary outright — full 64-bit mesh and filterindexing — can be done in the future if and when the need arrises. At that point, hopefully the performance CI can be used to reveal any performance regressions caused by that change.
Testing
The PR adds a Catch2 case to
tests/cpp_unit_tests/test_tally.cpp(run withctest;the file is already in
TEST_NAMES, so noCMakeLists.txtchange is needed). It buildsa
Tallywith two energy filters of 50,000 bins each, whose bin counts multiply to2.5e9 (above
INT32_MAX), then callsTally::set_strides()directly (it is public andis what
init_results()relies on) and asserts the count is the exact 64-bit product:This pins exactly the
set_strides()/n_filter_bins_arithmetic that overflowed,runs in milliseconds on ~1 MB, and fails on the old
int32_tcode (the product wrapsnegative). The large
results_allocation ininit_results()is intentionally notexercised; the defect is in the index arithmetic, not in the size of the results array.
Does the Fix Work?
The JET model has previously been running fine for weight window generation with random ray but I had been limiting my testing to 4 groups. I have been doing some parameter sensitivity studies and was trying to do 70 groups -- that's when I hit this issue. I can confirm that the fix does indeed work and I'm able to apply the 70 group energy mesh on that mesh now.
Checklist