Skip to content

[Major Rewrite] Index/nd.size/nd.shape int→long#596

Open
Nucs wants to merge 81 commits intomasterfrom
longindexing
Open

[Major Rewrite] Index/nd.size/nd.shape int→long#596
Nucs wants to merge 81 commits intomasterfrom
longindexing

Conversation

@Nucs
Copy link
Copy Markdown
Member

@Nucs Nucs commented Mar 26, 2026

Summary

Migrates all index, stride, offset, and size operations from int (int32) to long (int64), aligning NumSharp with NumPy's npy_intp type. This enables support for arrays exceeding 2GB (int32 max = 2.1B elements) and ensures compatibility with NumPy 2.x behavior.

Motivation

NumPy uses npy_intp (equivalent to Py_ssize_t) for all indexing operations, which is 64-bit on x64 platforms. NumSharp's previous int32 limitation prevented working with large arrays and caused silent overflow bugs when array sizes approached int32 limits.

Key drivers:

  • Support arrays with >2.1 billion elements
  • Align with NumPy 2.x npy_intp semantics
  • Eliminate overflow risks in index calculations
  • Enable large-scale scientific computing workloads

What Changed

  • Shape fields: size, dimensions, strides, offset, bufferSizelong
  • Shape methods: GetOffset(), GetCoordinates(), TransformOffset()long parameters and return types
  • Shape constructors: primary constructor now takes long[], int[] overloads delegate to long[]
  • Shape.Unmanaged: pointer parameters int*long* for strides/shapes
  • IArraySlice interface: all index parameters → long
  • IMemoryBlock interface: Count property → long
  • ArraySlice: Count property and all index parameters → long
  • UnmanagedStorage: Count property → long
  • UnmanagedStorage.Getters: all index parameters → long, added long[] overloads
  • UnmanagedStorage.Setters: all index parameters → long, added long[] overloads
  • UnmanagedMemoryBlock: allocation size and index parameters → long
  • NDArray: size, len properties → long
  • NDArray: shape, strides properties → long[]
  • NDArray indexers: added long[] coordinate overloads, int[] delegates to long[]
  • NDArray typed getters/setters: added long[] overloads
  • NDIterator: offset delegate Func<int[], int>Func<long[], long>
  • MultiIterator: coordinate handling → long[]
  • NDCoordinatesIncrementor: coordinates → long[]
  • NDCoordinatesAxisIncrementor: coordinates → long[]
  • NDCoordinatesLeftToAxisIncrementor: coordinates → long[]
  • NDExtendedCoordinatesIncrementor: coordinates → long[]
  • NDOffsetIncrementor: offset tracking → long
  • ValueOffsetIncrementor: offset tracking → long
  • ILKernelGenerator: all loop counters, delegate signatures, and IL emission updated for long
  • ILKernelGenerator: Ldc_I4Ldc_I8, Conv_I4Conv_I8 where appropriate
  • DefaultEngine operations: loop counters and index variables → long
  • DefaultEngine.Transpose: stride calculations → long
  • DefaultEngine.Broadcast: shape/stride calculations → long
  • SimdMatMul: matrix indices and loop counters → long
  • SimdKernels: loop counters → long
  • np.arange(int) and np.arange(int, int, int) now return int64 arrays (NumPy 2.x alignment)
  • np.argmax / np.argmin: return type → long
  • np.nonzero: return type → long[][]
  • Hashset: upgraded to long-based indexing with 33% growth factor for large collections
  • StrideDetector: pointer parameters int*long*, local stride calculations → long
  • LongIndexBuffer: new utility for temporary long index arrays

Breaking Changes

Change Impact Migration
NDArray.size returns long Low Cast to int if needed, or use directly
NDArray.shape returns long[] Medium Update code expecting int[]
NDArray.strides returns long[] Medium Update code expecting int[]
np.arange(int) returns int64 dtype Medium Use .astype(NPTypeCode.Int32) if int32 needed
np.argmax/np.argmin return long Low Cast to int if needed
np.nonzero returns long[][] Low Update code expecting int[][]
Shape[dim] returns long Low Cast to int if needed
Iterator coordinate arrays are long[] Low Internal change, minimal user impact

Performance Impact

Benchmarked at 1-3% overhead for scalar loops, <1% overhead for SIMD-optimized paths. This is acceptable given the benefits of large array support.

  • Pointer arithmetic natively supports long offsets (zero overhead)
  • SIMD paths unaffected (vector operations don't use index type)
  • Scalar loops have minor overhead from 64-bit counter increment
  • Memory layout unchanged (data types unaffected)

What Stays int

Item Reason
NDArray.ndim / Shape.NDim Maximum ~32 dimensions, never exceeds int
Slice.Start / Stop / Step Python slice semantics use int
Dimension loop indices (for (int d = 0; d < ndim; d++)) Iterating over dimensions, not elements
NPTypeCode enum values Small fixed set
Vector lane counts in SIMD Hardware-limited constants

Related

@Nucs Nucs changed the title [Major Rewrite] Index/NDArray.size int→long [Major Rewrite] Index/NDArray.size/nd.dimensions int→long Mar 26, 2026
@Nucs Nucs changed the title [Major Rewrite] Index/NDArray.size/nd.dimensions int→long [Major Rewrite] Index/NDArray.size/nd.shape int→long Mar 26, 2026
@Nucs Nucs changed the title [Major Rewrite] Index/NDArray.size/nd.shape int→long [Major Rewrite] Index/nd.size/nd.shape int→long Mar 26, 2026
Nucs and others added 27 commits March 26, 2026 18:56
Extended the keepdims fix to all remaining reduction operations:
- ReduceAMax (np.amax, np.max)
- ReduceAMin (np.amin, np.min)
- ReduceProduct (np.prod)
- ReduceStd (np.std)
- ReduceVar (np.var)

Also fixed np.amax/np.amin API layer which ignored keepdims when axis=null.

Added comprehensive parameterized test covering all reductions with
multiple dtypes (Int32, Int64, Single, Double, Int16, Byte) to prevent
regression.

All 7 reduction functions now correctly preserve dimensions with
keepdims=true, matching NumPy 2.x behavior.
Apply .gitattributes normalization across all text files.
No code changes - only CRLF → LF conversion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…N handling

This commit adds comprehensive SIMD acceleration for reduction operations
and fixes several NumPy compatibility issues.

- AllSimdHelper<T>(): SIMD-accelerated boolean all() with early-exit on first zero
- AnySimdHelper<T>(): SIMD-accelerated boolean any() with early-exit on first non-zero
- ArgMaxSimdHelper<T>(): Two-pass SIMD: find max value, then find index
- ArgMinSimdHelper<T>(): Two-pass SIMD: find min value, then find index
- NonZeroSimdHelper<T>(): Collects indices where elements != 0
- CountTrueSimdHelper(): Counts true values in bool array
- CopyMaskedElementsHelper<T>(): Copies elements where mask is true
- ConvertFlatIndicesToCoordinates(): Converts flat indices to per-dimension arrays

- **np.any axis-based reduction**: Fixed inverted logic in ComputeAnyPerAxis<T>.
  Was checking `Equals(default)` (returning true when zero found) instead of
  `!Equals(default)` (returning true when non-zero found). Also fixed return
  value to indicate computation success.

- **ArgMax/ArgMin NaN handling**: Added NumPy-compatible NaN propagation where
  first NaN always wins. For both argmax and argmin, NaN takes precedence over
  any other value including Infinity.

- **ArgMax/ArgMin empty array**: Now throws ArgumentException on empty arrays
  matching NumPy's ValueError behavior.

- **ArgMax/ArgMin Boolean support**: Added Boolean type handling. For argmax,
  finds first True; for argmin, finds first False.

- np.all(): Now uses AllSimdHelper for linear (axis=None) reduction
- np.any(): Now uses AnySimdHelper for linear reduction
- np.nonzero(): Added SIMD fast path for contiguous arrays
- Boolean masking (arr[mask]): Added SIMD fast path using CountTrueSimdHelper
  and CopyMaskedElementsHelper

Added comprehensive ownership/responsibility documentation to all
ILKernelGenerator partial class files explaining the architecture:
- ILKernelGenerator.cs: Core infrastructure and type mapping
- ILKernelGenerator.Binary.cs: Same-type binary operations
- ILKernelGenerator.MixedType.cs: Mixed-type with promotion
- ILKernelGenerator.Unary.cs: Unary element-wise operations
- ILKernelGenerator.Comparison.cs: Comparison operations
- ILKernelGenerator.Reduction.cs: Reductions and SIMD helpers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ions

Implements all missing kernel operations and routes SIMD helpers through
IKernelProvider interface for future backend abstraction.

- Power: IL kernel with Math.Pow scalar operation
- FloorDivide: np.floor_divide with NumPy floor-toward-negative-infinity semantics
- LeftShift/RightShift: np.left_shift, np.right_shift with SIMD Vector.ShiftLeft/Right

- Truncate: Vector.Truncate SIMD support
- Reciprocal: np.reciprocal (1/x) with SIMD
- Square: np.square optimized (x*x instead of power(x,2))
- Cbrt: np.cbrt cube root
- Deg2Rad/Rad2Deg: np.deg2rad, np.rad2deg (np.radians/np.degrees aliases)
- BitwiseNot: np.invert, np.bitwise_not with Vector.OnesComplement

- Var/Std: SIMD two-pass algorithm with interface integration
- NanSum/NanProd: np.nansum, np.nanprod (ignore NaN values)
- NanMin/NanMax: np.nanmin, np.nanmax (ignore NaN values)

- Route 6 SIMD helpers through IKernelProvider interface:
  - All<T>, Any<T>, FindNonZero<T>, ConvertFlatToCoordinates
  - CountTrue, CopyMasked<T>
- Clip kernel: SIMD Vector.Min/Max (~620→350 lines)
- Modf kernel: SIMD Vector.Truncate (.NET 9+)

- ATan2: Fixed wrong pointer type (byte*) for x operand in all non-byte cases

- ILKernelGenerator.Clip.cs, ILKernelGenerator.Modf.cs
- Default.{Cbrt,Deg2Rad,FloorDivide,Invert,Rad2Deg,Reciprocal,Shift,Square,Truncate}.cs
- np.{cbrt,deg2rad,floor_divide,invert,left_shift,nanprod,nansum,rad2deg,reciprocal,right_shift,trunc}.cs
- np.{nanmax,nanmin}.cs
- ShiftOpTests.cs, BinaryOpTests.cs (ATan2 tests)
This commit concludes a comprehensive audit of all np.* and DefaultEngine
operations against NumPy 2.x specifications.

- **ATan2**: Fixed non-contiguous array handling by adding np.broadcast_arrays()
  and .copy() materialization before pointer-based processing
- **NegateBoolean**: Removed buggy linear-indexing path, now routes through
  ExecuteUnaryOp with new UnaryOp.LogicalNot for proper stride handling
- **np.square(int)**: Now preserves integer dtype instead of promoting to double
- **np.invert(bool)**: Now uses logical NOT (!x) instead of bitwise NOT (~x)

- **np.power(NDArray, NDArray)**: Added array-to-array power overloads
- **np.logical_and/or/not/xor**: New functions in Logic/np.logical.cs
- **np.equal/not_equal/less/greater/less_equal/greater_equal**: 18 new
  comparison functions in Logic/np.comparison.cs
- **argmax/argmin keepdims**: Added keepdims parameter matching NumPy API

- Renamed `outType` parameter to `dtype` in 19 np.*.cs files to match NumPy
- Added UnaryOp.LogicalNot to KernelOp.cs for boolean array negation

- Created docs/KERNEL_API_AUDIT.md tracking Definition of Done criteria
- Updated .claude/CLAUDE.md with DOD section and current status

- Added NonContiguousTests.cs with 35+ tests for strided/broadcast arrays
- Added DtypeCoverageTests.cs with 26 parameterized tests for all 12 dtypes
- Added np.comparison.Test.cs for new comparison functions
- Updated KernelMisalignmentTests.cs to verify fixed behaviors

Files: 43 changed, 5 new files added
Tests: 3058 passed (93% of 3283 total)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bug #126 - Empty array comparison returns scalar (FIXED):
- All 6 comparison operators now return empty boolean arrays
- Files: NDArray.Equals.cs, NotEquals.cs, Greater.cs, Lower.cs

Bug #127 - Single-element axis reduction shares memory (FIXED):
- Changed Storage.Alias() and squeeze_fast() to return copies
- Fixed 8 files: Add, AMax, AMin, Product, Mean, Var, Std, CumAdd
- Added 20 memory isolation tests

Bug #128 - Empty array axis reduction returns scalar (FIXED):
- Proper empty array handling for all 9 reduction operations
- Sum→zeros, Prod→ones, Min/Max→ValueError, Mean/Std/Var→NaN
- Added 22 tests matching NumPy behavior

Bug #130 - np.unique NaN sorts to beginning (FIXED):
- Added NaNAwareDoubleComparer and NaNAwareSingleComparer
- NaN now sorts to end (NaN > any non-NaN value)
- Matches NumPy: [-inf, 1, 2, inf, nan]

Test summary: +54 new tests, all passing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace 20K-line Regen template with clean 300-line implementation:

- ILKernelGenerator.MatMul.cs: Cache-blocked SIMD kernels for float/double
  - 64x64 tile blocking for L1/L2 cache optimization
  - Vector256 with FMA (Fused Multiply-Add) when available
  - IKJ loop order for sequential memory access on B matrix
  - Parallel execution for matrices > 65K elements

- Default.MatMul.2D2D.cs: Clean dispatcher with fallback
  - SIMD fast path for contiguous same-type float/double
  - Type-specific pointer loops for int/long
  - Generic double-accumulator fallback for mixed types

| Size    | Float32 | Float64 |
|---------|---------|---------|
| 32x32   | 34x     | 18x     |
| 64x64   | 38x     | 29x     |
| 128x128 | 15x     | 58x     |
| 256x256 | 183x    | 119x    |

- Before: 19,862 lines (Regen templates, 1728 type combinations)
- After: 284 lines (clean, maintainable)

Old Regen template preserved as .regen_disabled for reference.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
IL Kernel Infrastructure:
- Add ILKernelGenerator.Scan.cs for CumSum scan kernels with SIMD V128/V256/V512 paths
- Extend ILKernelGenerator.Reduction.cs with Var/Std/ArgMax/ArgMin axis reduction support
- Extend ILKernelGenerator.Clip.cs with strided/broadcast array helpers
- Extend ILKernelGenerator.Modf.cs with special value handling (NaN, Inf, -0)
- Add IKernelProvider interface extensions for new kernel types

DefaultEngine Migrations:
- Default.Reduction.Var.cs: IL fast path for contiguous arrays, single-element fix
- Default.Reduction.Std.cs: IL fast path for contiguous arrays, single-element fix
- Default.Reduction.CumAdd.cs: IL scan kernel integration
- Default.Reduction.ArgMax.cs: IL axis reduction with proper coordinate tracking
- Default.Reduction.ArgMin.cs: IL axis reduction with proper coordinate tracking
- Default.Power.cs: Scalar exponent path migrated to IL kernels
- Default.Clip.cs: Unified IL path (76% code reduction, 914→240 lines)
- Default.NonZero.cs: Strided IL fallback path
- Default.Modf.cs: Unified IL with special float handling

Bug Fixes:
- np.var.cs / np.std.cs: ddof parameter now properly passed through
- Var/Std single-element arrays now return double (matching NumPy)

Tests (3,500+ lines added):
- ArgMaxArgMinComprehensiveTests.cs: 480 lines covering all dtypes, shapes, axes
- VarStdComprehensiveTests.cs: 462 lines covering ddof, empty arrays, edge cases
- CumSumComprehensiveTests.cs: 381 lines covering accumulation, overflow, dtypes
- np_nonzero_strided_tests.cs: 221 lines for strided/transposed array support
- 7 NumPyPortedTests files: Edge cases from NumPy test suite

Code Impact:
- Net reduction: 543 lines removed (6,532 added - 2,172 removed from templates)
- ReductionTests.cs removed (884 lines) - replaced by comprehensive per-operation tests
- Eliminated ~1MB of switch/case template code via IL generation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… ClipEdgeCaseTests

- Fix BeOfValues params array unpacking: Cast GetData<T>() to object[] for proper params expansion
- Mark Power_Integer_LargeValues as Misaligned: Math.Pow precision loss for large integers is expected
- Fix np.full argument order in Clip tests: NumSharp uses (fill_value, shapes) not NumPy's (shape, fill_value)
- Mark Base_ReductionKeepdims_Size1Axis_ReturnsView as OpenBugs: view optimization not implemented

Test results: 3,879 total, 3,868 passed, 11 skipped, 0 failed
Breaking change: Migrate from int32 to int64 for array indexing.

Core type changes:
- Shape: size, dimensions[], strides[], offset, bufferSize -> long
- Slice: Start, Stop, Step -> long
- SliceDef: Start, Step, Count -> long
- NDArray: shape, size, strides properties -> long/long[]

Helper methods:
- Shape.ComputeLongShape() for int[] -> long[] conversion
- Shape.Vector(long) overload

Related to #584
- NDArray constructors: int size -> long size
- NDArray.GetAtIndex/SetAtIndex: int index -> long index
- UnmanagedStorage.GetAtIndex/SetAtIndex: int index -> long index
- ValueCoordinatesIncrementor.Next(): int[] -> long[]
- DefaultEngine.MoveAxis: int[] -> long[]

Build still failing - cascading changes needed in:
- All incrementors (NDCoordinatesIncrementor, NDOffsetIncrementor, etc.)
- NDIterator and all cast files
- UnmanagedStorage.Cloning
- np.random.shuffle, np.random.choice

Related to #584
- this[long index] indexer
- GetIndex/SetIndex with long index
- Slice(long start), Slice(long start, long length)
- Explicit IArraySlice implementations

Build has 439 cascading errors remaining across 50+ files.
Most are straightforward loop index changes (int → long).

Related to #584
…int[] convenience

Pattern applied:
- Get*(params long[] indices) - primary implementation calling Storage
- Get*(params int[] indices) - delegates to long[] via Shape.ComputeLongShape()
- Set*(value, params long[] indices) - primary implementation
- Set*(value, params int[] indices) - delegates to long[] version

Covers: GetData, GetBoolean, GetByte, GetChar, GetDecimal, GetDouble,
GetInt16, GetInt32, GetInt64, GetSingle, GetUInt16, GetUInt32, GetUInt64,
GetValue, GetValue<T>, SetData (3 overloads), SetValue (3 overloads),
SetBoolean, SetByte, SetInt16, SetUInt16, SetInt32, SetUInt32, SetInt64,
SetUInt64, SetChar, SetDouble, SetSingle, SetDecimal

Related to #584
…check

- Add overflow check when string length exceeds int.MaxValue
- Explicitly cast Count to int with comment explaining .NET string limitation
- Part of int32 to int64 indexing migration (#584)
- Add overflow check in AsString() instead of Debug.Assert
- Implement empty SetString(string, int[]) wrapper to call long[] version
- Change GetStringAt/SetStringAt offset parameter from int to long
- Part of int32 to int64 indexing migration (#584)
…ndices

- GetValue(int[]) -> GetValue(long[])
- GetValue<T>(int[]) -> GetValue<T>(long[])
- All direct getters (GetBoolean, GetByte, etc.) -> long[] indices
- SetValue<T>(int[]) -> SetValue<T>(long[])
- SetValue(object, int[]) -> SetValue(object, long[])
- SetData(object/NDArray/IArraySlice, int[]) -> long[] indices
- All typed setters (SetBoolean, SetByte, etc.) -> long[] indices
- Fix int sliceSize -> long sliceSize in GetData

Part of int32 to int64 indexing migration (#584)
- NDArray`1.cs: Add long[] indexer, int[] delegates to it
- UnmanagedStorage.cs: Add Span overflow check (Span limited to int)
- UnmanagedStorage.Cloning.cs: Add ArraySlice allocation overflow check
- NDIterator.cs: Change size field from int to long

Note: ~900 cascading errors remain from:
- ArraySlice (needs long count)
- Incrementors (need long coords)
- Various Default.* operations
- IKernelProvider interface

Part of int32 to int64 indexing migration (#584)
- NDCoordinatesIncrementor: Next() returns long[], Index is long[]
- NDCoordinatesIncrementorAutoResetting: all fields long
- NDOffsetIncrementor: Next() returns long, index/offset are long
- NDOffsetIncrementorAutoresetting: same changes
- ValueOffsetIncrementor: Next() returns long
- ValueOffsetIncrementorAutoresetting: same changes
- NDCoordinatesAxisIncrementor: constructor takes long[]
- NDCoordinatesLeftToAxisIncrementor: dimensions/Index are long[]
- NDExtendedCoordinatesIncrementor: dimensions/Index are long[]

Part of int64 indexing migration (#584)
- ArraySlice.cs: Change Allocate count parameter handling for long
- UnmanagedMemoryBlock: Adjust for long count
- np.random.choice.cs: Add explicit casts for int64 indices
- np.random.shuffle.cs: Update index handling for long
- ValueCoordinatesIncrementor.cs: Add long[] Index property
- NDArray.cs: Remove duplicate/dead code (112 lines)
MatMul.2D2D.cs:
- M, K, N parameters now long throughout
- All method signatures updated (long M, long K, long N)
- Loop counters changed to long
- Coordinate arrays changed to long[]

NDArray.unique.cs:
- len variable changed to long
- getOffset delegate now Func<long, long>
- Loop counters changed to long

NDArray.itemset.cs:
- Parameters changed from int[] to long[]

NdArray.Convolve.cs:
- Explicit (int) casts for size - acceptable because convolution
  on huge arrays is computationally infeasible (O(n*m))

NDArray.matrix_power.cs:
- Cast shape[0] to int for np.eye (pending np.eye long support)

np.linalg.norm.cs:
- Fixed bug: was casting int[] to long[] incorrectly

Remaining work:
- IL kernel interfaces still use int for count/size
- SIMD helpers (SimdMatMul) expect int parameters
- Default.Clip, Default.ATan2, Default.Transpose, Default.NonZero
  all need coordinated IL kernel + caller updates
….Unmanaged

- IKernelProvider: Changed interface to use long for size/count parameters
- Default.Transpose: Fixed int/long coordinate and stride handling
- ILKernelGenerator.Clip: Updated to use long loop counters
- TensorEngine: Updated method signatures for long indexing
- UnmanagedStorage.Slicing: Fixed slice offset to use long
- Shape.Unmanaged: Fixed unsafe pointer methods for long indices
- SimdMatMul.MatMulFloat accepts long M, N, K (validates <= int.MaxValue internally)
- MatMul2DKernel delegate uses long M, N, K
- np.nonzero returns NDArray<long>[] instead of NDArray<int>[]
- NDArray pointer indexer changed from int* to long*
- SwapAxes uses long[] for permutation
- AllSimdHelper<T> parameter: int totalSize → long totalSize
- Loop counters and vectorEnd: int → long
- Part of int64 indexing migration
ILKernelGenerator.Clip.cs:
- All loop counters and vectorEnd variables changed from int to long
- Scalar loops also changed to use long iterators

Default.Dot.NDMD.cs:
- contractDim, lshape, rshape, retShape → long/long[]
- Method signatures updated for TryDotNDMDSimd, DotNDMDSimdFloat/Double
- ComputeIterStrides, ComputeBaseOffset, ComputeRhsBaseOffset → long
- DotProductFloat, DotProductDouble → long parameters
- DotNDMDGeneric → long coordinates and iterators
- DecomposeIndex, DecomposeRhsIndex → long parameters
… fixed statements

ILKernelGenerator.Clip.cs:
- Changed 'int offset = shape.TransformOffset' to 'long offset'

Default.ATan2.cs:
- Changed fixed (int* ...) to fixed (long* ...) for strides and dimensions
- Updated ClassifyATan2Path signature to use long*
- Updated ExecuteATan2Kernel fixed statements

Note: StrideDetector and MixedTypeKernel delegate still need updating
- IsContiguous: int* strides/shape -> long* strides/shape
- IsScalar: int* strides -> long* strides
- CanSimdChunk: int* params -> long*, innerSize/lhsInner/rhsInner -> long
- Classify: int* params -> long*
- expectedStride local -> long
Comprehensive guide for developers continuing the migration:
- Decision tree for when to use long vs int
- 7 code patterns with before/after examples
- Valid exceptions (Span, managed arrays, complexity limits)
- What stays int (ndim, dimension indices, Slice)
- Checklist for each file migration
- Common error patterns and fixes
- File priority categories
- Quick reference table
Nucs added 22 commits March 26, 2026 18:56
Added patterns discovered from analyzing 38 commits on longindexing branch:

- Pattern 8: LongRange helper replacing Enumerable.Range
- Pattern 9: SIMD block loops (outer long, inner int for cache constants)
- Pattern 10: Random sampling with NextLong and int->long delegation
- Return type changes section (nonzero, argsort, argmax/argmin return int64)
- NDArray accessor methods (GetAtIndex/SetAtIndex use long)
- Parallel.For removal in axis reductions (single-threaded with long)
- Files changed summary categorized by component
Comprehensive audit of codebase for int64 migration violations:

HIGH Priority (3):
- NDArray/UnmanagedStorage.Getters missing long[] overloads (14+ methods)
- NDArray typed setters missing 9 long[] overloads
- np.vstack uses int[] for shape

MEDIUM Priority (5):
- IL kernel comments reference int* but code uses long*
- np.save/np.load internal processing uses int[]
- Shape.InferNegativeCoordinates has int[] version

LOW Priority (8):
- Acceptable .NET boundary exceptions documented
- Span, String, Array.CreateInstance limitations
- Dimension iteration with Enumerable.Range (bounded by ndim)

Includes verification commands and fix patterns.
Added 4 more HIGH priority issues:
- H4: np.repeat uses int count for per-element repeats
- H5: NDArray.unique SortUniqueSpan uses int count
- H6: np.searchsorted empty array returns int instead of long
- H7: nanmean/nanstd/nanvar allocate managed arrays with (int) cast

Added 1 more MEDIUM priority issue:
- M6: np.load internal int total accumulator

Added 2 more LOW priority issues:
- L9: Hashset<T>.Count is int (acceptable)
- L10: IMemoryBlock.ItemLength is int (acceptable)

Updated checklist with actionable items.
Additional issues found via deep codebase analysis:

HIGH Priority (new):
- H8: np.linspace uses int num parameter + int loop counters
- H9: np.roll uses int shift parameter
- H10: UnmanagedHelper.CopyTo uses int countOffsetDestination
- H11: np.array<T>(IEnumerable, int size) needs long overload

MEDIUM Priority (new):
- M7: np.save internal int total accumulator
- M8: NdArrayToJaggedArray loops use int for large arrays

Search techniques used:
- Grep for int parameters in public API signatures
- Buffer.MemoryCopy offset parameter analysis
- Loop counter variable type analysis
- Creation function parameter audit
Added 14 new issues found via grep/code search:
- H12: SimdMatMul.MatMulFloatSimple int M,N,K parameters
- H13: ArgMax/ArgMin SIMD helpers return int, take int totalSize
- H14: Default.Dot ExpandStartDim/ExpandEndDim return int[]
- H15: NDArray.Normalize int loop counters
- H16: Slice.Index uses ToInt32 cast in selection code
- H17: Shape dimension parsing uses List<int>
- H18: NdArrayFromJaggedArr uses List<int>
- H19: Arrays.GetDimensions uses List<int>
- H20: np.asarray uses new int[0] for scalar shape
- H21: ArrayConvert uses int[] for dimensions
- H22: UnmanagedStorage FromMultiDimArray uses int[] dim
- M9: NDArray<T> generic only has int size constructors
- M10: np.arange(int) returns int32 (NumPy 2.x returns int64)
- M11: Default.Transpose uses int[] for permutation
- ILKernelGenerator.Reduction.Axis.Simd: bridge long interface to int helper with bounds check
- np.random.choice: add System namespace for NotSupportedException
- np.random.shuffle: add int.MaxValue check for n (shape[0] is now long)
- H23: NumSharp.Bitmap shape casts without overflow check
- L11: SetData uses int[0] instead of long[] (acceptable)
- L12: NdArrayToMultiDimArray int[] for .NET boundary (acceptable)

Updated totals: 23 HIGH, 11 MEDIUM, 13 LOW
Fixed HIGH priority issues:
- H4: np.repeat - GetInt32→GetInt64, int count/j→long for per-element repeats
- H6: np.searchsorted - empty array returns typeof(long) for consistency
- H10: UnmanagedHelper.CopyTo - offset parameter int→long
- H12: SimdMatMul.MatMulFloatSimple - int M,N,K→long, all loop counters→long
- H14: Default.Dot.ExpandStartDim/ExpandEndDim - returns long[] instead of int[]
- H15: NDArray.Normalize - loop counters int col/row→long col/row
- H16: Slice.Index in Selection Getter/Setter - ToInt32→ToInt64
- H20: np.asarray - new int[0]→Array.Empty<long>() for scalar shapes

Reclassified as LOW (not bugs):
- H3: np.vstack - dead code (commented out)
- H17,H18,H19,H21,H22: .NET boundary (Array.Length/GetLength return int)

Updated docs/LONG_INDEXING_ISSUES.md with fix status and reclassifications.
Fixed HIGH priority issues:
- H8: np.linspace - added long num overloads, changed all loop counters int→long
- H9: np.roll/NDArray.roll - added long shift primary overloads
- H11: np.array<T>(IEnumerable, size) - added long size overload
- H23: NumSharp.Bitmap - added overflow checks before casting shape to int

All int overloads now delegate to long overloads for backward compatibility.
Total fixed this session: 12 issues (8 in batch 1, 4 in batch 2).
Fixed HIGH priority issues:
- H1: Confirmed already fixed (all typed getters have long[] overloads)
- H2: Added missing long[] overloads for 9 typed setters in NDArray:
  SetBoolean, SetByte, SetInt16, SetUInt16, SetUInt32, SetUInt64,
  SetChar, SetSingle, SetDecimal
- H13: ArgMax/ArgMin now return Int64 indices (supports >2B arrays):
  - ReductionKernel.cs: ResultType returns NPTypeCode.Int64 for ArgMax/ArgMin
  - ArgMaxSimdHelper/ArgMinSimdHelper: return long, take long totalSize
  - All loop counters/indices changed from int to long

Total fixed this session: 15 issues
Remaining HIGH: H5, H7 (both acceptable - protected by .NET limits)
M1: IL kernel comments updated from int* to long* (5 files)
- ILKernelGenerator.Comparison.cs (2 locations)
- ILKernelGenerator.MixedType.cs
- ILKernelGenerator.Reduction.cs
- ILKernelGenerator.Unary.cs

M4: Confirmed complete - long* version exists in Shape.Unmanaged.cs
- Added cross-reference comment to int* version in Shape.cs

M6: np.load int total accumulator changed to long
M7: np.save int total accumulator changed to long
M9: Confirmed already fixed - NDArray<T> has long size constructors

New issues found and fixed:
- M12: np.random.randn loop counter int -> long
- M13: ILKernelGenerator.Scan.cs loop counters int -> long
  (42 replacements: outer, inner, i over outerSize/innerSize/axisSize)

Updated LONG_INDEXING_ISSUES.md with batch 4 status
Implement Vector256/Vector128 SIMD for all NaN-aware statistics:
- nanmean, nanvar, nanstd: Two-pass algorithm with sum+count tracking
- nansum, nanprod: Identity masking (NaN → 0 for sum, NaN → 1 for prod)
- nanmin, nanmax: Sentinel masking (NaN → ±∞) with all-NaN detection

SIMD Algorithm (NaN masking via self-comparison):
  nanMask = Equals(vec, vec)     // True for non-NaN, false for NaN
  cleaned = BitwiseAnd(vec, nanMask)  // Zero out NaN values
  countMask = BitwiseAnd(oneVec, nanMask)  // Count non-NaN elements

Performance (1M elements, ~10% NaN):
- nanmean: ~3ms/call (Vector256<double> = 4 elements/vector)
- nanvar:  ~4.5ms/call (two-pass: mean, then squared differences)
- nanstd:  ~4ms/call

Files changed:
- ILKernelGenerator.Masking.NaN.cs: SIMD helpers for float/double
- ILKernelGenerator.Reduction.NaN.cs: NEW - IL generation infrastructure
- ILKernelGenerator.Reduction.Axis.NaN.cs: Axis reduction SIMD
- np.nanmean/var/std/sum/prod/min/max.cs: Simplified to use SIMD helpers

Tested: 70 tests covering edge cases, boundary sizes, large arrays,
sliced/strided arrays, axis reductions, and float32/float64 dtypes.
All results match NumPy 2.4.2 exactly.
…arge collections

Hashset<T> now supports collections exceeding int.MaxValue elements with
long-based indexing throughout. Key changes:

Core implementation (Hashset`1.cs):
- All index fields converted to long: m_count, m_lastIndex, m_freeList, Slot.next
- Buckets array changed to long[] for large collection support
- New LongCount property returns count as long
- Count property throws OverflowException if count > int.MaxValue
- CopyTo methods support long arrayIndex and count parameters

HashHelpersLong (new helper class):
- Extended primes table with values up to ~38 billion
- IsPrime(long), GetPrime(long), ExpandPrime(long) for large capacities
- 33% growth (1.33x) for collections >= 1 billion elements
- Standard 2x growth for smaller collections
- LargeGrowthThreshold = 1_000_000_000L constant

BitHelperLong (new helper class):
- Uses long[] instead of int[] for bit marking
- Supports marking/checking bits beyond int.MaxValue positions
- ToLongArrayLength() calculates required array size for n bits

ConcurrentHashset<T> updates:
- Added LongCount property for thread-safe long count access
- Updated CopyTo to use long parameters

Tests (HashsetLongIndexingTests.cs - 24 tests):
- Basic functionality: Add, Remove, Clear, Enumeration
- Long indexing: LongCount, long capacity constructor
- HashHelpersLong: IsPrime, GetPrime, 33% expansion verification
- BitHelperLong: MarkBit, IsMarked, ToLongArrayLength
- Set operations: Union, Intersect, Except, SymmetricExcept
- Stress test: 1 million elements
- Edge cases: TrimExcess, TryGetValue, SetEquals, Overlaps
- Remove duplicate zeros/zeros<T> overloads in np.zeros.cs
- Remove duplicate SetIndex method in ArraySlice<T>
- Remove duplicate constructor in ValueCoordinatesIncrementor
- Fix int/long type handling: axis/ndim stay as int (dimension indices),
  element indices (size, offset, strides) use long
- Fix np.moveaxis to convert long[] axes to int[] internally
- Fix np.linalg.norm axis parameter handling
- Simplify ILKernelGenerator axis reduction to use long* directly
np.arange(10) now returns int64 instead of int32, matching NumPy 2.x behavior.
Integer overloads delegate to long overloads to ensure consistent dtype.

Verified against NumPy 2.4.2:
- np.arange(10).dtype = int64
- np.arange(0, 10, 1).dtype = int64
Audited 67 locations with (int) casts that could silently overflow
for arrays > 2 billion elements. Fixed 9 critical issues:

Storage/Allocation:
- UnmanagedStorage.cs: Remove (int)shape.size casts - long overload exists
- IndexCollector.cs: Add MaxCapacity check to prevent growth beyond Array.MaxLength

Shape/Reshape:
- np.meshgrid.cs: Use reshape(long, long) overload directly
- np.nanvar.cs, np.nanstd.cs, np.nanmean.cs: Remove (int) cast in List<long>.Add

Array Conversion:
- NdArrayToMultiDimArray.cs: Add overflow check before converting to int[]

Verified 30+ locations already have proper guards (Span creations, string ops,
SIMD gather fallbacks, Shape operators with checked(), etc.)

Documented 7 known .NET limitations (Array.IndexOf, GetLength return int).

Added docs/INT32_CAST_LANDMINES.md tracking all findings.

Build: 0 errors | Tests: 3887 passed
INT64 Developer Guide compliance for recent rebase commits:

ILKernelGenerator fixes:
- Reduction.Arg.cs: All ArgMax/ArgMin helpers return long, take long totalSize
- Reduction.Axis.Simd.cs: Size/stride params, loop counters, arrays -> long
- Reduction.Axis.cs: Loop counters -> long
- Reduction.NaN.cs: Fix sealed -> static (rebase conflict)
- Masking.cs: Shape parameters -> long[]

DefaultEngine fixes:
- Default.Reduction.Nan.cs: All size/offset/stride/arrays -> long
- Default.NonZero.cs: CountNonZero returns long, arrays -> long[]
- Default.BooleanMask.cs: Size/count variables -> long

API fixes:
- TensorEngine.cs: CountNonZero returns long, remove duplicates
- np.count_nonzero.cs: Returns long
- np.any.cs, np.all.cs: Shape arrays -> long[], loop vars -> long

Rebase conflict fixes:
- np.random.rand.cs: Remove duplicate random() methods
- Delete duplicate Default.Op.Boolean.template.cs

Remaining: ~96 errors in Shape.Broadcasting.cs, NDArray`1.cs
See docs/INT64_MIGRATION_PROGRESS.md for full details
Complete migration of int32 to int64 for indices, sizes, strides, offsets
across 16 files. Build now compiles successfully.

Core changes:
- Shape.Broadcasting.cs: All dimension/stride arrays now long[]
- NDArray.cs: Added long size constructor overloads
- TensorEngine/np.nonzero: Return NDArray<long>[] for index arrays
- ILKernelGenerator.Reduction.Axis.Simd: AxisReductionKernel delegate
  now uses long* for strides/shapes and long for sizes
- np.size: Return type changed to long
- np.array: Stride variables changed to long for pointer arithmetic
- NDArray.Indexing.Masking: Shape arrays and counts now long

Random functions:
- np.random.choice/shuffle: Added overflow checks for int.MaxValue limit
  (Random.Next only supports int; full long support deferred)

Build infrastructure:
- NumSharp.Core.csproj: Exclude *.template.cs and *.regen_disabled files

Test status: 193 failures due to memory corruption - needs investigation
in stride/offset calculations.
…4 default

Root cause identified: "memory corruption" errors were NOT actual corruption.
Tests were calling GetInt32() on Int64 arrays (np.arange now returns int64).

Fixes:
- np.random.shuffle.cs: NextInt64 → NextLong (correct method name)
- np.random.shuffle.cs: SwapSlicesAxis0 int → long parameters
- BattleProofTests.cs: GetInt32 → GetInt64 for arange-based tests
- np.transpose.Test.cs: long[] → int[] for axis array (axes stay int)
- ReadmeExample.cs: cast n_samples to int for np.ones() calls
- NpApiOverloadTests: int → long for count_nonzero return, NDArray<int>[] → NDArray<long>[] for nonzero
- BooleanIndexing.BattleTests.cs: shape.SequenceEqual(new[]) → shape.SequenceEqual(new long[])
- Updated INT64_MIGRATION_PROGRESS.md with root cause analysis
- Default.All.cs: int i -> long i for size iteration
- Default.Any.cs: int i -> long i for size iteration
- StackedMemoryPool.cs: int i -> long i (count param is long)
- np.random.poisson.cs: int i -> long i for size iteration
- np.random.bernoulli.cs: int i -> long i for size iteration
- np.random.randn.cs: int i -> long i for size iteration
- NDArray.Indexing.Masking.cs:
  - int idx -> long idx for trueCount iteration
  - GetInt32 -> GetInt64 (nonzero returns NDArray<long>[])
  - int valueIdx -> long valueIdx for mask.size iteration

All changes follow INT64_DEVELOPER_GUIDE.md patterns.
…ters

- NdArrayToJaggedArray.cs: Add overflow checks for managed array limits,
  explicit (int) casts with validation before allocation, loop comparisons
  against .Length instead of shape[x]
- NDArray.matrix_power.cs: Add overflow check for np.eye dimension
- NDArray.Indexing.Masking.cs: Fix loop counter int→long for mask.size iteration
- Update INT64_MIGRATION_PROGRESS.md with session 5 fixes and audit results
  (np.load.cs, np.save.cs confirmed as valid exceptions)
- IndexingEdgeCaseTests.cs: GetInt32 → GetInt64 for arange-based arrays
- LinearAlgebraTests.cs: GetInt32 → GetInt64 for dot product tests
- NDArray.Base.Test.cs: GetInt32 → GetInt64 for base memory tests

np.arange now returns Int64 (NumPy 2.x alignment), so tests must use
GetInt64() instead of GetInt32() to access values correctly.
Nucs added 3 commits March 26, 2026 19:34
… int64 default

NumPy 2.x returns int64 from arange() by default. This batch updates tests
that used GetInt32/MakeGeneric<int> on arange-sourced arrays to use the
correct int64 accessors.

Test updates:
- Change GetInt32 -> GetInt64 for arange-sourced arrays
- Change MakeGeneric<int>() -> MakeGeneric<long>() for arange results
- Change np.array(new int[]) -> np.array(new long[]) where comparing to arange
- Fix NDIterator<int> -> NDIterator<long> for arange iteration
- Restore GetInt32 for explicit int32 arrays (np.array(42), np.array(new int[]))

Bug fix (Default.ClipNDArray.cs):
- Fixed mixed-dtype clip bug where int32 min/max arrays were read as int64
- Now casts min/max arrays to output dtype before calling SIMD kernel
- This prevented garbage values like 34359738376 (8 * 2^32+1)

Dtype preservation tests:
- Clip_Int32_PreservesDtype: use explicit int32 array
- Ravel_PreservesDtype_Int32: use explicit int32 array
- Reshape_Int32: use explicit int32 array
- Roll_PreservesDtype_Int32: use explicit int32 array

Test results: 103 failures -> 29 failures (74 tests fixed)
…type handling

Test fixes:
- Change shape comparisons from int[] to long[] (shape now returns long[])
- Fix Array.Empty<int>() to Array.Empty<long>() for scalar shape comparisons
- Fix GetInt32 -> GetInt64 for arange-sourced arrays in NDArray.Base.Test.cs
- Fix ToArray<int> -> ToArray<long> for arange-sourced data
- Fix GetInt64 -> GetInt32 for explicit int32 scalars (NDArray.Scalar(42))

Bug fix (np.repeat.cs):
- Fixed GetInt64() calls on repeats array that could be int32
- Now uses Convert.ToInt64(GetAtIndex()) to handle any integer dtype
- This fixes "index < Count" errors when repeat counts are int32

Test results: 29 failures -> 19 failures (10 tests fixed)
Tests were using wrong getter methods for array dtypes:
- np.arange() returns Int64 (NumPy 2.x) → use GetInt64()
- np.array(new[] { int }) returns Int32 → use GetInt32()
- NDArray.Scalar(int) returns Int32 → use GetInt32()

Fixes:
- NdArray.Roll.Test.cs: int[,] → long[,] for arange result cast
- np.concatenate.Test.cs: GetInt32 → GetInt64 for arange-based tests
- np.empty_like.Test.cs: GetInt32 → GetInt64 for arange-based tests
- NDArray.Base.Test.cs: GetInt64 → GetInt32 for int[] literal array
- NpBroadcastFromNumPyTests.cs: GetInt64 → GetInt32 for int scalar

Reduces test failures from 6 to 0 (excluding unrelated stack overflow).
@Nucs Nucs added this to the NumPy 2.x Compliance milestone Mar 27, 2026
@Nucs Nucs added architecture Cross-cutting structural changes affecting multiple components NumPy 2.x Compliance Aligns behavior with NumPy 2.x (NEPs, breaking changes) core Internal engine: Shape, Storage, TensorEngine, iterators labels Mar 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

architecture Cross-cutting structural changes affecting multiple components core Internal engine: Shape, Storage, TensorEngine, iterators NumPy 2.x Compliance Aligns behavior with NumPy 2.x (NEPs, breaking changes)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Core] Migrate from int32 to int64 indexing (NumPy npy_intp alignment)

1 participant