Namespace NumSharp.Backends.Iteration
Classes
- NDExpr
Abstract expression node. Subclasses describe computations over NDIter operands; Compile() produces an NDInnerLoopFunc.
- NDFlatIterator
Flat (1-D, C-order) element iterator: the NumSharp analog of NumPy's flatiter, used by iters. It wraps an operand already broadcast (via broadcast_to(NDArray, Shape)) to the broadcast result shape and yields each logical element in C-order, expanding stride-0 (broadcast) dimensions, exactly like NumPy's broadcast.iters[i]:
// numpy: np.broadcast([1,2,3], [[10],[20]]).iters[0] -> 1,2,3,1,2,3
//        .iters[1] -> 10,10,10,20,20,20
The broadcast expansion is the same Shape/stride machinery NDIter uses; element access resolves the (possibly stride-0) coordinates per step, so no buffer is materialized.
- NDIter
Static iterator helper methods (backward compatible API).
NUMSHARP DIVERGENCE: These methods support unlimited dimensions via dynamic allocation. Dimension arrays are allocated on demand and freed after use.
- NDIterBufferManager
Buffer management for NDIter. Handles allocation, copy-in, and copy-out of iteration buffers.
- NDIterCasting
Type casting utilities for NDIter. Validates casting rules and performs type conversions.
- NDIterCoalescing
Axis coalescing logic for NDIter. Merges adjacent compatible axes to reduce iteration overhead.
NUMSHARP DIVERGENCE: This implementation supports unlimited dimensions. Uses StridesNDim for stride array indexing (allocated based on actual ndim).
- NDIterConstants
NDIter-related bit-packing constants that don't belong on the flag enums.
- NDIterExecution
Execution helpers for different paths.
- NDIterFlagMasks
Bit masks that partition the NDIter flag space into global (bits 0-15) and per-operand (bits 16-31) regions. Matches NumPy's NPY_ITER_GLOBAL_FLAGS and NPY_ITER_PER_OP_FLAGS macros.
- NDIterPathSelector
Execution path selection logic.
- NDIterUtils
Helper utilities for NDIter op_axes encoding/decoding.
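The stride-0 expansion described for NDFlatIterator can be illustrated with a small sketch. This is plain Python, not NumSharp code: `flat_iter` is a hypothetical helper, strides are counted in elements rather than bytes, and a flat list stands in for the operand's buffer. It reproduces the NumPy example quoted in that entry without materializing a broadcast copy.

```python
from itertools import product


def flat_iter(data, shape, strides, offset=0):
    """Yield the elements of a strided view in C-order (last axis fastest).

    `data` is a flat list standing in for the buffer. `strides` are in
    elements, not bytes. A stride of 0 marks a broadcast axis, so the same
    element is revisited instead of being copied.
    """
    for index in product(*(range(n) for n in shape)):
        yield data[offset + sum(i * s for i, s in zip(index, strides))]


# Operands from the NumPy example, both broadcast to the result shape (2, 3).
row = [1, 2, 3]   # original shape (3,)   -> broadcast strides (0, 1)
col = [10, 20]    # original shape (2, 1) -> broadcast strides (1, 0)

row_flat = list(flat_iter(row, (2, 3), (0, 1)))
col_flat = list(flat_iter(col, (2, 3), (1, 0)))
print(row_flat)  # [1, 2, 3, 1, 2, 3]
print(col_flat)  # [10, 10, 10, 20, 20, 20]
```

Resolving the coordinates on every step is the simplest way to show the idea; an actual iterator would more likely advance a pointer incrementally.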
Structs
- ComplexArgAccumulator
ArgMin/ArgMax accumulator for Complex — lexicographic best plus the same running-index / NaN-index bookkeeping as HalfArgAccumulator.
- ComplexMaxKernel
Complex max via lexicographic (real, then imaginary) compare. On the first NaN-bearing element the kernel stores it verbatim and aborts (see ComplexMinMaxAccumulator).
- ComplexMinMaxAccumulator
Min/Max accumulator for Complex: the running lexicographic extremum, a "seen any value" flag, and a NaN flag. On the first element whose real OR imaginary part is NaN the kernel stores that element VERBATIM in Best and aborts — matching NumPy's minimum/maximum, which return the NaN-bearing operand as-is (e.g. min([1+1j, nan+0j]) → (nan,0), not (nan,nan)). Because the iterator runs in NPY_KEEPORDER, the element captured is the first NaN in MEMORY order, which is exactly what NumPy's reduce returns for a non-C-contiguous (e.g. transposed) array.
- ComplexProdKernel
Complex product. The cross-term multiply (a+bi)(c+di) cannot be expressed as an independent-lane SIMD reduction, so this is a scalar fold; the win over the delegate path is devirtualization + a register-held accumulator.
- ComplexSumKernel
Complex sum. When the inner loop is contiguous (stride == 16 bytes = one Complex) and Vector256 is hardware-accelerated, the chunk is summed as a flat double stream with two Vector256<double> lanes (real/imag interleaved survive the lane reduction), then the tail is added scalar. Non-contiguous chunks add scalar. The SIMD reassociation differs from a strict left fold only at ULP level (same class as the codebase's pairwise reductions).
- HalfArgAccumulator
ArgMin/ArgMax accumulator for Half. Cur is the running C-order flat index of the chunk's first element (callers MUST use NPY_CORDER). BestIdx starts at -1 as the "no value yet" sentinel. SawNaNIdx is the flat index of the first NaN (NumPy: argmin/argmax of an array containing NaN return the first NaN's index); -1 until a NaN is seen.
- HalfMaxKernel
Half max. Contiguous chunks (stride == 2) run a 4-accumulator unroll that breaks the per-element dependency chain (~1.3× the scalar fold); other strides take the scalar branch. NaN propagates: the kernel aborts the moment a NaN is seen so the caller returns Half.NaN.
- HalfMinKernel
Half min — mirror of HalfMaxKernel with the comparison and the unroll seeds inverted (+inf).
- HalfMinMaxAccumulator
Min/Max accumulator for Half: running extremum (held in double, the precision the codebase's f16 reductions already use), a "seen any value" flag for the empty/first-element guard, and a NaN flag. Any NaN ⇒ result NaN (NumPy: min/max with NaN propagates), so the kernel aborts on the first NaN it sees.
- NDIterRef
High-performance multi-operand iterator matching NumPy's nditer API.
- NDIterState
Core iterator state with dynamically allocated arrays for both dimensions and operands.
NUMSHARP DIVERGENCE: Unlike NumPy's fixed NPY_MAXDIMS=64 and NPY_MAXARGS=64, NumSharp supports unlimited dimensions AND unlimited operands. All arrays are allocated dynamically based on actual NDim and NOp values.
- NDMaxAxisKernel<T>
Max reduction kernel for axis operations.
- NDMinAxisKernel<T>
Min reduction kernel for axis operations.
- NDProdAxisKernel<T>
Product reduction kernel for axis operations.
- NDSumAxisKernel<T>
Sum reduction kernel for axis operations.
- NanMeanAccumulator
Accumulator for nanmean: running sum and count of non-NaN elements.
- NanMeanComplexAccumulator
nanmean accumulator for Complex: running Complex sum and non-NaN count.
- NanMinMaxDoubleAccumulator
Accumulator for NanMin/NanMax on double arrays.
- NanMinMaxFloatAccumulator
Accumulator for NanMin/NanMax: running extremum plus a flag indicating whether any non-NaN element has been seen. Returns NaN if all elements were NaN.
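The "running extremum plus seen flag" pattern that the NanMin/NanMax accumulators describe can be sketched as follows. This is an illustrative Python model, not the NumSharp struct: the class name `NanMinAccumulator` and its members are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class NanMinAccumulator:
    """Hypothetical model of a NaN-skipping min accumulator."""

    best: float = math.inf   # running extremum
    seen: bool = False       # True once any non-NaN element was folded in

    def add(self, value: float) -> None:
        if math.isnan(value):
            return           # NaN elements are skipped, not propagated
        if not self.seen or value < self.best:
            self.best = value
        self.seen = True

    def result(self) -> float:
        # All-NaN (or empty) input yields NaN.
        return self.best if self.seen else math.nan


def nanmin(values):
    acc = NanMinAccumulator()
    for v in values:
        acc.add(v)
    return acc.result()


print(nanmin([math.nan, 3.0, 1.5, math.nan]))  # 1.5
print(nanmin([math.nan, math.nan]))            # nan
```

The separate flag is what lets an all-NaN input be distinguished from an input whose minimum happens to equal the seed value.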
Interfaces
- INDAxisNumericReductionKernel<T>
Generic numeric axis reduction kernel interface. Used by NDAxisIter for sum, prod, min, max along an axis.
- INDInnerLoop
Struct-generic inner loop: a zero-alloc alternative to NDInnerLoopFunc. Implementations should be readonly struct; the JIT specializes ExecuteGeneric<TKernel>(TKernel) per type and inlines the call.
- INDIterKernel
Interface for kernels that work with NDIter.
- INDReducingInnerLoop<TAccum>
Reduction variant — the accumulator is threaded through the outer loop so each inner-loop invocation can accumulate into the same scalar. Return false to abort iteration (early exit for Any/All).
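The control flow behind INDReducingInnerLoop<TAccum> (one accumulator threaded through the outer loop, with a false return aborting iteration) can be modelled roughly as below. This is a Python sketch under stated assumptions: `run_reduction` and `any_kernel` are hypothetical names, and the kernel returns an `(accum, keep_going)` pair instead of mutating an accumulator in place as a C# struct implementation presumably would.

```python
def run_reduction(chunks, kernel, accum):
    """Outer loop: hand each inner-loop chunk to the kernel.

    The same accumulator is threaded through every call. Iteration stops
    as soon as the kernel reports that it should not continue.
    Returns the final accumulator and the number of chunks visited.
    """
    visited = 0
    for chunk in chunks:
        visited += 1
        accum, keep_going = kernel(chunk, accum)
        if not keep_going:
            break
    return accum, visited


def any_kernel(chunk, accum):
    """Any(): the answer is known once one truthy element is found."""
    for value in chunk:
        if value:
            return True, False   # result decided, abort the iteration
    return accum, True           # nothing found yet, keep going


chunks = [[0, 0], [0, 5], [7, 0]]
result, visited = run_reduction(chunks, any_kernel, False)
print(result, visited)  # True 2
```

The third chunk is never visited, which is the point of the early exit for Any/All style reductions.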
Enums
- MemOverlap
Result of a memory-overlap query. Matches NumPy's mem_overlap_t (mem_overlap.h:15-21).
- NDArrayMethodFlags
Flags characterizing the transfer (cast/copy) functions set up by an iterator. Matches NumPy's NPY_ARRAYMETHOD_FLAGS (dtype_api.h:66).
Packed into the top 8 bits of ItFlags at offset TRANSFERFLAGS_SHIFT (=24). Retrieved via GetTransferFlags() — the preferred way to check whether the iteration can run without the GIL (in NumPy) or might set FP errors.
- NDExprReduceKind
Reduction kinds supported by ReduceNode.
- NDIterExecutionPath
Execution path for NDIter operations.
- NDIterFlags
Iterator-level flags. Conceptually matches NumPy's NPY_ITFLAG_* constants.
NOTE: Bit positions differ from NumPy's implementation:
- NumPy uses bits 0-7 for IDENTPERM, NEGPERM, HASINDEX, etc.
- NumSharp reserves bits 0-7 for legacy compatibility flags (SourceBroadcast, SourceContiguous, DestinationContiguous)
- NumPy-equivalent flags are shifted to bits 8-15
This layout maintains backward compatibility with existing NumSharp code while adding NumPy parity flags. The semantic meaning of each flag matches NumPy, only the bit positions differ.
- NDIterGlobalFlags
Global flags passed to iterator construction. Bit values match NumPy's NPY_ITER_* constants exactly (see numpy/_core/include/numpy/ndarraytypes.h).
- NDIterOpFlags
Per-operand flags during iteration. Matches NumPy's NPY_OP_ITFLAG_* constants.
- NDIterPerOpFlags
Per-operand flags passed to iterator construction. Bit values match NumPy's NPY_ITER_* per-operand constants exactly (see numpy/_core/include/numpy/ndarraytypes.h). All values occupy the high 16 bits per NumPy's NPY_ITER_PER_OP_FLAGS mask (0xffff0000).
- NPY_CASTING
Casting rules enumeration matching NumPy's NPY_CASTING.
- NPY_ORDER
Iteration order enumeration matching NumPy's NPY_ORDER.
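The bit layouts mentioned in the flag entries above can be checked with a little arithmetic. The sketch below is Python, not NumSharp code. The mask values (low 16 bits global, 0xffff0000 per-operand) and the shift of 24 come from the descriptions in this section; the helper names are hypothetical, and the two sample flag values are the NumPy constants as I understand them (NPY_ITER_EXTERNAL_LOOP = 0x8, NPY_ITER_READONLY = 0x00020000), so treat them as assumptions.

```python
GLOBAL_MASK = 0x0000FFFF      # bits 0-15: global construction flags
PER_OP_MASK = 0xFFFF0000      # bits 16-31: per-operand construction flags

TRANSFER_SHIFT = 24           # transfer flags live in the top 8 bits
TRANSFER_MASK = 0xFF << TRANSFER_SHIFT

# Sample values, assumed to match NumPy's constants.
EXTERNAL_LOOP = 0x00000008
READONLY = 0x00020000


def split_flags(flags):
    """Split a combined 32-bit word into (global, per-operand) parts."""
    return flags & GLOBAL_MASK, flags & PER_OP_MASK


def set_transfer_flags(itflags, transfer):
    """Store an 8-bit transfer-flag value in the top byte of itflags."""
    cleared = itflags & ~TRANSFER_MASK & 0xFFFFFFFF
    return cleared | ((transfer & 0xFF) << TRANSFER_SHIFT)


def get_transfer_flags(itflags):
    """Read the 8-bit transfer-flag value back out."""
    return (itflags >> TRANSFER_SHIFT) & 0xFF


global_part, per_op_part = split_flags(EXTERNAL_LOOP | READONLY)
packed = set_transfer_flags(0x00000101, 0b0110)
print(hex(global_part), hex(per_op_part))  # 0x8 0x20000
print(hex(packed), get_transfer_flags(packed))  # 0x6000101 6
```

Note that the construction-time flags and the iterator's ItFlags word are separate flag spaces; the example only places them side by side to show the two packing schemes.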
Delegates
- NDInnerLoopFunc
Inner-loop callback matching NumPy's PyUFuncGenericFunction. Invoked once per outer iteration; processes count elements starting at dataptrs[op] with per-operand byte stride strides[op].
- NDIterGetMultiIndexFunc
Function to get multi-index at current position.
- NDIterInnerLoopFunc
Inner loop kernel called by iterator.
- NDIterNextFunc
Function to advance iterator to next position. Returns true if more iterations remain.
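The calling convention described for NDInnerLoopFunc (a count, one data pointer per operand, and one byte stride per operand) can be sketched in Python with `bytearray` buffers and integer byte offsets standing in for pointers. The function name `add_inner_loop` and the three-operand layout (two inputs, one output, all float64) are assumptions made for this example, not the delegate's actual signature.

```python
import struct


def add_inner_loop(buffers, offsets, strides, count):
    """Inner loop: out[i] = a[i] + b[i] for `count` float64 elements.

    `offsets[op]` plays the role of dataptrs[op] and `strides[op]` is the
    byte step for operand `op`. A stride of 0 re-reads the same element,
    which is how a broadcast scalar operand is handled.
    """
    a, b, out = offsets
    for _ in range(count):
        x = struct.unpack_from("<d", buffers[0], a)[0]
        y = struct.unpack_from("<d", buffers[1], b)[0]
        struct.pack_into("<d", buffers[2], out, x + y)
        a += strides[0]
        b += strides[1]
        out += strides[2]


lhs = bytearray(struct.pack("<4d", 1.0, 2.0, 3.0, 4.0))
rhs = bytearray(struct.pack("<d", 10.0))   # single value, iterated with stride 0
dst = bytearray(8 * 4)

add_inner_loop([lhs, rhs, dst], [0, 0, 0], [8, 0, 8], 4)
result = list(struct.unpack("<4d", dst))
print(result)  # [11.0, 12.0, 13.0, 14.0]
```

Because the loop only ever adds byte strides to its current positions, the same kernel serves contiguous, strided, and broadcast operands without special cases.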