Class NDIterCoalescing
Axis coalescing logic for NDIter. Merges adjacent compatible axes to reduce iteration overhead.
NUMSHARP DIVERGENCE: This implementation supports an unlimited number of dimensions, whereas NumPy caps the dimension count at NPY_MAXDIMS. It uses StridesNDim for stride array indexing (allocated based on the actual ndim).
public static class NDIterCoalescing
Inheritance
object
NDIterCoalescing
Methods
CoalesceAxes(ref NDIterState)
Coalesce adjacent axes that have compatible strides for all operands. Reduces ndim, improving iteration efficiency.
public static void CoalesceAxes(ref NDIterState state)
Parameters
state NDIterState
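The merge rule can be illustrated on plain arrays. The following is a minimal sketch, not NumSharp's implementation: `CoalesceSketch`, its `Coalesce` method, and the array-based layout are hypothetical stand-ins for NDIterState, and axes are assumed to be stored innermost first. It uses the condition stated for IsContiguousForCoalescing (stride[i] * shape[i] == stride[i+1]) and omits any special handling of size-1 axes.

```csharp
public static class CoalesceSketch
{
    // Hypothetical layout: shape[i] and strides[op][i] are in internal axis
    // order, with axis 0 innermost. Arrays are compacted in place and the
    // new number of dimensions is returned.
    public static int Coalesce(long[] shape, long[][] strides, int ndim)
    {
        if (ndim <= 1) return ndim;

        int write = 0; // axis currently being grown
        for (int read = 1; read < ndim; read++)
        {
            bool canMerge = true;
            foreach (long[] s in strides)
            {
                // Mergeable only if walking off the end of axis `write`
                // lands exactly on the next element of axis `read`,
                // and this must hold for every operand.
                if (s[write] * shape[write] != s[read]) { canMerge = false; break; }
            }

            if (canMerge)
            {
                shape[write] *= shape[read]; // the inner stride is kept
            }
            else
            {
                write++;
                shape[write] = shape[read];
                foreach (long[] s in strides) s[write] = s[read];
            }
        }
        return write + 1;
    }
}
```

Under these assumptions, a C-contiguous 2x3x4 operand (shape {4, 3, 2} and strides {1, 4, 12} when listed innermost first) collapses to a single axis of length 24, while shape {4, 3} with strides {1, 8} stays two-dimensional.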
FlipNegativeStrides(ref NDIterState)
Flip axes with all-negative strides for memory-order iteration.
NumPy's npyiter_flip_negative_strides():
- For each axis, check if ALL operands have negative or zero strides
- If so, negate the strides, adjust base pointers to start at the end, and mark the axis as flipped in the Perm array (perm[d] = -1 - perm[d])
- Sets NEGPERM flag and clears IDENTPERM
This allows the iterator to traverse memory in ascending order even for reversed arrays, improving cache efficiency.
public static bool FlipNegativeStrides(ref NDIterState state)
Parameters
state NDIterState
Iterator state to modify
Returns
- bool
True if any axes were flipped
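The flip steps described above can be sketched on plain arrays. This is an illustration under stated assumptions rather than the library's code: `FlipSketch`, `Flip`, and the `offsets` array (element offsets standing in for base pointers) are hypothetical names. The sketch also assumes that an axis whose strides are all zero is left untouched, since there is nothing to reverse.

```csharp
public static class FlipSketch
{
    // Hypothetical layout: strides[op][axis]; offsets[op] is the element
    // offset of operand op's starting position; perm[axis] is the original
    // axis index for each internal axis.
    public static bool Flip(long[] shape, long[][] strides, long[] offsets, int[] perm)
    {
        bool anyFlipped = false;
        for (int axis = 0; axis < shape.Length; axis++)
        {
            bool anyNegative = false;
            bool anyPositive = false;
            foreach (long[] s in strides)
            {
                if (s[axis] < 0) anyNegative = true;
                else if (s[axis] > 0) anyPositive = true;
            }

            // Flip only when every operand is negative or zero on this axis
            // and at least one is actually negative (assumption, see lead-in).
            if (!anyNegative || anyPositive) continue;

            for (int op = 0; op < strides.Length; op++)
            {
                // Move the start to the other end of the axis, then
                // walk it in the opposite direction.
                offsets[op] += (shape[axis] - 1) * strides[op][axis];
                strides[op][axis] = -strides[op][axis];
            }

            perm[axis] = -1 - perm[axis]; // mark the axis as flipped
            anyFlipped = true;
        }
        return anyFlipped;
    }
}
```

For a reversed 5-element view that starts at offset 4 with stride -1, the sketch moves the start to offset 0, sets the stride to 1, and records perm[0] = -1.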
IsContiguousForCoalescing(ref NDIterState)
Check if all operands are contiguous in the current internal axis order. This determines whether coalescing would preserve the iteration semantics for C/F order iteration.
For coalescing to preserve iteration order, all operands must be contiguous such that stride[i] * shape[i] == stride[i+1] for adjacent axes.
public static bool IsContiguousForCoalescing(ref NDIterState state)
Parameters
state NDIterState
Returns
- bool
True if all operands are contiguous in the current internal axis order
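The adjacency condition can be checked directly. The sketch below is illustrative only: `ContiguitySketch` and `IsContiguous` are hypothetical names, and shape and strides are assumed to be plain arrays in internal axis order with axis 0 innermost.

```csharp
public static class ContiguitySketch
{
    // Returns true when, for every operand, each pair of adjacent axes
    // satisfies stride[i] * shape[i] == stride[i + 1].
    public static bool IsContiguous(long[] shape, long[][] strides)
    {
        foreach (long[] s in strides)
        {
            for (int i = 0; i + 1 < shape.Length; i++)
            {
                if (s[i] * shape[i] != s[i + 1]) return false;
            }
        }
        return true;
    }
}
```

With shape {4, 3}, an operand with strides {1, 4} passes the check, whereas a row-sliced view with strides {1, 8} fails it because 1 * 4 != 8.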
RemoveUnitAxes(ref NDIterState)
Remove size-1 axes from the internal representation. Each such axis contributes exactly one coordinate (always 0) and has stride 0 (fill invariant), so removing it never changes the element-visit sequence. Removal restores a meaningful innermost axis for EXLOOP/kernels and for the buffer-manager linearity test.
NumPy reaches the same state through its unconditional npyiter_coalesce_axes, whose strict trivial branch absorbs every stride-0 size-1 axis. NumSharp's full coalesce is gated on all operands being contiguous, so the non-coalesced branch calls this method instead. Without it, a trailing size-1 axis sits innermost and collapses EXLOOP to one-element inner loops: (N,1) strided views ran N kernel invocations of count 1.
Must NOT be used when a multi-index or flat index is tracked, because index reconstruction needs the original axis structure (NumPy likewise skips coalescing there).
public static void RemoveUnitAxes(ref NDIterState state)
Parameters
state NDIterState
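The effect on an (N,1) strided view can be sketched with plain arrays. This is a hedged illustration, not the library's code: `UnitAxisSketch` and its array-based signature are hypothetical, axes are assumed to be stored innermost first, and keeping one axis when every axis has size 1 is an assumption of the sketch.

```csharp
public static class UnitAxisSketch
{
    // Hypothetical layout: shape[i] and strides[op][i] in internal axis
    // order (axis 0 innermost). Compacts in place, returns the new ndim.
    public static int RemoveUnitAxes(long[] shape, long[][] strides, int ndim)
    {
        int write = 0;
        for (int read = 0; read < ndim; read++)
        {
            if (shape[read] == 1) continue; // one coordinate only: drop it

            if (write != read)
            {
                shape[write] = shape[read];
                foreach (long[] s in strides) s[write] = s[read];
            }
            write++;
        }

        // Assumption: keep a single axis if everything had size 1,
        // so the iterator still has an inner loop to run.
        if (write == 0 && ndim > 0) return 1;
        return write;
    }
}
```

For shape {1, 5} with strides {0, 3}, the size-1 axis that was innermost is dropped and the length-5 axis with stride 3 becomes the inner loop, so a kernel can be invoked once with count 5 instead of five times with count 1.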
ReorderAxes(ref NDIterState)
Reorder axes for optimal memory access pattern. Prioritizes axes with stride=1 as innermost.
[Obsolete("Use ReorderAxesForCoalescing with order parameter instead")]
public static void ReorderAxes(ref NDIterState state)
Parameters
state NDIterState
ReorderAxesForCoalescing(ref NDIterState, NPY_ORDER, bool)
Reorder axes for iteration based on the specified order. This is called BEFORE CoalesceAxes to enable full coalescing of contiguous arrays.
Order semantics (matching NumPy):
- C-order (NPY_CORDER): Last axis innermost (row-major logical order). Forces axes to [n-1, n-2, ..., 0] order regardless of memory layout.
- F-order (NPY_FORTRANORDER): First axis innermost (column-major logical order). Forces axes to [0, 1, ..., n-1] order regardless of memory layout.
- K-order (NPY_KEEPORDER): Follow memory layout (smallest stride innermost). Sorts by stride to maximize cache efficiency.
- A-order (NPY_ANYORDER): Same as K-order.
The Perm array tracks the mapping Perm[internal_axis] = original_axis. This allows GetMultiIndex to return coordinates in the original axis order.
public static void ReorderAxesForCoalescing(ref NDIterState state, NPY_ORDER order, bool forCoalescing = true)
Parameters
state NDIterState
Iterator state to modify
order NPY_ORDER
Iteration order
forCoalescing bool
If true, sort for coalescing (ascending). If false, sort for memory-order iteration with MULTI_INDEX (descending). Only affects K-order; C and F orders are deterministic.
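The order semantics can be sketched as a function that builds the Perm mapping. This is a simplified illustration under stated assumptions: `SketchOrder`, `ReorderSketch`, and `ComputePerm` are hypothetical names, only a single operand's strides are considered, the K-order sort uses absolute stride values, and internal axis 0 is taken to be the innermost axis.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for NPY_ORDER, reduced to the three distinct cases.
public enum SketchOrder { C, F, K }

public static class ReorderSketch
{
    // Returns perm with perm[internalAxis] = originalAxis.
    // `strides` is given in original axis order for one operand.
    public static int[] ComputePerm(long[] strides, SketchOrder order, bool forCoalescing = true)
    {
        int[] axes = Enumerable.Range(0, strides.Length).ToArray();
        switch (order)
        {
            case SketchOrder.C:
                Array.Reverse(axes);       // [n-1, ..., 0]: last axis innermost
                return axes;
            case SketchOrder.F:
                return axes;               // [0, ..., n-1]: first axis innermost
            default:
                // K-order: follow memory layout. OrderBy is a stable sort,
                // so axes with equal strides keep their relative order.
                return forCoalescing
                    ? axes.OrderBy(a => Math.Abs(strides[a])).ToArray()
                    : axes.OrderByDescending(a => Math.Abs(strides[a])).ToArray();
        }
    }
}
```

For a C-contiguous 2x3x4 array with strides {12, 4, 1}, both C-order and ascending K-order yield {2, 1, 0}, F-order yields {0, 1, 2}, and descending K-order yields {0, 1, 2}.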
TryCoalesceInner(ref NDIterState)
Try to coalesce the inner dimension for better vectorization. Returns true if inner loop size increased.
public static bool TryCoalesceInner(ref NDIterState state)
Parameters
state NDIterState