Class ILKernelGenerator
Generates per-chunk IL kernels for NDIter-driven execution.
Kernels emitted here are called as the inner loop of an NDIter iteration — once per chunk, with dataptrs/strides/count provided by the iterator. The kernel does no axis or stride walking of its own.
Add new kernel families in ILKernelGenerator.<Op>.cs partial files. See DirectILKernelGenerator for the legacy whole-array kernels currently being migrated to this model.
public static class ILKernelGenerator
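The per-chunk contract described above can be pictured with a small managed stand-in. This is only a sketch: `InnerLoopSketch` and `AddChunk` are hypothetical names, and array offsets with element strides are assumed in place of the raw data pointers and strides the real NDInnerLoopFunc receives (its actual signature is not shown on this page). The point it illustrates is that the kernel only steps through `count` elements with the strides it is handed; choosing the chunk and walking the axes is the iterator's job.

```csharp
using System;

public static class InnerLoopSketch
{
    // Hypothetical managed stand-in for a per-chunk kernel.
    // Offsets and element strides replace the data pointers and strides
    // that the iterator would supply to a real kernel.
    public static void AddChunk(
        double[] a, int aOffset, int aStride,
        double[] b, int bOffset, int bStride,
        double[] result, int rOffset, int rStride,
        int count)
    {
        // One flat pass over the chunk: no axis logic, no shape knowledge.
        for (int i = 0; i < count; i++)
        {
            result[rOffset + i * rStride] =
                a[aOffset + i * aStride] + b[bOffset + i * bStride];
        }
    }
}
```

A caller playing the iterator's role would invoke `AddChunk` once per chunk, each time with fresh offsets, strides, and count.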
Inheritance
ILKernelGenerator
Methods
GetCumSumInnerLoop(NPTypeCode, NPTypeCode)
Returns the NDIter-driven cumulative-sum inner loop for inType → accType. The returned delegate matches NDInnerLoopFunc; drive it with an iterator whose scan axis has been removed (see DefaultEngine.AccumulateAxis), passing a pointer to an ILKernelGenerator.ScanAxisAux as the kernel's auxdata.
public static NDInnerLoopFunc GetCumSumInnerLoop(NPTypeCode inType, NPTypeCode accType)
Parameters
inType NPTypeCode
accType NPTypeCode
Returns
NDInnerLoopFunc
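To make the "scan axis removed" idea concrete, here is a minimal sketch under stated assumptions. `ScanAxisSketch` is a hypothetical stand-in for the auxdata: the actual fields of ILKernelGenerator.ScanAxisAux are not documented on this page, so the length and stride fields below are guesses at what such a structure would need. The iterator visits one starting position per lane, and the kernel walks the scan axis itself using the auxdata, widening `int` inputs into a `long` accumulator as an example of inType → accType.

```csharp
using System;

// Hypothetical auxdata: describes the removed scan axis (assumed fields).
public readonly struct ScanAxisSketch
{
    public readonly int Length;     // number of elements along the scan axis
    public readonly int InStride;   // element stride of the input along that axis
    public readonly int OutStride;  // element stride of the output along that axis

    public ScanAxisSketch(int length, int inStride, int outStride)
    {
        Length = length;
        InStride = inStride;
        OutStride = outStride;
    }
}

public static class CumSumSketch
{
    // Running sum along one lane, starting at the offsets the iterator supplies.
    public static void ScanLane(int[] input, int inOffset,
                                long[] output, int outOffset,
                                ScanAxisSketch axis)
    {
        long acc = 0;
        for (int k = 0; k < axis.Length; k++)
        {
            acc += input[inOffset + k * axis.InStride];
            output[outOffset + k * axis.OutStride] = acc;
        }
    }
}
```

For a row-major 2×3 array scanned along axis 0, the lanes start at offsets 0, 1, and 2, and each lane has length 2 and stride 3.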
GetReduceInnerLoop(ReduceKernelKey)
Returns the cached per-chunk reduction kernel for the given (op, input, accumulator) triple, or null when no NDIter-driven kernel exists yet (caller falls back to the DirectILKernelGenerator path). The returned delegate matches NDInnerLoopFunc; hand it to an iterator built by NewReduce(NDArray, NDArray, int, NDIterGlobalFlags).
public static NDInnerLoopFunc GetReduceInnerLoop(ILKernelGenerator.ReduceKernelKey key)
Parameters
key ILKernelGenerator.ReduceKernelKey
Returns
NDInnerLoopFunc
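The "cached kernel or null, with a caller-side fallback" contract can be sketched as follows. Everything here is illustrative: `ReduceLookupSketch`, its `ChunkKernel` delegate, the string op names, and the tuple key are hypothetical stand-ins for NDInnerLoopFunc and ReduceKernelKey, and the fallback loop merely stands in for the DirectILKernelGenerator path.

```csharp
using System;
using System.Collections.Generic;

public static class ReduceLookupSketch
{
    // Hypothetical per-chunk kernel shape: folds `count` elements into acc[0].
    public delegate void ChunkKernel(double[] src, int offset, int count, double[] acc);

    // Hypothetical cache keyed by (op, input type, accumulator type).
    private static readonly Dictionary<(string, Type, Type), ChunkKernel> Kernels =
        new Dictionary<(string, Type, Type), ChunkKernel>
        {
            [("sum", typeof(double), typeof(double))] = (src, offset, count, acc) =>
            {
                for (int i = 0; i < count; i++) acc[0] += src[offset + i];
            },
        };

    // Returns the cached kernel, or null when none is registered for the key.
    public static ChunkKernel Lookup(string op, Type input, Type accumulator)
    {
        return Kernels.TryGetValue((op, input, accumulator), out var kernel) ? kernel : null;
    }

    // Uses the per-chunk kernel when one exists; otherwise falls back.
    public static double Reduce(double[] data, string op, out bool usedChunkKernel)
    {
        ChunkKernel kernel = Lookup(op, typeof(double), typeof(double));
        usedChunkKernel = kernel != null;
        if (kernel != null)
        {
            var acc = new[] { 0.0 };   // sum identity; only "sum" is registered here
            const int chunk = 2;       // arbitrary chunk size for the sketch
            for (int start = 0; start < data.Length; start += chunk)
                kernel(data, start, Math.Min(chunk, data.Length - start), acc);
            return acc[0];
        }

        // Fallback path: a whole-array loop, supporting only "prod" in this sketch.
        if (op != "prod") throw new NotSupportedException(op);
        double product = 1.0;
        foreach (double x in data) product *= x;
        return product;
    }
}
```

Returning null rather than throwing keeps the lookup cheap for callers that probe for a kernel on every reduction and already have a working fallback.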
MeanDivideByCount(NDArray, long)
Divides every element of output by count in place. This is the post-pass that turns an accumulated axis Sum into a Mean. For Complex, it divides both components by the real count (NumPy: mean = sum / n), which is exactly what the legacy MeanAxisComplex did per element, but without its per-output-row NDArray allocation. Writes through SetAtIndex(object, long) so any output layout is honored.
public static void MeanDivideByCount(NDArray output, long count)
Parameters
output NDArray
count long
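As a rough illustration of the Complex case, the sketch below applies mean = sum / n to a plain `Complex[]`. It is not the library's implementation: `MeanSketch.DivideByCount` is a hypothetical helper that works on a managed array rather than going through an NDArray and SetAtIndex.

```csharp
using System;
using System.Numerics;

public static class MeanSketch
{
    // Divides each accumulated sum by the real count, in place.
    // Both the real and imaginary components are scaled by 1 / count.
    public static void DivideByCount(Complex[] sums, long count)
    {
        for (int i = 0; i < sums.Length; i++)
        {
            sums[i] = new Complex(sums[i].Real / count, sums[i].Imaginary / count);
        }
    }
}
```

Because the division happens in place on the already-accumulated output, no temporary array is needed per output row.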
SeedReduceIdentity(NDArray, ReductionOp)
Pre-fill output with the reduction identity for op before driving a REDUCE iterator. Required because the per-chunk kernels fold into the existing output slot(s). Writes through SetAtIndex(object, long) so any output layout (contiguous fresh alloc or user-supplied view) is honored.
public static void SeedReduceIdentity(NDArray output, ReductionOp op)
Parameters
output NDArray
op ReductionOp
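Why seeding matters can be shown with a small sketch. The names `ReduceOpSketch` and `SeedSketch` are hypothetical, and the identity values chosen here (0 for sum, 1 for product, ±infinity for min and max on doubles) are the conventional mathematical ones, assumed rather than taken from the library. Because each chunk folds into whatever the output slot already holds, an unseeded product would start from 0 and stay there.

```csharp
using System;

// Hypothetical stand-in for ReductionOp.
public enum ReduceOpSketch { Sum, Prod, Min, Max }

public static class SeedSketch
{
    // Conventional identity element for each operation (assumed values).
    public static double Identity(ReduceOpSketch op)
    {
        switch (op)
        {
            case ReduceOpSketch.Sum: return 0.0;
            case ReduceOpSketch.Prod: return 1.0;
            case ReduceOpSketch.Min: return double.PositiveInfinity;
            case ReduceOpSketch.Max: return double.NegativeInfinity;
            default: throw new ArgumentOutOfRangeException(nameof(op));
        }
    }

    // Pre-fills every output slot with the identity for op.
    public static void Seed(double[] output, ReduceOpSketch op)
    {
        double identity = Identity(op);
        for (int i = 0; i < output.Length; i++) output[i] = identity;
    }

    // Folds one chunk into an existing output slot, as a per-chunk kernel would.
    public static void FoldChunk(double[] output, int slot, double[] chunk, ReduceOpSketch op)
    {
        double acc = output[slot];
        foreach (double x in chunk)
        {
            switch (op)
            {
                case ReduceOpSketch.Sum: acc += x; break;
                case ReduceOpSketch.Prod: acc *= x; break;
                case ReduceOpSketch.Min: acc = Math.Min(acc, x); break;
                case ReduceOpSketch.Max: acc = Math.Max(acc, x); break;
            }
        }
        output[slot] = acc;
    }
}
```

Seeding once up front also means the kernel needs no "first chunk" special case: every chunk is folded the same way.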