Namespace NumSharp.Backends.Kernels

Classes

ILKernelGenerator: Binary operations (same-type) - contiguous kernels and generic helpers.

ReductionOpExtensions: Extension methods for ReductionOp.

ReductionTypeExtensions: Extension methods for NPTypeCode related to reductions.

High-performance SIMD matrix multiplication with cache blocking and panel packing. Single-threaded implementation achieving ~20 GFLOPS on modern CPUs.

Key optimizations:

GEBP (General Block Panel) algorithm with cache blocking
Full panel packing: A as [kc][MR] panels, B as [kc][NR] panels
8x16 micro-kernel with 16 Vector256 accumulators
FMA (Fused Multiply-Add) for 2x FLOP throughput
4x k-loop unrolling for instruction-level parallelism
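The cache-blocking idea in the list above can be sketched in scalar C#. This is a minimal illustration, not the library's implementation: it omits panel packing, the 8x16 micro-kernel, FMA, and unrolling, and the class name `BlockedMatMulSketch` and the block sizes `MC`, `KC`, `NC` are assumptions chosen for readability.

```csharp
using System;

public static class BlockedMatMulSketch
{
    // Hypothetical block sizes; real values would be tuned to the cache hierarchy.
    private const int MC = 64, KC = 128, NC = 256;

    // Computes C = A * B for row-major contiguous arrays.
    // A is [m x k], B is [k x n], C is [m x n].
    public static void Multiply(float[] a, float[] b, float[] c, int m, int k, int n)
    {
        Array.Clear(c, 0, m * n);

        for (int jc = 0; jc < n; jc += NC)          // column block of B and C
        {
            int nb = Math.Min(NC, n - jc);
            for (int pc = 0; pc < k; pc += KC)      // depth block (shared dimension)
            {
                int kb = Math.Min(KC, k - pc);
                for (int ic = 0; ic < m; ic += MC)  // row block of A and C
                {
                    int mb = Math.Min(MC, m - ic);

                    // Block kernel: every operand touched here lies inside one
                    // (mb x kb) block of A and one (kb x nb) block of B.
                    for (int i = 0; i < mb; i++)
                    {
                        int cRow = (ic + i) * n + jc;
                        for (int p = 0; p < kb; p++)
                        {
                            float aip = a[(ic + i) * k + (pc + p)];
                            int bRow = (pc + p) * n + jc;
                            for (int j = 0; j < nb; j++)
                                c[cRow + j] += aip * b[bRow + j];
                        }
                    }
                }
            }
        }
    }
}
```

The three outer loops only change the order in which blocks are visited; the arithmetic is the same as a naive triple loop, so the result can be checked against one.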

StrideDetector: Stride-based pattern detection for selecting optimal SIMD execution paths. All methods are aggressively inlined for minimal dispatch overhead.
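As an illustration of stride-based pattern detection, the sketch below classifies a pair of operands from their shape and strides. The names `StrideDetectorSketch`, `PathSketch`, and the three path values are hypothetical and do not mirror the actual `StrideDetector` or `ExecutionPath` members; strides are assumed to be measured in elements, not bytes.

```csharp
using System.Runtime.CompilerServices;

// Hypothetical path names for illustration only.
public enum PathSketch { BothContiguous, ScalarBroadcast, General }

public static class StrideDetectorSketch
{
    // True when the strides describe a dense row-major (C-order) layout.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsContiguous(int[] shape, int[] strides)
    {
        int expected = 1;
        for (int d = shape.Length - 1; d >= 0; d--)
        {
            if (shape[d] == 1) continue;          // stride of a size-1 axis is irrelevant
            if (strides[d] != expected) return false;
            expected *= shape[d];
        }
        return true;
    }

    // True when every stride is zero, i.e. one element is broadcast everywhere.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsAllZero(int[] strides)
    {
        foreach (int s in strides)
            if (s != 0) return false;
        return true;
    }

    // Picks a path for a binary operation from the two operands' strides.
    public static PathSketch Classify(int[] shape, int[] lhsStrides, int[] rhsStrides)
    {
        bool lhs = IsContiguous(shape, lhsStrides);
        bool rhs = IsContiguous(shape, rhsStrides);
        if (lhs && rhs) return PathSketch.BothContiguous;
        if ((lhs && IsAllZero(rhsStrides)) || (rhs && IsAllZero(lhsStrides)))
            return PathSketch.ScalarBroadcast;
        return PathSketch.General;
    }
}
```

For a `[2 x 3]` shape, strides `{3, 1}` are contiguous, `{1, 2}` (a transposed view) are not, and `{0, 0}` describe a broadcast scalar.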

Structs

AxisReductionKernelKey: Cache key for axis reduction kernels, which reduce along a specific axis and produce an array with one fewer dimension.

BinaryScalarKernelKey: Cache key for binary scalar operation kernels. Identifies a unique kernel by LHS type, RHS type, result type, and operation.

ComparisonKernelKey: Cache key for comparison operation kernels. Identifies a unique kernel by LHS type, RHS type, operation, and execution path. Result type is always bool (NPTypeCode.Boolean).

ComparisonScalarKernelKey: Cache key for comparison scalar operation kernels. Identifies a unique kernel by LHS type, RHS type, and comparison operation. Result type is always bool.

CumulativeAxisKernelKey: Cache key for cumulative axis reduction kernels (cumsum along axis, etc.). Output has the same shape as the input, with values accumulated along the specified axis.

CumulativeKernelKey: Cache key for cumulative reduction kernels (cumsum, etc.). Output has the same shape as the input; each element is the accumulation of all elements up to and including its position.

ElementReductionKernelKey: Cache key for element-wise (full array) reduction kernels, which reduce all elements to a single scalar value.
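The three reduction families keyed above differ mainly in output shape. The sketch below shows each for a sum over a row-major `[rows x cols]` array; `ReductionSketch` and its method names are hypothetical, and the cumulative sum is assumed to be inclusive, as in NumPy's `cumsum`.

```csharp
public static class ReductionSketch
{
    // Element-wise (full array) reduction: all elements collapse to one scalar.
    public static double SumAll(double[] data)
    {
        double acc = 0;
        foreach (double v in data) acc += v;
        return acc;
    }

    // Axis reduction on a row-major [rows x cols] array: the reduced axis disappears.
    // axis 0 yields cols values (column sums); axis 1 yields rows values (row sums).
    public static double[] SumAxis(double[] data, int rows, int cols, int axis)
    {
        double[] result = new double[axis == 0 ? cols : rows];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                result[axis == 0 ? c : r] += data[r * cols + c];
        return result;
    }

    // Cumulative reduction: output keeps the input length; element i holds
    // the sum of data[0..i] inclusive.
    public static double[] CumSum(double[] data)
    {
        double[] result = new double[data.Length];
        double acc = 0;
        for (int i = 0; i < data.Length; i++)
        {
            acc += data[i];
            result[i] = acc;
        }
        return result;
    }
}
```

For the `[2 x 3]` input `{1, 2, 3, 4, 5, 6}`, `SumAll` gives 21, `SumAxis` gives `{5, 7, 9}` for axis 0 and `{6, 15}` for axis 1, and `CumSum` gives `{1, 3, 6, 10, 15, 21}`.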

IndexCollector: A growable buffer for collecting long indices, backed by NDArray storage. Replaces LongIndexBuffer - uses NumSharp's existing unmanaged memory infrastructure.

MixedTypeKernelKey: Cache key for mixed-type binary operation kernels. Identifies a unique kernel by LHS type, RHS type, result type, operation, and execution path.

UnaryKernelKey: Cache key for unary operation kernels. Identifies a unique kernel by input type, output type, operation, and whether contiguous.

UnaryScalarKernelKey: Cache key for unary scalar operation kernels. Identifies a unique kernel by input type, output type, and operation.
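The key structs above all follow the same pattern: a small value type with structural equality, used to look up a previously generated kernel. A minimal sketch of that pattern follows. Every name in it (`UnaryKeySketch`, `KernelCacheSketch`, the two enums) is hypothetical, and a plain lambda stands in for the IL emission that the real generator performs.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-ins for NPTypeCode and UnaryOp; the real enums have more members.
public enum TypeCodeSketch { Int32, Double }
public enum UnaryOpSketch { Negate, Abs }

public readonly struct UnaryKeySketch : IEquatable<UnaryKeySketch>
{
    public readonly TypeCodeSketch Input;
    public readonly TypeCodeSketch Output;
    public readonly UnaryOpSketch Op;
    public readonly bool IsContiguous;

    public UnaryKeySketch(TypeCodeSketch input, TypeCodeSketch output,
                          UnaryOpSketch op, bool isContiguous)
    {
        Input = input; Output = output; Op = op; IsContiguous = isContiguous;
    }

    // Two keys are equal only if every field matches, so each distinct
    // combination maps to its own cached kernel.
    public bool Equals(UnaryKeySketch other) =>
        Input == other.Input && Output == other.Output &&
        Op == other.Op && IsContiguous == other.IsContiguous;

    public override bool Equals(object obj) => obj is UnaryKeySketch other && Equals(other);

    public override int GetHashCode() => HashCode.Combine(Input, Output, Op, IsContiguous);
}

public static class KernelCacheSketch
{
    private static readonly ConcurrentDictionary<UnaryKeySketch, Func<double, double>> Cache =
        new ConcurrentDictionary<UnaryKeySketch, Func<double, double>>();

    // Counts how many kernels were built; useful for observing cache hits.
    public static int BuildCount;

    public static Func<double, double> GetOrBuild(UnaryKeySketch key) =>
        Cache.GetOrAdd(key, Build);

    // Stand-in for kernel generation (assumption: the real code emits IL here).
    private static Func<double, double> Build(UnaryKeySketch key)
    {
        BuildCount++;
        switch (key.Op)
        {
            case UnaryOpSketch.Negate: return x => -x;
            case UnaryOpSketch.Abs: return x => Math.Abs(x);
            default: throw new ArgumentOutOfRangeException(nameof(key));
        }
    }
}
```

Requesting the same key twice returns the same delegate instance and builds only once. Note that `ConcurrentDictionary.GetOrAdd` may invoke the factory more than once under contention, so a generator used this way should tolerate a duplicate build.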

Enums

BinaryOp: Binary operations supported by kernel providers.

ComparisonOp: Comparison operations supported by kernel providers. All comparison operations return bool (NPTypeCode.Boolean).

ExecutionPath: Execution paths for binary operations, selected based on stride analysis.

ReductionOp: Reduction operations supported by kernel providers.

UnaryOp: Unary operations supported by kernel providers.

Delegates

AxisReductionKernel: Delegate for axis reduction kernels. Reduces along a specific axis, writing to output array.

ComparisonKernel: Comparison operation kernel signature using void pointers. LHS and RHS may have different types, but result is always bool. Type conversion is handled internally by the generated IL.

ContiguousKernel<T>: Delegate for contiguous (SimdFull) binary operations. Simplified signature - no strides needed since both arrays are contiguous.

CumulativeAxisKernel: Delegate for cumulative axis reduction kernels. Computes running accumulation along a specific axis.

CumulativeKernel: Delegate for cumulative reduction kernels (cumsum, etc.). Output has same shape as input.

ElementReductionKernel: Delegate for element-wise reduction kernels. Reduces all elements of an array to a single value.

ILKernelGenerator.ShiftArrayKernel<T>: Delegate for shift operation with per-element shift amounts. This is the scalar loop path for element-wise shifts.

ILKernelGenerator.ShiftScalarKernel<T>: Delegate for shift operation with scalar shift amount. This is the SIMD-optimized path for uniform shifts.
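The difference between the two shift delegates can be illustrated with plain loops. `ShiftSketch` and its method signatures are hypothetical and use managed arrays instead of the real delegates' parameters. The uniform-shift loop applies the same count to every element, which is what makes it amenable to SIMD, whereas the per-element loop reads a different count for each element.

```csharp
public static class ShiftSketch
{
    // Uniform shift: one shift amount for the whole array.
    public static void ShiftLeftScalar(int[] input, int shift, int[] output)
    {
        for (int i = 0; i < input.Length; i++)
            output[i] = input[i] << shift;
    }

    // Per-element shift: each element has its own shift amount.
    public static void ShiftLeftArray(int[] input, int[] shifts, int[] output)
    {
        for (int i = 0; i < input.Length; i++)
            output[i] = input[i] << shifts[i];
    }
}
```

With input `{1, 2, 3}`, a uniform shift of 2 yields `{4, 8, 12}`, and per-element shifts `{0, 1, 2}` yield `{1, 4, 12}`. C# masks the shift count of an `int` to its low five bits, so counts of 32 or more behave differently here than in languages that define them otherwise.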

MatMul2DKernel<T>: Kernel delegate for 2D matrix multiplication: C = A * B, where A is [M x K], B is [K x N], and C is [M x N]. All matrices are row-major contiguous.

MixedTypeKernel: Mixed-type binary operation kernel signature using void pointers. Handles operations where LHS, RHS, and result may have different types. Type conversion is handled internally by the generated IL.

TypedElementReductionKernel<TResult>: Delegate for typed element-wise reduction kernels. Returns the reduced value directly without boxing.

UnaryKernel: Unary operation kernel signature using void pointers. Handles operations where input and output may have different types. Type conversion is handled internally by the generated IL.

WhereKernel<T>: Delegate for where operation kernels.
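The semantics of a where operation can be sketched as an element-wise select. The class `WhereSketch` and its signature are assumptions for illustration and use managed arrays; the actual `WhereKernel<T>` parameters are not shown in this listing.

```csharp
public static class WhereSketch
{
    // result[i] = condition[i] ? x[i] : y[i]
    // All four arrays are assumed to have the same length.
    public static void Where<T>(bool[] condition, T[] x, T[] y, T[] result)
    {
        for (int i = 0; i < condition.Length; i++)
            result[i] = condition[i] ? x[i] : y[i];
    }
}
```

For condition `{true, false, true}`, x `{1, 2, 3}`, and y `{10, 20, 30}`, the result is `{1, 20, 3}`.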