Namespace NumSharp.Backends.Kernels

Classes

ILKernelGenerator

Binary operations (same-type): contiguous kernels and generic helpers.

ReductionOpExtensions

Extension methods for ReductionOp.

ReductionTypeExtensions

Extension methods for NPTypeCode related to reductions.

SimdMatMul

High-performance SIMD matrix multiplication with cache blocking and panel packing. Single-threaded implementation achieving ~20 GFLOPS on modern CPUs.

Key optimizations:

  • GEBP (General Block Panel) algorithm with cache blocking
  • Full panel packing: A as [kc][MR] panels, B as [kc][NR] panels
  • 8x16 micro-kernel with 16 Vector256 accumulators
  • FMA (Fused Multiply-Add) for 2x FLOP throughput
  • 4x k-loop unrolling for instruction-level parallelism
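
The pure-Python sketch below illustrates only the cache-blocking loop order. It computes C = A * B on row-major flat lists and leaves out panel packing, the 8x16 SIMD micro-kernel, FMA, and unrolling. The function name and the tiny block sizes mc, kc, nc are hypothetical and are not SimdMatMul's actual parameters.

```python
def blocked_matmul(a, b, m, n, k, mc=4, kc=4, nc=4):
    """Return C = A * B for row-major flat lists A [m x k] and B [k x n].

    Loop order: an nc-wide column panel of B, then a kc-deep slice of K,
    then an mc-tall row block of A. Block sizes are illustrative only.
    """
    c = [0.0] * (m * n)
    for jc in range(0, n, nc):                # column panel of B and C
        nb = min(nc, n - jc)
        for pc in range(0, k, kc):            # k-slice of B reused across row blocks
            kb = min(kc, k - pc)
            for ic in range(0, m, mc):        # row block of A and C
                mb = min(mc, m - ic)
                # Inner kernel: accumulate into the mb x nb tile of C.
                for i in range(ic, ic + mb):
                    c_off = i * n
                    for p in range(pc, pc + kb):
                        a_ip = a[i * k + p]
                        b_off = p * n
                        for j in range(jc, jc + nb):
                            c[c_off + j] += a_ip * b[b_off + j]
    return c
```

Each kc x nc block of B is reused by every row block of A before the loop moves on. That reuse is what cache blocking exploits. Panel packing, which this sketch skips, copies the reused blocks into contiguous buffers so the micro-kernel reads them sequentially.
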

StrideDetector

Stride-based pattern detection for selecting optimal SIMD execution paths. All methods are aggressively inlined for minimal dispatch overhead.
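
This minimal Python sketch assumes strides are measured in elements. It also assumes a binary operation picks one of three illustrative paths:

  • both operands are contiguous (the case these docs call SimdFull)
  • a contiguous left operand is combined with a broadcast scalar on the right (all strides zero)
  • a general strided fallback

The function names and path labels are hypothetical. They are not StrideDetector's methods or the ExecutionPath members.

```python
def is_c_contiguous(shape, strides):
    """True if element strides describe a dense row-major (C-order) layout."""
    expected = 1
    for dim, stride in zip(reversed(shape), reversed(strides)):
        # Size-1 dimensions never advance, so their stride does not matter.
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True

def is_scalar_broadcast(strides):
    """True if every stride is zero, i.e. one value is repeated everywhere."""
    return all(s == 0 for s in strides)

def select_path(lhs_shape, lhs_strides, rhs_shape, rhs_strides):
    """Pick an illustrative execution path label from stride patterns."""
    lhs_dense = is_c_contiguous(lhs_shape, lhs_strides)
    if lhs_dense and is_c_contiguous(rhs_shape, rhs_strides):
        return "contiguous"
    if lhs_dense and is_scalar_broadcast(rhs_strides):
        return "scalar_rhs"
    return "general"
```
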

Structs

AxisReductionKernelKey

Cache key for axis reduction kernels, which reduce along a specific axis and produce an array with one fewer dimension.
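
The Python sketch below makes the shape rule concrete. It sums a row-major N-dimensional array, stored as a flat list, along one axis by splitting the shape into outer, axis, and inner extents. The function name is hypothetical, and the loop order is not meant to mirror how the generated kernels iterate.

```python
from math import prod

def sum_along_axis(data, shape, axis):
    """Sum a row-major flat list of the given shape along `axis`.

    Returns (values, new_shape); new_shape drops the reduced axis.
    """
    outer = prod(shape[:axis])        # product of dims before the axis (1 if none)
    length = shape[axis]              # extent of the reduced axis
    inner = prod(shape[axis + 1:])    # product of dims after the axis (1 if none)
    out = [0] * (outer * inner)
    for o in range(outer):
        for a in range(length):
            base = (o * length + a) * inner
            for i in range(inner):
                out[o * inner + i] += data[base + i]
    return out, shape[:axis] + shape[axis + 1:]
```
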

BinaryScalarKernelKey

Cache key for binary scalar operation kernels. Identifies a unique kernel by LHS type, RHS type, result type, and operation.

ComparisonKernelKey

Cache key for comparison operation kernels. Identifies a unique kernel by LHS type, RHS type, operation, and execution path. Result type is always bool (NPTypeCode.Boolean).

ComparisonScalarKernelKey

Cache key for comparison scalar operation kernels. Identifies a unique kernel by LHS type, RHS type, and comparison operation. Result type is always bool.

CumulativeAxisKernelKey

Cache key for cumulative axis reduction kernels (cumsum along an axis, etc.). The output has the same shape as the input; values accumulate along the specified axis.

CumulativeKernelKey

Cache key for cumulative reduction kernels (cumsum, etc.). The output has the same shape as the input, and each output element is the accumulation of all input elements up to and including its position.
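
The Python sketch below shows the two cumulative semantics described above, using a running sum. The first function is a flat inclusive scan. The second accumulates along axis 0 or 1 of a row-major 2-D array stored as a flat list. Both function names are hypothetical.

```python
def cumsum(values):
    """Inclusive running sum: out[i] = values[0] + ... + values[i]."""
    out, total = [], 0
    for v in values:
        total += v
        out.append(total)
    return out

def cumsum_axis(data, rows, cols, axis):
    """Cumulative sum of a row-major [rows x cols] flat list along axis 0 or 1.

    The output has the same length (and shape) as the input.
    """
    out = list(data)
    if axis == 1:      # accumulate left to right within each row
        for r in range(rows):
            for c in range(1, cols):
                out[r * cols + c] += out[r * cols + c - 1]
    elif axis == 0:    # accumulate top to bottom within each column
        for r in range(1, rows):
            for c in range(cols):
                out[r * cols + c] += out[(r - 1) * cols + c]
    else:
        raise ValueError("axis must be 0 or 1")
    return out
```
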

ElementReductionKernelKey

Cache key for element-wise (full array) reduction kernels. Reduces all elements to a single scalar value.

IndexCollector

A growable buffer for collecting long (64-bit) indices, backed by NDArray storage. Replaces LongIndexBuffer and uses NumSharp's existing unmanaged memory infrastructure.
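
The Python sketch below illustrates a generic growable index buffer. The standard array module (typecode "q", signed 64-bit) stands in for NDArray-backed unmanaged storage. The class and member names are hypothetical. Capacity doubling is an assumed growth policy, not a documented property of IndexCollector.

```python
from array import array

class GrowableIndexBuffer:
    """Hypothetical 64-bit index buffer with amortized O(1) appends."""

    def __init__(self, capacity=4):
        # Preallocate `capacity` zeroed slots (at least one so doubling works).
        self._data = array("q", [0]) * max(1, capacity)
        self.count = 0

    @property
    def capacity(self):
        return len(self._data)

    def add(self, index):
        if self.count == len(self._data):
            # Full: double the capacity by appending an equal number of slots.
            self._data.extend(array("q", [0]) * len(self._data))
        self._data[self.count] = index
        self.count += 1

    def to_list(self):
        """Return only the filled portion of the buffer."""
        return self._data[:self.count].tolist()
```
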

MixedTypeKernelKey

Cache key for mixed-type binary operation kernels. Identifies a unique kernel by LHS type, RHS type, result type, operation, and execution path.
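
A key like this works because equal field values produce equal hashes. The first request for a given combination builds a kernel, and later requests reuse it. The Python sketch below uses a frozen dataclass as the key and a closure as a stand-in for an IL-generated kernel. The class names, field values, and builder function are hypothetical.

```python
import operator
from dataclasses import dataclass

@dataclass(frozen=True)
class MixedTypeKey:
    """Hypothetical key: frozen, so instances are immutable and hashable."""
    lhs: str
    rhs: str
    result: str
    op: str
    path: str

_OPS = {"add": operator.add, "multiply": operator.mul}

def build_kernel(key):
    """Stand-in for runtime IL emission: returns a Python closure."""
    op = _OPS[key.op]
    return lambda xs, ys: [op(x, y) for x, y in zip(xs, ys)]

class KernelCache:
    def __init__(self, builder):
        self._builder = builder
        self._kernels = {}
        self.builds = 0  # number of kernels actually generated

    def get(self, key):
        kernel = self._kernels.get(key)
        if kernel is None:
            kernel = self._builder(key)
            self._kernels[key] = kernel
            self.builds += 1
        return kernel
```

This sketch is not thread-safe. A cache shared across threads would need a concurrent map, such as .NET's ConcurrentDictionary.
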

UnaryKernelKey

Cache key for unary operation kernels. Identifies a unique kernel by input type, output type, operation, and whether the input is contiguous.

UnaryScalarKernelKey

Cache key for unary scalar operation kernels. Identifies a unique kernel by input type, output type, and operation.

Enums

BinaryOp

Binary operations supported by kernel providers.

ComparisonOp

Comparison operations supported by kernel providers. All comparison operations return bool (NPTypeCode.Boolean).

ExecutionPath

Execution paths for binary operations, selected based on stride analysis.

ReductionOp

Reduction operations supported by kernel providers.

UnaryOp

Unary operations supported by kernel providers.

Delegates

AxisReductionKernel

Delegate for axis reduction kernels. Reduces along a specific axis, writing the result to the output array.

ComparisonKernel

Comparison operation kernel signature using void pointers. LHS and RHS may have different types, but the result is always bool. Type conversion is handled internally by the generated IL.

ContiguousKernel<T>

Delegate for contiguous (SimdFull) binary operations. Simplified signature: no strides are needed because both arrays are contiguous.

CumulativeAxisKernel

Delegate for cumulative axis reduction kernels. Computes running accumulation along a specific axis.

CumulativeKernel

Delegate for cumulative reduction kernels (cumsum, etc.). The output has the same shape as the input.

ElementReductionKernel

Delegate for element-wise reduction kernels. Reduces all elements of an array to a single value.

ILKernelGenerator.ShiftArrayKernel<T>

Delegate for shift operation with per-element shift amounts. This is the scalar loop path for element-wise shifts.

ILKernelGenerator.ShiftScalarKernel<T>

Delegate for shift operation with scalar shift amount. This is the SIMD-optimized path for uniform shifts.
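
The two shift delegates differ in the shape of the shift operand. A uniform amount is a single value shared by every element. Per-element amounts are a second array of the same length. The Python sketch below models unsigned 32-bit elements by masking, because Python integers are unbounded. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # emulate uint32 wraparound

def shift_left_scalar(values, amount):
    """Uniform shift: every element is shifted by the same amount."""
    return [(v << amount) & MASK32 for v in values]

def shift_left_array(values, amounts):
    """Per-element shift: values[i] is shifted by amounts[i]."""
    if len(values) != len(amounts):
        raise ValueError("values and amounts must have the same length")
    return [(v << s) & MASK32 for v, s in zip(values, amounts)]
```
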

MatMul2DKernel<T>

Kernel delegate for 2D matrix multiplication: C = A * B, where A is [M x K], B is [K x N], and C is [M x N]. All matrices are row-major and contiguous.

MixedTypeKernel

Mixed-type binary operation kernel signature using void pointers. Handles operations where LHS, RHS, and result may have different types. Type conversion is handled internally by the generated IL.

TypedElementReductionKernel<TResult>

Delegate for typed element-wise reduction kernels. Returns the reduced value directly without boxing.

UnaryKernel

Unary operation kernel signature using void pointers. Handles operations where input and output may have different types. Type conversion is handled internally by the generated IL.

WhereKernel<T>

Delegate for where operation kernels.