Namespace NumSharp.Backends.Kernels

Classes

ILKernelGenerator

Binary operations (same-type) - contiguous kernels and generic helpers.

ReductionOpExtensions

Extension methods for ReductionOp.

ReductionTypeExtensions

Extension methods for NPTypeCode related to reductions.

SimdMatMul

High-performance SIMD matrix multiplication with cache blocking and panel packing. Single-threaded implementation achieving ~20 GFLOPS on modern CPUs.

Key optimizations:

  • GEBP (General Block Panel) algorithm with cache blocking
  • Full panel packing: A as [kc][MR] panels, B as [kc][NR] panels
  • 8x16 micro-kernel with 16 Vector256 accumulators
  • FMA (Fused Multiply-Add) for 2x FLOP throughput
  • 4x k-loop unrolling for instruction-level parallelism
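
The cache-blocking idea in the list above can be illustrated with a minimal scalar sketch. It shows loop tiling only and deliberately omits panel packing, the SIMD micro-kernel, FMA, and k-loop unrolling. The class name `BlockedMatMulSketch` and the block sizes are hypothetical and are not taken from NumSharp.

```csharp
using System;

public static class BlockedMatMulSketch
{
    // Hypothetical block sizes, chosen only for illustration.
    const int MC = 64, KC = 64, NC = 64;

    // Computes C (m x n) = A (m x k) * B (k x n); all arrays are row-major.
    public static void Multiply(float[] a, float[] b, float[] c, int m, int k, int n)
    {
        Array.Clear(c, 0, m * n);
        for (int jc = 0; jc < n; jc += NC)          // block of C columns
        {
            int jEnd = Math.Min(jc + NC, n);
            for (int pc = 0; pc < k; pc += KC)      // block of the shared dimension
            {
                int pEnd = Math.Min(pc + KC, k);
                for (int ic = 0; ic < m; ic += MC)  // block of C rows
                {
                    int iEnd = Math.Min(ic + MC, m);
                    // Inner loops touch only one tile of A, B, and C at a time.
                    for (int i = ic; i < iEnd; i++)
                    {
                        for (int p = pc; p < pEnd; p++)
                        {
                            float aip = a[i * k + p];
                            for (int j = jc; j < jEnd; j++)
                                c[i * n + j] += aip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
}
```

Multiplying the 2x3 matrix `{1,2,3,4,5,6}` by the 3x2 matrix `{7,8,9,10,11,12}` with this sketch yields `{58, 64, 139, 154}`.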

SimdThresholds

Minimum element counts for SIMD to be beneficial. Below these thresholds, the overhead of SIMD setup may exceed the benefits. Based on Vector256 (32 bytes) width.
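
As a rough illustration of how a width-based threshold might be derived, the sketch below computes the number of lanes in a 32-byte vector and requires a minimum number of full vectors before choosing SIMD. The class name, the method names, and the default of four vectors are assumptions made for this example; the actual values in `SimdThresholds` are not shown on this page.

```csharp
using System.Runtime.CompilerServices;

public static class SimdThresholdSketch
{
    const int VectorBytes = 32; // Vector256 width in bytes

    // Number of elements of type T that fit in one 32-byte vector.
    public static int LanesPerVector<T>() where T : unmanaged
        => VectorBytes / Unsafe.SizeOf<T>();

    // Hypothetical rule: use SIMD only when at least minVectors full vectors exist.
    public static bool ShouldUseSimd<T>(int count, int minVectors = 4) where T : unmanaged
        => count >= LanesPerVector<T>() * minVectors;
}
```

With these assumed defaults, `float` has 8 lanes, so the sketch selects SIMD from 32 elements upward.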

StrideDetector

Stride-based pattern detection for selecting optimal SIMD execution paths. All methods are aggressively inlined for minimal dispatch overhead.
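
A minimal sketch of stride-based pattern detection follows, assuming strides are expressed in elements and arrays are row-major. The class and method names are hypothetical and do not reflect the real `StrideDetector` members.

```csharp
using System.Runtime.CompilerServices;

public static class StrideSketch
{
    // True when the strides describe a dense row-major layout for the shape.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsContiguous(int[] shape, int[] strides)
    {
        int expected = 1;
        for (int d = shape.Length - 1; d >= 0; d--)
        {
            // A dimension of size 1 is never stepped, so its stride is irrelevant.
            if (shape[d] != 1 && strides[d] != expected) return false;
            expected *= shape[d];
        }
        return true;
    }

    // True when every stride is zero, i.e. a single value broadcast to all positions.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool IsScalarBroadcast(int[] strides)
    {
        for (int d = 0; d < strides.Length; d++)
            if (strides[d] != 0) return false;
        return true;
    }
}
```

For a 2x3 array, strides `{3, 1}` are reported as contiguous, while the transposed view `{1, 2}` is not.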

TypeRules

Shared type rules for kernel providers.

Structs

AxisReductionKernelKey

Cache key for axis reduction kernels. The kernel reduces along a specific axis, producing an array with one fewer dimension.
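
To make the shape change concrete, the sketch below sums a row-major 2D array along one axis and returns a 1D result. `AxisSumSketch` is a hypothetical helper written for this example, not part of the NumSharp API.

```csharp
using System;

public static class AxisSumSketch
{
    // Sums a row-major [rows x cols] array along the given axis (0 or 1).
    // Axis 0 collapses rows (result length = cols); axis 1 collapses columns.
    public static double[] Sum(double[] data, int rows, int cols, int axis)
    {
        if (axis != 0 && axis != 1) throw new ArgumentOutOfRangeException(nameof(axis));
        var result = new double[axis == 0 ? cols : rows];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                result[axis == 0 ? c : r] += data[r * cols + c];
        return result;
    }
}
```

For the 2x3 input `{1,2,3,4,5,6}`, axis 0 gives `{5, 7, 9}` and axis 1 gives `{6, 15}`.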

BinaryScalarKernelKey

Cache key for binary scalar operation kernels. Identifies a unique kernel by LHS type, RHS type, result type, and operation.
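
The general pattern behind such cache keys can be sketched as a value-type key used to look up a previously built delegate. Everything below (`SketchOp`, `SketchKernelKey`, `KernelCacheSketch`, and the use of plain lambdas in place of generated IL) is an assumption made for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public enum SketchOp { Add, Multiply }

// Value equality comes for free with a record struct, so equal keys hit the same entry.
public readonly record struct SketchKernelKey(Type Lhs, Type Rhs, Type Result, SketchOp Op);

public static class KernelCacheSketch
{
    static readonly ConcurrentDictionary<SketchKernelKey, Func<double, double, double>> Cache = new();

    // Counts how many kernels were built; exposed only to demonstrate caching.
    public static int BuildCount;

    public static Func<double, double, double> Get(SketchKernelKey key)
        => Cache.GetOrAdd(key, Build);

    // Stand-in for kernel generation. Under contention GetOrAdd may invoke
    // this more than once, but only one result is stored.
    static Func<double, double, double> Build(SketchKernelKey key)
    {
        Interlocked.Increment(ref BuildCount);
        switch (key.Op)
        {
            case SketchOp.Add: return (x, y) => x + y;
            case SketchOp.Multiply: return (x, y) => x * y;
            default: throw new NotSupportedException(key.Op.ToString());
        }
    }
}
```

Requesting the same key twice on a single thread returns the same delegate instance and builds it only once.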

BinaryScalarKey

Cache key for scalar binary operations. Used for element-by-element operations in general/broadcast paths.

ComparisonKernelKey

Cache key for comparison operation kernels. Identifies a unique kernel by LHS type, RHS type, operation, and execution path. Result type is always bool (NPTypeCode.Boolean).

ComparisonScalarKernelKey

Cache key for comparison scalar operation kernels. Identifies a unique kernel by LHS type, RHS type, and comparison operation. Result type is always bool.

ComparisonScalarKey

Cache key for scalar comparison operations. Used for element-by-element comparisons in general/broadcast paths.

ContiguousKernelKey

Cache key for same-type contiguous binary operations. Used for fast-path SIMD kernels when both operands are contiguous with identical type.

CumulativeAxisKernelKey

Cache key for cumulative axis reduction kernels (cumsum along axis, etc.). Output has same shape as input, cumulative accumulation along specified axis.

CumulativeKernelKey

Cache key for cumulative reduction kernels (cumsum, etc.). Output has the same shape as the input; each element is the accumulation of all input elements up to and including that position.
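
A cumulative sum over a flat array can be sketched as below, assuming the inclusive running-total semantics of NumPy's `cumsum`. `CumSumSketch` is a hypothetical name used only for this example.

```csharp
public static class CumSumSketch
{
    // Returns an array of the same length where output[i] = input[0] + ... + input[i].
    public static double[] CumSum(double[] input)
    {
        var output = new double[input.Length];
        double running = 0;
        for (int i = 0; i < input.Length; i++)
        {
            running += input[i];
            output[i] = running;
        }
        return output;
    }
}
```

For the input `{1, 2, 3, 4}` the sketch returns `{1, 3, 6, 10}`.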

ElementReductionKernelKey

Cache key for element-wise (full array) reduction kernels. Reduces all elements to a single scalar value.

MixedTypeKernelKey

Cache key for mixed-type binary operation kernels. Identifies a unique kernel by LHS type, RHS type, result type, operation, and execution path.

UnaryKernelKey

Cache key for unary operation kernels. Identifies a unique kernel by input type, output type, operation, and whether contiguous.

UnaryScalarKernelKey

Cache key for unary scalar operation kernels. Identifies a unique kernel by input type, output type, and operation.

UnaryScalarKey

Cache key for scalar unary operations. Used for element-by-element operations in general/broadcast paths.

Enums

BinaryOp

Binary operations supported by kernel providers.

ComparisonOp

Comparison operations supported by kernel providers. All comparison operations return bool (NPTypeCode.Boolean).

ExecutionPath

Execution paths for binary operations, selected based on stride analysis.

ReductionOp

Reduction operations supported by kernel providers.

ReductionPath

Execution path for reduction operations.

UnaryOp

Unary operations supported by kernel providers.

Delegates

AxisReductionKernel

Delegate for axis reduction kernels. Reduces along a specific axis, writing to output array.

BinaryKernel<T>

Unified binary operation kernel signature. All binary operations (Add, Sub, Mul, Div, Mod) use this signature. The kernel handles pattern detection and dispatch internally.

BinaryScalar<TLhs, TRhs, TOut>

Binary scalar function delegate. Used for element-by-element binary operations in broadcasting and general paths.

ComparisonKernel

Comparison operation kernel signature using void pointers. LHS and RHS may have different types, but result is always bool. Type conversion is handled internally by the generated IL.

ComparisonScalar<TLhs, TRhs>

Comparison scalar function delegate. Used for element-by-element comparisons returning boolean.

ContiguousKernel<T>

Delegate for contiguous (SimdFull) binary operations. Simplified signature - no strides needed since both arrays are contiguous.

CumulativeAxisKernel

Delegate for cumulative axis reduction kernels. Computes running accumulation along a specific axis.

CumulativeKernel

Delegate for cumulative reduction kernels (cumsum, etc.). Output has same shape as input.

ElementReductionKernel

Delegate for element-wise reduction kernels. Reduces all elements of an array to a single value.

ILKernelGenerator.ShiftArrayKernel<T>

Delegate for shift operation with per-element shift amounts. This is the scalar loop path for element-wise shifts.

ILKernelGenerator.ShiftScalarKernel<T>

Delegate for shift operation with scalar shift amount. This is the SIMD-optimized path for uniform shifts.

MatMul2DKernel<T>

Kernel delegate for 2D matrix multiplication: C = A * B, where A is [M x K], B is [K x N], and C is [M x N]. All matrices are row-major contiguous.

MixedTypeKernel

Mixed-type binary operation kernel signature using void pointers. Handles operations where LHS, RHS, and result may have different types. Type conversion is handled internally by the generated IL.

SimpleReductionKernel<T>

Simple contiguous reduction kernel returning single value. Used for full-array reductions when input is contiguous.

TypedAxisReductionKernel<T>

Typed axis reduction kernel with generic element type. Reduces along a specific axis with strongly-typed pointers.

TypedElementReductionKernel<TResult>

Delegate for typed element-wise reduction kernels. Returns the reduced value directly without boxing.

UnaryKernel

Unary operation kernel signature using void pointers. Handles operations where input and output may have different types. Type conversion is handled internally by the generated IL.

UnaryKernelStrided<TIn, TOut>

Strided unary operation kernel with explicit offset and stride parameters. Used when input/output arrays are not contiguous in memory.
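
The role of the offset and stride parameters can be sketched with a simple loop over managed arrays. The signature below is an assumption for illustration and is not the actual `UnaryKernelStrided<TIn, TOut>` delegate signature; offsets and strides are measured in elements.

```csharp
using System;

public static class StridedUnarySketch
{
    // Applies op to `count` input elements, stepping through input and output
    // with independent offsets and strides.
    public static void Apply<TIn, TOut>(
        TIn[] input, int inOffset, int inStride,
        TOut[] output, int outOffset, int outStride,
        int count, Func<TIn, TOut> op)
    {
        for (int i = 0; i < count; i++)
            output[outOffset + i * outStride] = op(input[inOffset + i * inStride]);
    }
}
```

For example, taking the square root of every second element of `{1, 4, 9, 16, 25, 36}` starting at index 1 (offset 1, stride 2, count 3) writes `{2, 4, 6}` into a contiguous output.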

UnaryScalar<TIn, TOut>

Unary scalar function delegate. Used for element-by-element operations in broadcasting and general paths.