Namespace NumSharp.Backends.Kernels
Classes
- ILKernelGenerator
Binary operations (same-type) - contiguous kernels and generic helpers.
- ReductionOpExtensions
Extension methods for ReductionOp.
- ReductionTypeExtensions
Extension methods for NPTypeCode related to reductions.
- SimdMatMul
High-performance SIMD matrix multiplication with cache blocking and panel packing. Single-threaded implementation achieving ~20 GFLOPS on modern CPUs.
Key optimizations:
- GEBP (General Block Panel) algorithm with cache blocking
- Full panel packing: A as [kc][MR] panels, B as [kc][NR] panels
- 8x16 micro-kernel with 16 Vector256 accumulators
- FMA (Fused Multiply-Add) for 2x FLOP throughput
- 4x k-loop unrolling for instruction-level parallelism
- SimdThresholds
Minimum element counts for SIMD to be beneficial. Below these thresholds, the overhead of SIMD setup may exceed the benefits. Based on Vector256 (32 bytes) width.
- StrideDetector
Stride-based pattern detection for selecting optimal SIMD execution paths. All methods are aggressively inlined for minimal dispatch overhead.
- TypeRules
Shared type rules for kernel providers.
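The cache blocking mentioned under SimdMatMul can be illustrated with a much simpler scalar loop nest. The following is a minimal sketch, not the library's implementation: it tiles only the k and n loops, does no panel packing, and uses no Vector256 or FMA intrinsics. The class name BlockedMatMulSketch and the block sizes BlockK and BlockN are hypothetical and chosen only for illustration.

```csharp
using System;

public static class BlockedMatMulSketch
{
    // Hypothetical block sizes; real values depend on the target cache sizes.
    private const int BlockK = 64;
    private const int BlockN = 128;

    // Computes C = A * B where A is [m x k], B is [k x n], C is [m x n],
    // all stored row-major and contiguous in flat arrays.
    public static void Multiply(float[] a, float[] b, float[] c, int m, int k, int n)
    {
        Array.Clear(c, 0, m * n);

        // Tile the k and n dimensions so that a block of B is reused
        // across every row of A before moving on to the next block.
        for (int k0 = 0; k0 < k; k0 += BlockK)
        {
            int kEnd = Math.Min(k0 + BlockK, k);
            for (int n0 = 0; n0 < n; n0 += BlockN)
            {
                int nEnd = Math.Min(n0 + BlockN, n);
                for (int i = 0; i < m; i++)
                {
                    for (int p = k0; p < kEnd; p++)
                    {
                        float aip = a[i * k + p];
                        // Innermost loop walks a row of B and a row of C contiguously.
                        for (int j = n0; j < nEnd; j++)
                            c[i * n + j] += aip * b[p * n + j];
                    }
                }
            }
        }
    }
}
```

A packed, vectorized kernel would replace the innermost loops with a fixed-size micro-kernel; the tiling order shown here is only one of several reasonable choices.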
Structs
- AxisReductionKernelKey
Cache key for axis reduction kernels. The keyed kernel reduces along a specific axis, producing an array with one fewer dimension.
- BinaryScalarKernelKey
Cache key for binary scalar operation kernels. Identifies a unique kernel by LHS type, RHS type, result type, and operation.
- BinaryScalarKey
Cache key for scalar binary operations. Used for element-by-element operations in general/broadcast paths.
- ComparisonKernelKey
Cache key for comparison operation kernels. Identifies a unique kernel by LHS type, RHS type, operation, and execution path. Result type is always bool (NPTypeCode.Boolean).
- ComparisonScalarKernelKey
Cache key for comparison scalar operation kernels. Identifies a unique kernel by LHS type, RHS type, and comparison operation. Result type is always bool.
- ComparisonScalarKey
Cache key for scalar comparison operations. Used for element-by-element comparisons in general/broadcast paths.
- ContiguousKernelKey
Cache key for same-type contiguous binary operations. Used for fast-path SIMD kernels when both operands are contiguous with identical type.
- CumulativeAxisKernelKey
Cache key for cumulative axis reduction kernels (cumsum along axis, etc.). The output has the same shape as the input, with values accumulated along the specified axis.
- CumulativeKernelKey
Cache key for cumulative reduction kernels (cumsum, etc.). The output has the same shape as the input; each element is the running accumulation of the elements up to that position.
- ElementReductionKernelKey
Cache key for element-wise (full array) reduction kernels. Reduces all elements to a single scalar value.
- MixedTypeKernelKey
Cache key for mixed-type binary operation kernels. Identifies a unique kernel by LHS type, RHS type, result type, operation, and execution path.
- UnaryKernelKey
Cache key for unary operation kernels. Identifies a unique kernel by input type, output type, operation, and whether contiguous.
- UnaryScalarKernelKey
Cache key for unary scalar operation kernels. Identifies a unique kernel by input type, output type, and operation.
- UnaryScalarKey
Cache key for scalar unary operations. Used for element-by-element operations in general/broadcast paths.
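The key structs above all follow the same idea: a small value type whose fields identify one generated kernel, used to look that kernel up in a cache. The following is a minimal sketch of that pattern under stated assumptions. SketchType, SketchOp, SketchKernelKey and SketchKernelCache are hypothetical names, not NumSharp types, and a plain Func stands in for a generated IL kernel. The record struct syntax assumes C# 10 or later.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-ins for a type code and an operation enum.
public enum SketchType { Int32, Single, Double }
public enum SketchOp { Add, Multiply }

// A record struct gets value equality and GetHashCode for free,
// which is what makes it usable as a dictionary key.
public readonly record struct SketchKernelKey(
    SketchType Lhs, SketchType Rhs, SketchType Result, SketchOp Op);

public static class SketchKernelCache
{
    private static readonly ConcurrentDictionary<SketchKernelKey, Func<double, double, double>> Cache
        = new ConcurrentDictionary<SketchKernelKey, Func<double, double, double>>();

    // Counts how many kernels were built, to make cache hits observable.
    public static int BuildCount;

    public static Func<double, double, double> Get(SketchKernelKey key)
    {
        return Cache.GetOrAdd(key, Build);
    }

    // Stand-in for the expensive step (IL generation in a real kernel provider).
    private static Func<double, double, double> Build(SketchKernelKey key)
    {
        BuildCount++;
        switch (key.Op)
        {
            case SketchOp.Add: return (x, y) => x + y;
            case SketchOp.Multiply: return (x, y) => x * y;
            default: throw new NotSupportedException(key.Op.ToString());
        }
    }
}
```

Requesting the same key twice returns the cached delegate without rebuilding it; two keys that differ in any field are treated as distinct kernels.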
Enums
- BinaryOp
Binary operations supported by kernel providers.
- ComparisonOp
Comparison operations supported by kernel providers. All comparison operations return bool (NPTypeCode.Boolean).
- ExecutionPath
Execution paths for binary operations, selected based on stride analysis.
- ReductionOp
Reduction operations supported by kernel providers.
- ReductionPath
Execution path for reduction operations.
- UnaryOp
Unary operations supported by kernel providers.
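To illustrate what selecting an execution path from stride analysis can look like, here is a minimal sketch. It does not reproduce the members of ExecutionPath or the methods of StrideDetector; the enum SketchPath, its member names, and the class StrideSketch are hypothetical. Strides are assumed to be measured in elements, not bytes, and the layout is assumed to be row-major.

```csharp
using System;

// Hypothetical path names for illustration only.
public enum SketchPath { Contiguous, ScalarBroadcast, General }

public static class StrideSketch
{
    // True when the strides describe a row-major contiguous layout for the shape.
    // Dimensions of size 1 are ignored because their stride is never used.
    public static bool IsContiguous(int[] shape, int[] strides)
    {
        int expected = 1;
        for (int d = shape.Length - 1; d >= 0; d--)
        {
            if (shape[d] != 1 && strides[d] != expected) return false;
            expected *= shape[d];
        }
        return true;
    }

    // True when every stride is zero, i.e. one value is broadcast everywhere.
    public static bool IsScalarBroadcast(int[] strides)
    {
        foreach (int s in strides)
            if (s != 0) return false;
        return true;
    }

    public static SketchPath Select(int[] shape, int[] lhsStrides, int[] rhsStrides)
    {
        bool lhsContig = IsContiguous(shape, lhsStrides);
        bool rhsContig = IsContiguous(shape, rhsStrides);

        if (lhsContig && rhsContig) return SketchPath.Contiguous;
        if (lhsContig && IsScalarBroadcast(rhsStrides)) return SketchPath.ScalarBroadcast;
        if (rhsContig && IsScalarBroadcast(lhsStrides)) return SketchPath.ScalarBroadcast;
        return SketchPath.General;
    }
}
```

The checks run from the cheapest, most vectorizable case to the general fallback, so the common contiguous case is decided first.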
Delegates
- AxisReductionKernel
Delegate for axis reduction kernels. Reduces along a specific axis, writing to output array.
- BinaryKernel<T>
Unified binary operation kernel signature. All binary operations (Add, Sub, Mul, Div, Mod) use this interface. The kernel handles pattern detection and dispatch internally.
- BinaryScalar<TLhs, TRhs, TOut>
Binary scalar function delegate. Used for element-by-element binary operations in broadcasting and general paths.
- ComparisonKernel
Comparison operation kernel signature using void pointers. LHS and RHS may have different types, but result is always bool. Type conversion is handled internally by the generated IL.
- ComparisonScalar<TLhs, TRhs>
Comparison scalar function delegate. Used for element-by-element comparisons returning boolean.
- ContiguousKernel<T>
Delegate for contiguous (SimdFull) binary operations. Simplified signature - no strides needed since both arrays are contiguous.
- CumulativeAxisKernel
Delegate for cumulative axis reduction kernels. Computes running accumulation along a specific axis.
- CumulativeKernel
Delegate for cumulative reduction kernels (cumsum, etc.). Output has same shape as input.
- ElementReductionKernel
Delegate for element-wise reduction kernels. Reduces all elements of an array to a single value.
- ILKernelGenerator.ShiftArrayKernel<T>
Delegate for shift operation with per-element shift amounts. This is the scalar loop path for element-wise shifts.
- ILKernelGenerator.ShiftScalarKernel<T>
Delegate for shift operation with scalar shift amount. This is the SIMD-optimized path for uniform shifts.
- MatMul2DKernel<T>
Kernel delegate for 2D matrix multiplication: C = A * B, where A is [M x K], B is [K x N], and C is [M x N]. All matrices are row-major contiguous.
- MixedTypeKernel
Mixed-type binary operation kernel signature using void pointers. Handles operations where LHS, RHS, and result may have different types. Type conversion is handled internally by the generated IL.
- SimpleReductionKernel<T>
Simple contiguous reduction kernel returning single value. Used for full-array reductions when input is contiguous.
- TypedAxisReductionKernel<T>
Typed axis reduction kernel with generic element type. Reduces along a specific axis with strongly-typed pointers.
- TypedElementReductionKernel<TResult>
Delegate for typed element-wise reduction kernels. Returns the reduced value directly without boxing.
- UnaryKernel
Unary operation kernel signature using void pointers. Handles operations where input and output may have different types. Type conversion is handled internally by the generated IL.
- UnaryKernelStrided<TIn, TOut>
Strided unary operation kernel with explicit offset and stride parameters. Used when input/output arrays are not contiguous in memory.
- UnaryScalar<TIn, TOut>
Unary scalar function delegate. Used for element-by-element operations in broadcasting and general paths.
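The strided kernels above take explicit offsets and strides because a non-contiguous view cannot be walked with a simple index. The following is a minimal sketch of that access pattern, assuming managed arrays and element-based strides. The class StridedUnarySketch and its Apply signature are hypothetical and do not reproduce the UnaryKernelStrided<TIn, TOut> delegate, whose exact parameters are not listed on this page.

```csharp
using System;

public static class StridedUnarySketch
{
    // Applies op to count elements of input, starting at inOffset and stepping
    // by inStride, and writes results starting at outOffset, stepping by outStride.
    // Offsets and strides are measured in elements.
    public static void Apply<TIn, TOut>(
        TIn[] input, int inOffset, int inStride,
        TOut[] output, int outOffset, int outStride,
        int count, Func<TIn, TOut> op)
    {
        for (int i = 0; i < count; i++)
        {
            TIn value = input[inOffset + i * inStride];
            output[outOffset + i * outStride] = op(value);
        }
    }
}
```

For example, with input { 1, 2, 3, 4, 5, 6 }, an input offset of 1 and an input stride of 2, the loop visits 2, 4 and 6; with an output stride of 1 the results are written contiguously.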