Struct ComplexSumKernel
Sums Complex values. When the inner loop is contiguous (the stride is 16 bytes, the size of one Complex) and Vector256 is hardware-accelerated, the chunk is summed as a flat stream of doubles using two Vector256<double> lanes. Because real and imaginary parts are interleaved in that stream, they stay separate through the lane reduction. Any remaining tail is then added with scalar code. Non-contiguous chunks are summed entirely with scalar code. The SIMD reassociation differs from a strict left fold only at the ULP level (the same class of difference as the codebase's pairwise reductions).
public readonly struct ComplexSumKernel : INDReducingInnerLoop<Complex>
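The contiguous path described above can be illustrated with a minimal sketch. This is not the kernel's actual code: `ComplexSumSketch` and `Sum` are hypothetical names, the input is a span rather than the raw pointers and strides the kernel receives, and the sketch assumes `Complex` is stored as a (real, imaginary) pair of doubles and that "two lanes" means two vector accumulators. It requires .NET 7 or later.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical illustration only; not the library's implementation.
static class ComplexSumSketch
{
    public static Complex Sum(ReadOnlySpan<Complex> values)
    {
        // Assumption: Complex is laid out as two consecutive doubles (re, im),
        // so the span can be viewed as a flat stream re0, im0, re1, im1, ...
        ReadOnlySpan<double> flat = MemoryMarshal.Cast<Complex, double>(values);
        double re = 0.0, im = 0.0;
        int i = 0;

        if (Vector256.IsHardwareAccelerated && flat.Length >= 8)
        {
            // Two independent accumulators; each holds (re, im, re, im).
            Vector256<double> acc0 = Vector256<double>.Zero;
            Vector256<double> acc1 = Vector256<double>.Zero;

            for (; i <= flat.Length - 8; i += 8)
            {
                acc0 += Vector256.Create(flat.Slice(i, 4));
                acc1 += Vector256.Create(flat.Slice(i + 4, 4));
            }

            // Lane reduction: even elements are real parts, odd are imaginary.
            Vector256<double> acc = acc0 + acc1;
            re = acc.GetElement(0) + acc.GetElement(2);
            im = acc.GetElement(1) + acc.GetElement(3);
        }

        // Scalar tail (or the whole input when SIMD is unavailable).
        for (; i < flat.Length; i += 2)
        {
            re += flat[i];
            im += flat[i + 1];
        }

        return new Complex(re, im);
    }
}
```

Because each vector element only ever accumulates values of one kind (real or imaginary), the two components never mix; only the order of the additions changes relative to a left fold, which is where the ULP-level difference comes from.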
Methods
Execute(void**, long*, long, ref Complex)
public bool Execute(void** dataptrs, long* strides, long count, ref Complex sum)