Layout suite — reduction / copy / elementwise × memory layout × dtype

ratio = NumPy_ms / NumSharp_ms; a ratio above 1.0 means NumSharp is faster. Legend: ✅ ≥1.0, 🟡 ≥0.5, 🟠 ≥0.2, 🔴 <0.2.

Layouts (8, harmonized with the cast subsystem): C, F (Fortran), T (transpose), strided [:, ::2], sliced (offset), negrow [::-1, :], negcol [:, ::-1], bcast (stride-0).

This suite fills the op-matrix's blind spot: the op-matrix measures C-contiguous arrays only. Arrays have 100K and 1M elements; timings are best-of-rounds.
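The eight layouts can be reproduced on the NumPy side roughly as follows. This is a minimal sketch, not the suite's own setup code: the 100 x 1000 shape, the `sliced` offsets, and the way `bcast` is built are assumptions, since the report only names the layouts and their slice expressions.

```python
import numpy as np

# Assumed shape (100K elements); the suite's real shapes are not stated here.
rows, cols = 100, 1000
base = np.arange(rows * cols, dtype=np.float64).reshape(rows, cols)

layouts = {
    "C": base,                        # row-major contiguous
    "F": np.asfortranarray(base),     # column-major contiguous copy
    "T": base.T,                      # transposed view of the C array
    "strided": base[:, ::2],          # every other column
    "sliced": base[1:, 1:],           # view with a nonzero offset (assumed slice)
    "negrow": base[::-1, :],          # negative row stride
    "negcol": base[:, ::-1],          # negative column stride
    # One row repeated along axis 0, so the row stride is 0 (assumed construction).
    "bcast": np.broadcast_to(base[0], (rows, cols)),
}

for name, a in layouts.items():
    print(f"{name:8s} shape={a.shape} strides={a.strides} "
          f"C={a.flags['C_CONTIGUOUS']} F={a.flags['F_CONTIGUOUS']}")
```

Printing `strides` and the contiguity flags is a quick way to confirm that each view really exercises a non-contiguous path rather than silently falling back to a contiguous copy.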

Reduction (sum/min/max/prod, both axes)

Geomean by layout

size C F T strided negrow negcol sliced bcast
100K 1.01 ✅ 1.05 ✅ 1.04 ✅ 0.49 🟠 0.61 🟡 0.56 🟡 0.68 🟡 0.59 🟡
1M 0.92 🟡 0.94 🟡 0.94 🟡 0.56 🟡 0.76 🟡 0.58 🟡 0.70 🟡 0.50 🟡
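A geomean cell aggregates many per-cell ratios into one number. The sketch below shows the calculation, assuming each cell is the geometric mean of all ratios that share a size and a grouping key (layout, dtype, or op); the report does not spell out the exact grouping. The input values are toy numbers chosen to show the behavior, not measurements.

```python
import math

def geomean(ratios):
    """Geometric mean of positive speed ratios."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Toy values, not measurements: one cell 2x faster, one cell 2x slower.
toy = [2.0, 0.5]
print(round(geomean(toy), 2))          # 1.0  (speedup and slowdown cancel)
print(round(sum(toy) / len(toy), 2))   # 1.25 (arithmetic mean overstates)
```

The geometric mean is the usual choice for ratios because a 2x speedup and a 2x slowdown cancel to 1.0, whereas an arithmetic mean would report a net win.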

Geomean by dtype

size f64 f32 c128 dec f16 i32 i64
100K 0.78 🟡 0.85 🟡 1.01 ✅ 0.08 🔴 1.04 ✅ 0.89 🟡 1.15 ✅
1M 0.88 🟡 1.05 ✅ 1.02 ✅ 0.07 🔴 1.00 ✅ 0.80 🟡 1.02 ✅

Geomean by op

size sum min max prod
100K 0.72 🟡 0.61 🟡 0.61 🟡 1.06 ✅
1M 0.74 🟡 0.59 🟡 0.58 🟡 1.14 ✅

Worst 15 cells (NumSharp slowest vs NumPy)

key NumSharp_ms NumPy_ms ratio
1M|dec|bcast|sum|ax0 5.3741 0.0657 0.01 🔴
100K|dec|bcast|sum|ax0 0.5412 0.0101 0.02 🔴
1M|dec|sliced|sum|ax0 5.4381 0.1108 0.02 🔴
100K|dec|C|sum|ax0 0.5494 0.0114 0.02 🔴
1M|dec|negrow|sum|ax0 5.4461 0.1134 0.02 🔴
100K|dec|negrow|sum|ax0 0.5371 0.0114 0.02 🔴
100K|dec|T|sum|ax1 0.5356 0.0114 0.02 🔴
1M|dec|F|sum|ax1 5.3609 0.1202 0.02 🔴
1M|dec|T|sum|ax1 5.5377 0.1270 0.02 🔴
100K|dec|sliced|sum|ax0 0.5332 0.0123 0.02 🔴
100K|dec|F|sum|ax1 0.5381 0.0126 0.02 🔴
1M|dec|C|sum|ax0 5.4777 0.1298 0.02 🔴
1M|i32|bcast|sum|ax1 4.1105 0.1203 0.03 🔴
100K|i32|bcast|sum|ax1 0.4050 0.0168 0.04 🔴
100K|dec|C|max|ax0 0.2871 0.0137 0.05 🔴
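The ratio and icon columns can be recomputed from the two timing columns. The sketch below does this for three rows copied from the table above; the helper names `ratio`, `bucket`, and `parse_key` are hypothetical, and the key layout (size|dtype|layout|op|axis) is read off the rows rather than taken from the suite's code.

```python
def ratio(numsharp_ms, numpy_ms):
    """NumPy_ms / NumSharp_ms; above 1.0 means NumSharp is faster."""
    return numpy_ms / numsharp_ms

def bucket(r):
    """Map a ratio to the report's legend icon."""
    if r >= 1.0:
        return "✅"
    if r >= 0.5:
        return "🟡"
    if r >= 0.2:
        return "🟠"
    return "🔴"

def parse_key(key):
    """Split a reduction key into its fields (assumed field order)."""
    size, dtype, layout, op, axis = key.split("|")
    return {"size": size, "dtype": dtype, "layout": layout, "op": op, "axis": axis}

# (key, NumSharp ms, NumPy ms) copied from the worst-cells table.
rows = [
    ("100K|dec|C|max|ax0", 0.2871, 0.0137),
    ("1M|dec|bcast|sum|ax0", 5.3741, 0.0657),
    ("1M|i32|bcast|sum|ax1", 4.1105, 0.1203),
]

# Sort ascending by ratio so the slowest NumSharp cell comes first.
for key, ns_ms, np_ms in sorted(rows, key=lambda r: ratio(r[1], r[2])):
    r = ratio(ns_ms, np_ms)
    print(f"{key} {ns_ms:.4f} {np_ms:.4f} {r:.2f} {bucket(r)}")
```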

Copy / identity-ufunc (np.positive)

Geomean by layout

size C F T strided sliced negrow negcol bcast
100K 1.16 ✅ 1.47 ✅ 1.25 ✅ 0.86 🟡 1.56 ✅ 1.63 ✅ 2.22 ✅ 1.57 ✅
1M 2.87 ✅ 2.95 ✅ 2.89 ✅ 1.96 ✅ 2.67 ✅ 2.62 ✅ 3.24 ✅ 2.67 ✅

Geomean by dtype

size u8 i8 i16 u16 i32 u32 i64 u64 char f16 f32 f64 c128
100K 0.93 🟡 1.41 ✅ 1.75 ✅ 1.80 ✅ 0.99 🟡 1.09 ✅ 1.66 ✅ 1.03 ✅ 2.44 ✅ 2.15 ✅ 1.06 ✅ 0.84 🟡 2.60 ✅
1M 4.51 ✅ 4.77 ✅ 2.15 ✅ 2.11 ✅ 2.11 ✅ 2.22 ✅ 2.61 ✅ 2.64 ✅ 2.15 ✅ 2.14 ✅ 2.14 ✅ 2.57 ✅ 5.31 ✅

Worst 15 cells (NumSharp slowest vs NumPy)

key NumSharp_ms NumPy_ms ratio
100K|i64|strided|pos 0.0257 0.0110 0.43 🟠
100K|f64|strided|pos 0.0232 0.0105 0.45 🟠
100K|u64|strided|pos 0.0240 0.0109 0.45 🟠
100K|i64|C|pos 0.0441 0.0208 0.47 🟠
100K|f32|strided|pos 0.0202 0.0103 0.51 🟡
100K|i32|strided|pos 0.0200 0.0109 0.54 🟡
100K|i64|T|pos 0.0376 0.0208 0.55 🟡
100K|u8|negrow|pos 0.0419 0.0232 0.56 🟡
100K|u8|sliced|pos 0.0410 0.0230 0.56 🟡
100K|u8|bcast|pos 0.0409 0.0231 0.56 🟡
100K|f64|C|pos 0.0418 0.0242 0.58 🟡
100K|f64|F|pos 0.0388 0.0227 0.58 🟡
100K|u32|strided|pos 0.0180 0.0112 0.62 🟡
100K|f64|T|pos 0.0364 0.0230 0.63 🟡
100K|u64|T|pos 0.0358 0.0290 0.81 🟡
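For orientation, a NumPy-side timing for a cell such as 100K|i64|strided|pos might be taken along these lines. This is a sketch under several assumptions: "best-of-rounds" is read as the minimum per-call time over a few timed batches, the round and iteration counts are arbitrary, and 100K is taken to be the size of the strided view rather than of its base array. The helper `best_of_rounds_ms` is hypothetical and is not the suite's harness.

```python
import time
import numpy as np

def best_of_rounds_ms(fn, rounds=5, iters=20):
    """Smallest mean per-call time in milliseconds across `rounds` batches."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        for _ in range(iters):
            fn()
        per_call = (time.perf_counter() - start) / iters
        best = min(best, per_call)
    return best * 1e3

# Assumed shape: a 100 x 2000 base so that the [:, ::2] view holds 100K elements.
base = np.arange(100 * 2000, dtype=np.int64).reshape(100, 2000)
strided = base[:, ::2]

# np.positive is the identity ufunc, so this times a strided-to-contiguous copy.
numpy_ms = best_of_rounds_ms(lambda: np.positive(strided))
print(f"100K|i64|strided|pos NumPy ms ~ {numpy_ms:.4f}")
```

Taking the minimum rather than the mean reduces the influence of scheduler noise, which matters at the 100K size where individual calls take tens of microseconds.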

Elementwise (add/mul/neg/abs/sqrt/less/copy)

Geomean by layout

size C F T strided sliced negrow negcol bcast
100K 0.54 🟡 0.75 🟡 0.68 🟡 0.53 🟡 0.84 🟡 0.81 🟡 1.16 ✅ 0.84 🟡
1M 1.57 ✅ 1.54 ✅ 1.53 ✅ 1.15 ✅ 1.66 ✅ 1.65 ✅ 1.80 ✅ 1.67 ✅

Geomean by dtype

size f64 f32 c128 f16 i32 i64
100K 0.58 🟡 0.53 🟡 1.18 ✅ 0.75 🟡 0.79 🟡 0.81 🟡
1M 1.91 ✅ 1.59 ✅ 1.74 ✅ 0.96 🟡 1.55 ✅ 1.82 ✅

Geomean by op

size add mul neg abs sqrt less copy
100K 0.97 🟡 0.93 🟡 0.71 🟡 0.70 🟡 0.94 🟡 0.61 🟡 0.50 🟡
1M 1.81 ✅ 1.80 ✅ 2.12 ✅ 1.66 ✅ 1.55 ✅ 0.69 🟡 1.82 ✅

Worst 15 cells (NumSharp slowest vs NumPy)

key NumSharp_ms NumPy_ms ratio
100K|f64|strided|abs 0.0481 0.0075 0.16 🔴
100K|f64|C|copy 0.0700 0.0112 0.16 🔴
100K|f64|strided|neg 0.0502 0.0082 0.16 🔴
100K|f64|C|abs 0.0653 0.0113 0.17 🔴
100K|f32|C|abs 0.0310 0.0056 0.18 🔴
100K|f16|C|copy 0.0169 0.0031 0.18 🔴
100K|f64|C|mul 0.0690 0.0129 0.19 🔴
100K|f64|C|neg 0.0649 0.0129 0.20 🔴
100K|f64|C|add 0.0638 0.0130 0.20 🟠
100K|f16|negrow|copy 0.0178 0.0038 0.21 🟠
100K|f32|C|mul 0.0316 0.0069 0.22 🟠
100K|i32|bcast|copy 0.0224 0.0050 0.22 🟠
100K|f32|C|add 0.0311 0.0069 0.22 🟠
100K|f32|T|neg 0.0270 0.0060 0.22 🟠
100K|f32|T|add 0.0302 0.0068 0.23 🟠