Layout suite — reduction / copy / elementwise × memory layout × dtype
ratio = NumPy_ms / NumSharp_ms, so a ratio above 1.0 means NumSharp is faster. Legend: ✅ ≥ 1.0, 🟡 ≥ 0.5, 🟠 ≥ 0.2, 🔴 < 0.2.
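The ratio and legend above can be sketched in a few lines. The helper names `ratio`, `badge`, and `cell` are hypothetical and are not taken from the benchmark harness. The sketch assumes the badge is chosen from the unrounded ratio and only the displayed number is rounded to two decimals, which is consistent with the report showing `0.20` next to both 🔴 and 🟠.

```python
def ratio(numpy_ms: float, numsharp_ms: float) -> float:
    """Speed ratio as defined in this report: above 1.0 means NumSharp is faster."""
    return numpy_ms / numsharp_ms


def badge(r: float) -> str:
    """Map a ratio to the report's legend (thresholds taken from the legend line)."""
    if r >= 1.0:
        return "✅"
    if r >= 0.5:
        return "🟡"
    if r >= 0.2:
        return "🟠"
    return "🔴"


def cell(numpy_ms: float, numsharp_ms: float) -> str:
    """Format one table cell. Assumption: the badge uses the unrounded ratio."""
    r = ratio(numpy_ms, numsharp_ms)
    return f"{r:.2f} {badge(r)}"


# Two elementwise rows from this report that both display 0.20:
print(cell(0.0129, 0.0649))  # 0.1988 unrounded, below the 0.2 threshold
print(cell(0.0130, 0.0638))  # 0.2038 unrounded, at or above the 0.2 threshold
```

Under that assumption, a cell sitting right at a threshold can carry a badge that looks one step off from its rounded value.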
Layouts (8, harmonized with the cast subsystem): C, F (Fortran), T (transpose), strided [:, ::2], sliced (offset), negrow [::-1, :], negcol [:, ::-1], bcast (stride-0). This suite fills the op-matrix's blind spot, since the op-matrix measures C-contiguous arrays only. Every cell is measured at 100K and 1M elements, and the reported time is the best of the measured rounds.
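For readers unfamiliar with the layout names, the following is a minimal NumPy sketch of how eight views with these properties could be built. It is not the suite's actual setup code: the helper `make_layouts`, the 2-D shape, and the way the backing arrays are allocated are all assumptions. Only the slice expressions `[:, ::2]`, `[::-1, :]`, `[:, ::-1]` and the stride-0 broadcast come from the list above.

```python
import numpy as np


def make_layouts(rows: int, cols: int, dtype=np.float64) -> dict:
    """Hypothetical helper: return eight (rows, cols) arrays, one per layout."""
    base = np.arange(rows * cols, dtype=dtype).reshape(rows, cols)
    # Backing arrays that are larger than the view taken from them.
    wide = np.arange(rows * cols * 2, dtype=dtype).reshape(rows, cols * 2)
    padded = np.arange((rows + 1) * (cols + 1), dtype=dtype).reshape(rows + 1, cols + 1)
    return {
        "C": base,                                   # row-major, contiguous
        "F": np.asfortranarray(base),                # column-major copy
        "T": np.arange(rows * cols, dtype=dtype).reshape(cols, rows).T,  # transposed view
        "strided": wide[:, ::2],                     # every second column
        "sliced": padded[1:, 1:],                    # non-zero offset into a larger buffer
        "negrow": base[::-1, :],                     # negative row stride
        "negcol": base[:, ::-1],                     # negative column stride
        "bcast": np.broadcast_to(np.arange(cols, dtype=dtype), (rows, cols)),  # stride-0 rows
    }


layouts = make_layouts(4, 6)
for name, arr in layouts.items():
    # Strides are in bytes; contiguity flags show which fast paths could apply.
    print(f"{name:8s} strides={arr.strides} "
          f"C={arr.flags.c_contiguous} F={arr.flags.f_contiguous}")
```

Printing the strides makes the differences concrete: only `C` is C-contiguous, `F` and `T` are F-contiguous, and the remaining five need a general strided loop.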
Reduction (sum/min/max/prod, both axes)
Geomean by layout
| size | C | F | T | strided | sliced | negrow | negcol | bcast |
|---|---|---|---|---|---|---|---|---|
| 100K | 1.01 ✅ | 1.05 ✅ | 1.04 ✅ | 0.49 🟠 | 0.68 🟡 | 0.61 🟡 | 0.56 🟡 | 0.59 🟡 |
| 1M | 0.92 🟡 | 0.94 🟡 | 0.94 🟡 | 0.56 🟡 | 0.70 🟡 | 0.76 🟡 | 0.58 🟡 | 0.50 🟡 |
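The summary tables aggregate per-cell ratios with a geometric mean. A short sketch of why that is the usual choice for speed ratios: a 2x win and a 2x loss cancel to 1.0, whereas an arithmetic mean would report a net win. The function below is illustrative only, and the suite's own aggregation code is not shown in this report.

```python
import math


def geomean(ratios):
    """Geometric mean of positive ratios, computed in log space for stability."""
    ratios = list(ratios)
    if not ratios:
        raise ValueError("geomean of an empty sequence is undefined")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# One cell twice as fast, one cell twice as slow:
sample = [2.0, 0.5]
print(geomean(sample))            # balances out to about 1.0
print(sum(sample) / len(sample))  # arithmetic mean is 1.25, overstating the win
```

Python 3.8+ also ships `statistics.geometric_mean`, which could be used instead of the hand-written version.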
Geomean by dtype
| size | f64 | f32 | c128 | dec | f16 | i32 | i64 |
|---|---|---|---|---|---|---|---|
| 100K | 0.78 🟡 | 0.85 🟡 | 1.01 ✅ | 0.08 🔴 | 1.04 ✅ | 0.89 🟡 | 1.15 ✅ |
| 1M | 0.88 🟡 | 1.05 ✅ | 1.02 ✅ | 0.07 🔴 | 1.00 ✅ | 0.80 🟡 | 1.02 ✅ |
Geomean by op
| size | sum | min | max | prod |
|---|---|---|---|---|
| 100K | 0.72 🟡 | 0.61 🟡 | 0.61 🟡 | 1.06 ✅ |
| 1M | 0.74 🟡 | 0.59 🟡 | 0.58 🟡 | 1.14 ✅ |
Worst 15 cells (NumSharp slowest vs NumPy)
| key (size\|dtype\|layout\|op\|axis) | NumSharp ms | NumPy ms | ratio |
|---|---|---|---|
| 1M\|dec\|bcast\|sum\|ax0 | 5.3741 | 0.0657 | 0.01 🔴 |
| 100K\|dec\|bcast\|sum\|ax0 | 0.5412 | 0.0101 | 0.02 🔴 |
| 1M\|dec\|sliced\|sum\|ax0 | 5.4381 | 0.1108 | 0.02 🔴 |
| 100K\|dec\|C\|sum\|ax0 | 0.5494 | 0.0114 | 0.02 🔴 |
| 1M\|dec\|negrow\|sum\|ax0 | 5.4461 | 0.1134 | 0.02 🔴 |
| 100K\|dec\|negrow\|sum\|ax0 | 0.5371 | 0.0114 | 0.02 🔴 |
| 100K\|dec\|T\|sum\|ax1 | 0.5356 | 0.0114 | 0.02 🔴 |
| 1M\|dec\|F\|sum\|ax1 | 5.3609 | 0.1202 | 0.02 🔴 |
| 1M\|dec\|T\|sum\|ax1 | 5.5377 | 0.1270 | 0.02 🔴 |
| 100K\|dec\|sliced\|sum\|ax0 | 0.5332 | 0.0123 | 0.02 🔴 |
| 100K\|dec\|F\|sum\|ax1 | 0.5381 | 0.0126 | 0.02 🔴 |
| 1M\|dec\|C\|sum\|ax0 | 5.4777 | 0.1298 | 0.02 🔴 |
| 1M\|i32\|bcast\|sum\|ax1 | 4.1105 | 0.1203 | 0.03 🔴 |
| 100K\|i32\|bcast\|sum\|ax1 | 0.4050 | 0.0168 | 0.04 🔴 |
| 100K\|dec\|C\|max\|ax0 | 0.2871 | 0.0137 | 0.05 🔴 |
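The keys in the worst-cell table appear to be size, dtype, layout, op, and (for reductions) axis, joined by `|`, with rows sorted by ascending ratio. A minimal sketch of that selection follows. `Cell`, `parse_key`, and `worst` are hypothetical names, the field order is inferred from the keys rather than documented, and the three sample rows are copied from the table.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    key: str
    numsharp_ms: float
    numpy_ms: float

    @property
    def ratio(self) -> float:
        # Same definition as the report: NumPy_ms / NumSharp_ms.
        return self.numpy_ms / self.numsharp_ms


def parse_key(key: str) -> dict:
    """Split a cell key into named fields (assumed order; axis may be absent)."""
    names = ("size", "dtype", "layout", "op", "axis")
    return dict(zip(names, key.split("|")))


def worst(cells, n=15):
    """Return the n cells with the lowest ratio, slowest first."""
    return sorted(cells, key=lambda c: c.ratio)[:n]


cells = [
    Cell("100K|dec|C|max|ax0", 0.2871, 0.0137),
    Cell("1M|dec|bcast|sum|ax0", 5.3741, 0.0657),
    Cell("1M|i32|bcast|sum|ax1", 4.1105, 0.1203),
]
for c in worst(cells, n=2):
    print(parse_key(c.key), f"{c.ratio:.2f}")
```

Grouping the parsed fields by `dtype` or `layout` is one way to see that the slowest reductions cluster around `dec` and the stride-0 `bcast` layout.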
Copy / identity-ufunc (np.positive)
Geomean by layout
| size | C | F | T | strided | sliced | negrow | negcol | bcast |
|---|---|---|---|---|---|---|---|---|
| 100K | 1.16 ✅ | 1.47 ✅ | 1.25 ✅ | 0.86 🟡 | 1.56 ✅ | 1.63 ✅ | 2.22 ✅ | 1.57 ✅ |
| 1M | 2.87 ✅ | 2.95 ✅ | 2.89 ✅ | 1.96 ✅ | 2.67 ✅ | 2.62 ✅ | 3.24 ✅ | 2.67 ✅ |
Geomean by dtype
| size | u8 | i8 | i16 | u16 | i32 | u32 | i64 | u64 | char | f16 | f32 | f64 | c128 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100K | 0.93 🟡 | 1.41 ✅ | 1.75 ✅ | 1.80 ✅ | 0.99 🟡 | 1.09 ✅ | 1.66 ✅ | 1.03 ✅ | 2.44 ✅ | 2.15 ✅ | 1.06 ✅ | 0.84 🟡 | 2.60 ✅ |
| 1M | 4.51 ✅ | 4.77 ✅ | 2.15 ✅ | 2.11 ✅ | 2.11 ✅ | 2.22 ✅ | 2.61 ✅ | 2.64 ✅ | 2.15 ✅ | 2.14 ✅ | 2.14 ✅ | 2.57 ✅ | 5.31 ✅ |
Worst 15 cells (NumSharp slowest vs NumPy)
| key (size\|dtype\|layout\|op) | NumSharp ms | NumPy ms | ratio |
|---|---|---|---|
| 100K\|i64\|strided\|pos | 0.0257 | 0.0110 | 0.43 🟠 |
| 100K\|f64\|strided\|pos | 0.0232 | 0.0105 | 0.45 🟠 |
| 100K\|u64\|strided\|pos | 0.0240 | 0.0109 | 0.45 🟠 |
| 100K\|i64\|C\|pos | 0.0441 | 0.0208 | 0.47 🟠 |
| 100K\|f32\|strided\|pos | 0.0202 | 0.0103 | 0.51 🟡 |
| 100K\|i32\|strided\|pos | 0.0200 | 0.0109 | 0.54 🟡 |
| 100K\|i64\|T\|pos | 0.0376 | 0.0208 | 0.55 🟡 |
| 100K\|u8\|negrow\|pos | 0.0419 | 0.0232 | 0.56 🟡 |
| 100K\|u8\|sliced\|pos | 0.0410 | 0.0230 | 0.56 🟡 |
| 100K\|u8\|bcast\|pos | 0.0409 | 0.0231 | 0.56 🟡 |
| 100K\|f64\|C\|pos | 0.0418 | 0.0242 | 0.58 🟡 |
| 100K\|f64\|F\|pos | 0.0388 | 0.0227 | 0.58 🟡 |
| 100K\|u32\|strided\|pos | 0.0180 | 0.0112 | 0.62 🟡 |
| 100K\|f64\|T\|pos | 0.0364 | 0.0230 | 0.63 🟡 |
| 100K\|u64\|T\|pos | 0.0358 | 0.0290 | 0.81 🟡 |
Elementwise (add/mul/neg/abs/sqrt/less/copy)
Geomean by layout
| size | C | F | T | strided | sliced | negrow | negcol | bcast |
|---|---|---|---|---|---|---|---|---|
| 100K | 0.54 🟡 | 0.75 🟡 | 0.68 🟡 | 0.53 🟡 | 0.84 🟡 | 0.81 🟡 | 1.16 ✅ | 0.84 🟡 |
| 1M | 1.57 ✅ | 1.54 ✅ | 1.53 ✅ | 1.15 ✅ | 1.66 ✅ | 1.65 ✅ | 1.80 ✅ | 1.67 ✅ |
Geomean by dtype
| size | f64 | f32 | c128 | f16 | i32 | i64 |
|---|---|---|---|---|---|---|
| 100K | 0.58 🟡 | 0.53 🟡 | 1.18 ✅ | 0.75 🟡 | 0.79 🟡 | 0.81 🟡 |
| 1M | 1.91 ✅ | 1.59 ✅ | 1.74 ✅ | 0.96 🟡 | 1.55 ✅ | 1.82 ✅ |
Geomean by op
| size | add | mul | neg | abs | sqrt | less | copy |
|---|---|---|---|---|---|---|---|
| 100K | 0.97 🟡 | 0.93 🟡 | 0.71 🟡 | 0.70 🟡 | 0.94 🟡 | 0.61 🟡 | 0.50 🟡 |
| 1M | 1.81 ✅ | 1.80 ✅ | 2.12 ✅ | 1.66 ✅ | 1.55 ✅ | 0.69 🟡 | 1.82 ✅ |
Worst 15 cells (NumSharp slowest vs NumPy)
| key (size\|dtype\|layout\|op) | NumSharp ms | NumPy ms | ratio |
|---|---|---|---|
| 100K\|f64\|strided\|abs | 0.0481 | 0.0075 | 0.16 🔴 |
| 100K\|f64\|C\|copy | 0.0700 | 0.0112 | 0.16 🔴 |
| 100K\|f64\|strided\|neg | 0.0502 | 0.0082 | 0.16 🔴 |
| 100K\|f64\|C\|abs | 0.0653 | 0.0113 | 0.17 🔴 |
| 100K\|f32\|C\|abs | 0.0310 | 0.0056 | 0.18 🔴 |
| 100K\|f16\|C\|copy | 0.0169 | 0.0031 | 0.18 🔴 |
| 100K\|f64\|C\|mul | 0.0690 | 0.0129 | 0.19 🔴 |
| 100K\|f64\|C\|neg | 0.0649 | 0.0129 | 0.20 🔴 |
| 100K\|f64\|C\|add | 0.0638 | 0.0130 | 0.20 🟠 |
| 100K\|f16\|negrow\|copy | 0.0178 | 0.0038 | 0.21 🟠 |
| 100K\|f32\|C\|mul | 0.0316 | 0.0069 | 0.22 🟠 |
| 100K\|i32\|bcast\|copy | 0.0224 | 0.0050 | 0.22 🟠 |
| 100K\|f32\|C\|add | 0.0311 | 0.0069 | 0.22 🟠 |
| 100K\|f32\|T\|neg | 0.0270 | 0.0060 | 0.22 🟠 |
| 100K\|f32\|T\|add | 0.0302 | 0.0068 | 0.23 🟠 |