NumSharp NDIter — canonical benchmark · 2026-06-23 · speedup = NumPy ÷ NumSharp (>1.0× = NumSharp faster)
198 pairs: 163 measured, 35 NA · best-of-rounds, Release · matched kernels/ids
%NumPy🕐 = NumSharp ÷ NumPy × 100 = share of NumPy's time NumSharp uses (8% = takes only 8% as long; <100% = faster)
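As a quick check on how the two headline metrics relate, here is a minimal Python sketch. The function names and the sample timings are hypothetical, chosen only to reproduce the 8% example in the legend; they are not taken from the benchmark harness.

```python
def speedup(numpy_time: float, numsharp_time: float) -> float:
    """Speedup as defined in the header: NumPy time / NumSharp time."""
    return numpy_time / numsharp_time


def pct_of_numpy_time(numpy_time: float, numsharp_time: float) -> float:
    """Share of NumPy's time that NumSharp uses, in percent."""
    return numsharp_time / numpy_time * 100.0


# Hypothetical timings (any unit, as long as both sides use the same one).
numpy_us, numsharp_us = 100.0, 8.0

print(f"{speedup(numpy_us, numsharp_us):.2f}x")            # 12.50x
print(f"{pct_of_numpy_time(numpy_us, numsharp_us):.0f}%")  # 8%

# The two metrics are reciprocals: a 1.18x speedup is about 85% of NumPy's time.
print(f"{100.0 / 1.18:.0f}%")  # 85%
```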
AV POLICY — a NumSharp section that crashes all retries (known intermittent
AccessViolation, an unmanaged-storage lifetime bug) is reported NA / IGNORED
and excluded from every geomean below. THIS RUN: every selection cell is NA (7 families × 5 tiers = 35).
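One way to read the NA policy: NA cells are skipped before the geometric mean is taken, so a category whose cells are all NA has no geomean at all. The sketch below assumes NA is represented as `None` (a hypothetical choice; the report does not say how the harness stores it) and uses family geomeans printed in this report as inputs. Because those inputs are already rounded, the result only approximates the reported category figure.

```python
import math
from typing import Optional, Sequence


def geomean_skip_na(values: Sequence[Optional[float]]) -> Optional[float]:
    """Geometric mean of the non-NA entries; None if every entry is NA."""
    measured = [v for v in values if v is not None]
    if not measured:
        return None  # e.g. a category where every cell is NA
    return math.exp(sum(math.log(v) for v in measured) / len(measured))


# index-math family geomeans from this report: unravel, ravel_mi.
index_math = [0.68, 0.82]
# selection: all seven families are NA in this run.
selection = [None] * 7

print(geomean_skip_na(index_math))  # about 0.75
print(geomean_skip_na(selection))   # None
```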
HEADLINE — operation matrix: 1.18× geomean · 85%🕐 of NumPy's time · 72 win / 58 lose over 130 cells
OPERATIONS — BY SIZE TIER (geomean over all families)
slower ◄───────── 1.0 (parity) ─────────► faster
scalar      ██████████▉ ........   1.10×   91%🕐  ( 12 win / 14 lose)
1K          ███████████▊ .......   1.19×   84%🕐  ( 15 win / 11 lose)
100K        ██████████▊ ........   1.08×   93%🕐  ( 12 win / 14 lose)
1M          █████████████ ......   1.31×   77%🕐  ( 17 win /  9 lose)
10M         ████████████▎ ......   1.23×   81%🕐  ( 16 win / 10 lose)
ALL         ███████████▊ .......   1.18×   85%🕐  ( 72 win / 58 lose)
OPERATIONS — BY CATEGORY (geomean over its families, all sizes)
slower ◄───────── 1.0 (parity) ─────────► faster
elementwise ███████████▊ .......   1.18×   85%🕐  ( 28 win / 12 lose)
reductions  █████████████████▍     1.75×   57%🕐  ( 28 win / 12 lose)
selection   (no data)
copy/cast   ███████▎ ...........   0.73×  137%🕐  (  8 win / 17 lose)  ◄ SLOWER
index-math  ███████▌ ...........   0.75×  133%🕐  (  3 win /  7 lose)  ◄ SLOWER
dtypes      ████████████▏ ......   1.22×   82%🕐  (  5 win / 10 lose)
CATEGORY × TIER geomean
category      scalar      1K    100K      1M     10M
elementwise    1.05×   1.54×   1.18×   1.09×   1.11×
reductions     2.67×   1.99×   1.51×   1.44×   1.42×
selection          -       -       -       -       -
copy/cast      0.61×   0.59×   0.40×   1.39×   1.06×
index-math     0.32×   0.51×   0.97×   1.22×   1.22×
dtypes         0.71×   0.85×   1.97×   1.54×   1.47×
PER-FAMILY × TIER (NumPy ÷ NumSharp; >1.0 = NumSharp faster)
family        scalar      1K    100K      1M     10M  geomean
-- elementwise
add            1.01×   1.48×   1.03×   0.88×   1.01×    1.06×
sqrt           0.85×   1.15×   1.00×   1.01×   1.02×    1.00×
copy           0.88×   2.59×   1.78×   1.33×   1.72×    1.56×
strided        0.89×   1.12×   1.00×   1.02×   0.99×    1.00×
bcast          0.89×   1.13×   1.02×   0.98×   1.03×    1.01×
reversed       0.85×   1.28×   0.90×   0.99×   1.00×    0.99×
castbuf        1.98×   2.29×   1.65×   1.35×   1.16×    1.64×
mixbuf         1.49×   1.94×   1.40×   1.24×   1.09×    1.40×
-- reductions
sum            1.84×   1.78×   2.79×   2.21×   1.76×    2.04×
sum ax0        1.90×   0.86×   0.96×   1.00×   0.94×    1.08×
sum ax1        1.85×   0.86×   1.51×   1.83×   1.57×    1.47×
sum dt=        1.97×   1.47×   0.49×   0.47×   0.55×    0.82×
amin           1.70×   1.61×   0.71×   0.70×   0.82×    1.02×
cumsum         1.47×   1.09×   1.06×   1.80×   1.68×    1.39×
any(F)         8.89×   8.41×   2.12×   0.98×   1.00×    2.74×
any(hit)       9.01×   8.50×   8.50×   7.87×   8.22×    8.41×
-- selection
where             NA      NA      NA      NA      NA
a[mask]           NA      NA      NA      NA      NA
a[mask]=          NA      NA      NA      NA      NA
count_nz          NA      NA      NA      NA      NA
argwhere          NA      NA      NA      NA      NA
a[idx]            NA      NA      NA      NA      NA
a[idx]=           NA      NA      NA      NA      NA
-- copy/cast
flatten        0.43×   0.44×   0.17×   2.17×   0.90×    0.57×
astype         0.30×   0.53×   0.59×   1.97×   1.90×    0.81×
ravel.T        0.45×   0.73×   0.48×   2.11×   1.01×    0.80×
in-place       1.77×   0.81×   0.81×   1.06×   1.02×    1.05×
less->b        0.81×   0.52×   0.26×   0.54×   0.76×    0.54×
-- index-math
unravel        0.33×   0.50×   0.95×   1.01×   0.97×    0.68×
ravel_mi       0.32×   0.52×   0.99×   1.49×   1.53×    0.82×
-- dtypes
complex        0.74×   0.63×   1.01×   0.76×   0.89×    0.80×
float16        0.72×   0.65×   0.62×   0.62×   0.62×    0.65×
int8           0.67×   1.47×  12.09×   7.70×   5.78×    3.51×
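The win / lose counts can be cross-checked against the per-family table. The sketch below tallies the copy/cast rows copied from that table, assuming a cell counts as a win when its speedup exceeds 1.0. The report does not state how exact ties are scored, so that threshold is an assumption; the copy/cast rows happen to contain no 1.00× cells, which keeps the check unambiguous.

```python
# copy/cast speedups (NumPy / NumSharp) for scalar, 1K, 100K, 1M, 10M.
copy_cast = {
    "flatten":  [0.43, 0.44, 0.17, 2.17, 0.90],
    "astype":   [0.30, 0.53, 0.59, 1.97, 1.90],
    "ravel.T":  [0.45, 0.73, 0.48, 2.11, 1.01],
    "in-place": [1.77, 0.81, 0.81, 1.06, 1.02],
    "less->b":  [0.81, 0.52, 0.26, 0.54, 0.76],
}


def tally(rows: dict) -> tuple:
    """Return (wins, losses); a win is assumed to mean speedup > 1.0."""
    cells = [v for row in rows.values() for v in row]
    wins = sum(1 for v in cells if v > 1.0)
    return wins, len(cells) - wins


wins, losses = tally(copy_cast)
print(f"{wins} win / {losses} lose")  # 8 win / 17 lose
```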
CONSTRUCTION — iterator build+dispose vs np.nditer (size-invariant, 1K)
slower ◄───────── 1.0 (parity) ─────────► faster
1op         ██████████████████▋    1.86×   54%🕐  (  1 win /  0 lose)
3op_exl     ███████████████████▶   4.43×   23%🕐  (  1 win /  0 lose)
ufunc       ███████████████████▶   4.98×   20%🕐  (  1 win /  0 lose)
bufcast     ███████████████████▶   3.49×   29%🕐  (  1 win /  0 lose)
multiindex  ███████████████████▶   2.56×   39%🕐  (  1 win /  0 lose)
8op         ███████████████████▶   5.26×   19%🕐  (  1 win /  0 lose)
4d          ███████████████████▶   2.94×   34%🕐  (  1 win /  0 lose)
8d          ███████████████████▶   2.65×   38%🕐  (  1 win /  0 lose)
strided2d   ███████████████████▶   3.35×   30%🕐  (  1 win /  0 lose)
geomean     ███████████████████▶   3.33×   30%🕐  (  9 win /  0 lose)
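The header describes the timings as best-of-rounds. A generic harness in that style could look like the sketch below. `best_of_rounds`, its parameters, and the stand-in workload are hypothetical; they only illustrate the idea of keeping the fastest round and are not the benchmark's actual driver.

```python
import time
from typing import Callable


def best_of_rounds(fn: Callable[[], object], rounds: int = 5, inner: int = 1000) -> float:
    """Return the best (lowest) per-call time in seconds over several rounds."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        for _ in range(inner):
            fn()
        elapsed = (time.perf_counter() - start) / inner
        best = min(best, elapsed)  # keep the fastest round to damp noise
    return best


# Stand-in workload; a real run would build and dispose an iterator here.
per_call = best_of_rounds(lambda: sum(range(100)))
print(f"{per_call * 1e9:.0f} ns per call")
```

On the NumPy side, the workload for a single-operand case might be something like `lambda: np.nditer(a).close()`, though the flags and operands behind each row of the table are not stated in this report.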
CHUNK-WIDTH dispatch — strided rows, 2M total, inner width w (NumPy = np.positive)
slower ◄───────── 1.0 (parity) ─────────► faster
w=4         ███████ ............   0.71×  141%🕐  (  0 win /  1 lose)  ◄ SLOWER
w=16        ██████████▏ ........   1.02×   98%🕐  (  1 win /  0 lose)  ◄ PARITY
w=64        ███████████▍ .......   1.15×   87%🕐  (  1 win /  0 lose)
w=256       █████████████▍ .....   1.34×   75%🕐  (  1 win /  0 lose)
w=1024      ███████████████ ....   1.51×   66%🕐  (  1 win /  0 lose)
PATHOLOGY canaries — known taxes/losses to track (NumPy ÷ NumSharp)
bcast_reduce   538.56×  (538.6× faster)
allocate         1.10×  (1.1× faster)
overlap_copy     1.78×  (1.8× faster)
forder_out       1.28×  (1.3× faster)
zerodim          1.26×  (1.3× faster)
DIVIDENDS — NumSharp-only machinery (NumPy baseline = closest it can do)
              scalar      1K    100K      1M     10M  note
fuse7         12.65×   3.80×   1.39×   1.62×   2.01×  vs chained 6× add
reuse          5.63×   5.30×   0.97×   1.04×   1.06×  vs rebuild each call
par8               -   0.66×   2.70×   3.09×   4.25×  vs single-thread
biggest NumSharp wins: int8@100K 12.09× · any(hit)@scalar 9.01× · any(F)@scalar 8.89× · any(hit)@100K 8.50× · any(hit)@1K 8.50×
most behind: flatten@100K 0.17× · less->b@100K 0.26× · astype@scalar 0.30× · ravel_mi@scalar 0.32× · unravel@scalar 0.33×
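The two summary lines amount to a sort over the operation matrix. As an illustration, the sketch below ranks only the rows that contain the extremes, with values copied from the per-family table. The data layout and the `family@tier` labels are hypothetical conventions for this example, and the order of the two tied 8.50× cells may differ from the report, which presumably sorts on unrounded values.

```python
TIERS = ["scalar", "1K", "100K", "1M", "10M"]

# Subset of the per-family table: rows holding the largest and smallest cells.
rows = {
    "any(F)":   [8.89, 8.41, 2.12, 0.98, 1.00],
    "any(hit)": [9.01, 8.50, 8.50, 7.87, 8.22],
    "int8":     [0.67, 1.47, 12.09, 7.70, 5.78],
    "flatten":  [0.43, 0.44, 0.17, 2.17, 0.90],
    "astype":   [0.30, 0.53, 0.59, 1.97, 1.90],
    "less->b":  [0.81, 0.52, 0.26, 0.54, 0.76],
    "unravel":  [0.33, 0.50, 0.95, 1.01, 0.97],
    "ravel_mi": [0.32, 0.52, 0.99, 1.49, 1.53],
}

# Flatten to (speedup, "family@tier") pairs so they can be sorted.
cells = [(v, f"{name}@{tier}") for name, vals in rows.items()
         for tier, v in zip(TIERS, vals)]

top5 = sorted(cells, reverse=True)[:5]   # largest speedups first
bottom5 = sorted(cells)[:5]              # smallest speedups first

print("wins:  ", " · ".join(f"{c} {v:.2f}×" for v, c in top5))
print("behind:", " · ".join(f"{c} {v:.2f}×" for v, c in bottom5))
```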