
Add xsimd::get<>() for optimized compile-time element extraction#1294

Open
DiamonDinoia wants to merge 2 commits into xtensor-stack:master from DiamonDinoia:feat/optimize-elem-extraction

Conversation

@DiamonDinoia
Contributor

Add a free function xsimd::get<I>(batch) API mirroring std::get<I>(tuple) for fast compile-time element extraction from SIMD batches.

Per-architecture optimized kernel::get overloads using the fastest available intrinsics:

  • SSE2: shuffle/shift + scalar convert
  • SSE4.1: pextrd/pextrq/pextrb/pextrw, bitcast + pextrd for float
  • AVX: vextractf128/vextracti128 + SSE4.1 delegate
  • AVX-512: vextracti64x4/vextractf32x4 + AVX delegate
  • NEON: vgetq_lane_* (single instruction for all types)
  • NEON64: vgetq_lane_f64

Also fixes a latent bug in the common fallback for complex batch compile-time get (wrong buffer type).

@DiamonDinoia force-pushed the feat/optimize-elem-extraction branch from b7725d8 to 0b6d85f on April 13, 2026 15:40
@DiamonDinoia force-pushed the feat/optimize-elem-extraction branch from 0b6d85f to c6dd311 on April 14, 2026 14:38
RVV only had runtime get(batch, size_t, requires_arch<rvv>) which
became ambiguous with the new compile-time get(batch, index<I>,
requires_arch<common>) because index<I> (std::integral_constant)
implicitly converts to size_t. Add index<I> overloads that delegate
to the runtime versions, matching the pattern used by SSE/AVX/NEON.
@DiamonDinoia
Contributor Author

Nice, thanks for fixing CI!

This is ready for review. Once approved I will rewrite the history. I don't want to trigger a useless CI run.

@DiamonDinoia marked this pull request as ready for review on April 14, 2026 17:27