feat[gpu]: widen decimals for Arrow device export#8155

Open
0ax1 wants to merge 1 commit into develop from ad/arrow-device-decimal

0ax1 commented May 29, 2026

Arrow Decimal128/Decimal256 schemas require fixed 16/32-byte value buffers, while Vortex decimals may use narrower storage. Add a CUDA widening kernel for Arrow Device export and cover compact, wide, nullable, and empty decimal cases.
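
As a rough illustration of the widening step, here is a host-side Rust sketch. It is not the CUDA kernel from this PR. It assumes the Decimal128 value buffer holds little-endian, two's-complement 16-byte slots, and `widen_to_decimal128` is a hypothetical name used only for this example.

```rust
/// Sign-extend narrow decimal storage values (i8, i16, i32, i64, or i128)
/// into contiguous 16-byte slots, the fixed width a Decimal128 value buffer
/// requires. Assumption: little-endian byte order.
fn widen_to_decimal128<T: Into<i128> + Copy>(values: &[T]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 16);
    for &v in values {
        // The conversion to i128 sign-extends, so negative values keep
        // their meaning after widening.
        let wide: i128 = v.into();
        out.extend_from_slice(&wide.to_le_bytes());
    }
    out
}
```

Each input value occupies exactly 16 output bytes regardless of its storage width, and an empty input yields an empty buffer. A Decimal256 variant would follow the same pattern with 32-byte slots.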

0ax1 added the changelog/feature (A new feature) label May 29, 2026
0ax1 changed the title from "feat(cuda): widen decimals for Arrow device export" to "feat[gpu]: widen decimals for Arrow device export" May 29, 2026
0ax1 force-pushed the ad/arrow-device-decimal branch from 4d21576 to 03c8b95 May 29, 2026 14:02

codspeed-hq Bot commented May 29, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
❌ 1 regressed benchmark
✅ 1264 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 225.4 µs | 188.1 µs | +19.84% |
| Simulation | chunked_varbinview_canonical_into[(100, 100)] | 273.1 µs | 307.8 µs | -11.27% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing ad/arrow-device-decimal (867a83f) with develop (c005aae)

0ax1 force-pushed the ad/arrow-device-decimal branch 2 times, most recently from 6681667 to b15cfc3 May 29, 2026 14:54
0ax1 requested review from onursatici and robert3005 May 29, 2026 14:59
Arrow Decimal128/Decimal256 schemas require fixed 16/32-byte value buffers, while Vortex decimals may use narrower storage. Add a CUDA widening kernel for Arrow Device export and cover compact, wide, nullable, and empty decimal cases.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
0ax1 force-pushed the ad/arrow-device-decimal branch from b15cfc3 to 867a83f May 29, 2026 15:03
0ax1 marked this pull request as ready for review May 29, 2026 15:13
0ax1 enabled auto-merge (squash) May 29, 2026 15:16

  if constexpr (std::is_same_v<Input, int128_t>) {
      return value;
  } else if constexpr (std::is_same_v<Input, int256_t>) {
      return int128_t {value.parts[0], value.parts[1]};

Can this end up truncating? I think it is currently possible in Vortex to do this:

  let array = DecimalArray::from_iter(
      [i256::from_parts(0, 1)], // 2^128, so does not fit into i128
      DecimalDType::new(38, 0), // this is normally i128
  );

When exporting this array, we would pick the i256 -> i128 kernel and truncate the values without checking for overflow.

Probably the right fix is for Vortex to reject constructing such arrays.
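
For illustration, an overflow check on the narrowing path could look like the host-side Rust sketch below. `I256` and `checked_narrow` are hypothetical stand-ins defined only for this example (they are not Vortex or Arrow APIs), and the layout of a low `u128` half plus a signed high half is an assumption chosen to mirror the `from_parts(0, 1)` example above.

```rust
/// Hypothetical 256-bit two's-complement integer, split into a low
/// unsigned half and a high signed half.
#[derive(Clone, Copy, Debug, PartialEq)]
struct I256 {
    lo: u128,
    hi: i128,
}

/// Narrow to i128 only when no information is lost.
/// A 256-bit value fits in 128 bits exactly when its high half is the
/// sign extension of the low half: all zeros for a non-negative low
/// half, all ones for a negative one.
fn checked_narrow(v: I256) -> Option<i128> {
    // Reinterpret the low bits as a signed value.
    let low = v.lo as i128;
    // Arithmetic shift yields 0 or -1 depending on the sign bit.
    let sign_fill = low >> 127;
    if v.hi == sign_fill {
        Some(low)
    } else {
        None
    }
}
```

With this check, the value 2^128 (`lo = 0`, `hi = 1`) is reported as `None` instead of silently becoming 0. A real fix might additionally validate values against the declared precision, which this sketch does not attempt.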

0ax1 disabled auto-merge May 29, 2026 17:02

3 participants