[C++][Acero] Swiss table still has risks of overflow #45506

zanmato1984 · 2025-02-12T06:41:51Z

Describe the enhancement requested

After fixing #44513 and #45334, I kept looking for possible overflow risks in our Swiss join implementation. My findings follow.

Background

In the Swiss table, a "block" consists of 8 keys (rows). When the number of rows is large enough, a block occupies 40 bytes, a.k.a. num_block_bytes: 4 bytes for each key plus one 8-byte header. Blocks are stored contiguously in a buffer, namely uint8_t * blocks_. So locating the address of the block_id-th block requires indexing like:

blocks_ + num_block_bytes * block_id
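To make the layout concrete, here is a minimal sketch of the block size arithmetic described above. The constant and function names are hypothetical and chosen for illustration; they are not the identifiers used in Arrow.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; these are not Arrow's identifiers.
constexpr int kSlotsPerBlock = 8;     // keys (rows) per block
constexpr int kBlockHeaderBytes = 8;  // one 8-byte header per block

// Size of one block in bytes for a given key width in bytes.
constexpr int NumBlockBytes(int bytes_per_key) {
  return kBlockHeaderBytes + kSlotsPerBlock * bytes_per_key;
}

// Total bytes needed for num_blocks contiguous blocks, computed in 64 bits.
constexpr uint64_t BlocksBufferBytes(uint64_t num_blocks, int bytes_per_key) {
  return num_blocks * static_cast<uint64_t>(NumBlockBytes(bytes_per_key));
}

// 4-byte keys plus the header give the 40-byte block mentioned above.
static_assert(NumBlockBytes(4) == 40, "4-byte keys give 40-byte blocks");
```

With 4-byte keys and 2^32 / 8 blocks, the buffer is 2^29 * 40 bytes, which is already far beyond what a 32-bit offset can address.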

Risks

The limit on the number of rows in a Swiss table is 2^32. So we can have 2^32 / 8 blocks at most, and therefore block_id is normally represented using uint32_t. num_block_bytes is represented using a regular int. If no explicit type promotion is done, num_block_bytes * block_id is performed as a 32-bit multiplication and overflow may happen (2^32 / 8 * 40 > 2^32).
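The wraparound can be illustrated with a small sketch, assuming a platform where int is 32 bits wide. The two helper functions are hypothetical and exist only to contrast the 32-bit and 64-bit computations; they are not part of Arrow.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not Arrow's API.

// Offset computed without promotion: the int operand is converted to
// unsigned int, so the product wraps modulo 2^32 (assuming 32-bit int).
inline uint64_t BlockOffset32(uint32_t block_id, int num_block_bytes) {
  uint32_t offset = num_block_bytes * block_id;  // 32-bit multiplication
  return offset;
}

// Offset computed after promoting both operands to 64 bits.
inline uint64_t BlockOffset64(uint32_t block_id, int num_block_bytes) {
  return static_cast<uint64_t>(block_id) *
         static_cast<uint64_t>(num_block_bytes);
}
```

With 40-byte blocks, the two functions agree up to block_id 107374182 and diverge from 107374183 onward, where the true offset 4294967320 wraps to 24 in the 32-bit version.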

In our code base, there are places where such calculations are promoted to 64-bit multiplication so that overflow is avoided. To name a few:

arrow/cpp/src/arrow/compute/key_map_internal.cc

Lines 262 to 263 in e79d60d

    
    const uint8_t* blockbase =
        blocks_->data() + static_cast<uint64_t>(iblock) * num_block_bytes;

arrow/cpp/src/arrow/compute/key_map_internal.cc

Lines 408 to 409 in e79d60d

    
    const uint64_t num_block_bytes = (8 + num_groupid_bits);
    blockbase = blocks_->mutable_data() + num_block_bytes * (start_slot_id >> 3);

However, requiring such explicit type promotion is error-prone.

What may cause real trouble is where such calculations are still done in 32 bits. There is one:

arrow/cpp/src/arrow/compute/key_map_internal.cc

Lines 226 to 227 in e79d60d

    
    block = *reinterpret_cast<const uint64_t*>(blocks_->mutable_data() +
                                               num_block_bytes * iblock);

(I wish I could come up with a concrete test case in which such an overflow results in wrong data. It's possible, but it's non-trivial and wouldn't be practical to run with limited resources.)

Given that such code is either correct but error-prone, or open to real overflow, and that I can't imagine how painful the debugging will be once issues happen, we should refactor it in a more overflow-safe fashion.
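One possible shape for such a refactoring is to route every block access through a single accessor that does the promotion internally, so callers can keep using the natural types (uint32_t for block_id, int for num_block_bytes). This is only a sketch under that assumption; the function names and signatures below are hypothetical and are not taken from Arrow.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical accessor sketch; names and signatures are assumptions,
// not Arrow's API.

// Returns the address of the block_id-th block. Both operands are promoted
// to 64 bits before multiplying, so the byte offset cannot wrap at 2^32.
inline const uint8_t* BlockData(const uint8_t* blocks, uint32_t block_id,
                                int num_block_bytes) {
  return blocks + static_cast<uint64_t>(block_id) *
                      static_cast<uint64_t>(num_block_bytes);
}

// Reads the 8-byte header at the start of a block. memcpy is used here
// instead of a reinterpret_cast load to avoid alignment assumptions.
inline uint64_t LoadBlockHeader(const uint8_t* blocks, uint32_t block_id,
                                int num_block_bytes) {
  uint64_t header;
  std::memcpy(&header, BlockData(blocks, block_id, num_block_bytes),
              sizeof(header));
  return header;
}
```

Keeping the cast in one place means a call site such as LoadBlockHeader(blocks, iblock, num_block_bytes) cannot accidentally fall back to 32-bit arithmetic.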

Component(s)

C++


### Rationale for this change

See #45506.

### What changes are included in this PR?

1. Abstract current overflow-prone block data access into functions that do proper type promotion to avoid overflow. Also remove the old block base address accessor.
2. Unify the data types used for various concepts as they naturally are (i.e., w/o explicit promotion): `uint32_t` for `block_id`, `int` for `num_xxx_bits/bytes`, `uint32_t` for `group_id`, `int` for `local_slot_id` and `uint32_t` for `global_slot_id`.
3. Abstract several constants and utility functions for readability and maintainability.

### Are these changes tested?

Existing tests should suffice. It is really hard (gosh I did try) to create a concrete test case that fails w/o this change and passes w/ this change.

### Are there any user-facing changes?

None.

* GitHub Issue: #45506

Authored-by: Rossi Sun <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>

pitrou · 2025-02-17T09:33:06Z

Issue resolved by pull request 45515
#45515

zanmato1984 added the Type: enhancement label Feb 12, 2025

github-actions bot added the Component: C++ label Feb 12, 2025

zanmato1984 self-assigned this Feb 12, 2025

zanmato1984 mentioned this issue Feb 12, 2025

GH-45506: [C++][Acero] More overflow-safe Swiss table #45515

Merged

pitrou added this to the 20.0.0 milestone Feb 17, 2025

pitrou closed this as completed Feb 17, 2025
