GH-50051: [C++][Parquet] Avoid size overflow in WKBBuffer::ReadCoords#50036
GH-50051: [C++][Parquet] Avoid size overflow in WKBBuffer::ReadCoords#50036jmestwa-coder wants to merge 2 commits into
Conversation
|
|
Instead of fixing this one by one, could you figure out all similar spots with the help of coding agents? |
b008a04 to
a4bbb84
Compare
|
I opened #50051, restored the PR template, and audited the related WKB parsing paths in During that audit, I did not identify additional high-confidence cases beyond the |
|
|
wgtmac
left a comment
There was a problem hiding this comment.
Thank you for creating the issue! I still have one minor question.
| template <typename Coord, typename Visit> | ||
| void ReadCoords(uint32_t n_coords, bool swap, Visit&& visit) { | ||
| size_t total_bytes = n_coords * sizeof(Coord); | ||
| uint64_t total_bytes = static_cast<uint64_t>(n_coords) * sizeof(Coord); |
There was a problem hiding this comment.
Should we use MultiplyWithOverflow instead? Current multiplication can still overflow in some extreme cases.
There was a problem hiding this comment.
Done, switched to MultiplyWithOverflow on uint64_t so the overflow path is rejected explicitly.
There was a problem hiding this comment.
Pull request overview
Hardens Parquet’s geospatial WKB parsing on 32-bit targets by preventing overflow in coordinate sequence size calculations prior to bounds checks, ensuring malformed WKB cannot bypass validation.
Changes:
- Widen
n_coords * sizeof(Coord)computation inWKBBuffer::ReadCoords()touint64_tbefore comparing against remaining buffer size. - Add a regression test covering an overflow-sensitive coordinate count that would truncate on 32-bit builds.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
cpp/src/parquet/geospatial/util_internal.cc |
Uses widened arithmetic for coordinate sequence byte-size calculation to prevent overflow-before-bounds-check. |
cpp/src/parquet/geospatial/util_internal_test.cc |
Adds a malformed WKB test case that exercises the overflow scenario and expects rejection. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Could you please fix the linter error? |
Rationale for this change
WKBBuffer::ReadCoords()computes the coordinate sequence byte size usingn_coords * sizeof(Coord)before validating the remaining buffer size.On 32-bit targets such as wasm32, this multiplication can overflow when
n_coordsis derived from malformed WKB input, causing the bounds check to validate a truncated value before the subsequent coordinate read loop processes the coordinate sequence.This change widens the computation before validation so the bounds check remains correct across supported architectures.
Related issue: #50051
What changes are included in this PR?
WKBBuffer::ReadCoords()Are these changes tested?
Yes.
Added coverage for malformed WKB inputs that exercise overflow-sensitive coordinate size calculations and verified the parser rejects invalid input safely.
Are there any user-facing changes?
No.
This PR contains a "Critical Fix".
Malformed WKB input could cause coordinate sequence size calculations to overflow on 32-bit targets before bounds validation occurs, potentially resulting in out-of-bounds reads during parsing.
This change ensures bounds validation uses widened arithmetic before parsing coordinate sequences.