Skip to content

GH-50051: [C++][Parquet] Avoid size overflow in WKBBuffer::ReadCoords#50036

Open
jmestwa-coder wants to merge 2 commits into
apache:mainfrom
jmestwa-coder:wkb-readcoords-size-overflow
Open

GH-50051: [C++][Parquet] Avoid size overflow in WKBBuffer::ReadCoords#50036
jmestwa-coder wants to merge 2 commits into
apache:mainfrom
jmestwa-coder:wkb-readcoords-size-overflow

Conversation

@jmestwa-coder
Copy link
Copy Markdown

@jmestwa-coder jmestwa-coder commented May 25, 2026

Rationale for this change

WKBBuffer::ReadCoords() computes the coordinate sequence byte size using n_coords * sizeof(Coord) before validating the remaining buffer size.

On 32-bit targets such as wasm32, this multiplication can overflow when n_coords is derived from malformed WKB input, causing the bounds check to validate a truncated value before the subsequent coordinate read loop processes the coordinate sequence.

This change widens the computation before validation so the bounds check remains correct across supported architectures.

Related issue: #50051

What changes are included in this PR?

  • Widen the coordinate sequence byte-size computation in WKBBuffer::ReadCoords()
  • Prevent overflow-before-bounds-check behavior on 32-bit targets
  • Audit and harden closely related WKB parsing size computations with the same externally controlled arithmetic pattern

Are these changes tested?

Yes.

Added coverage for malformed WKB inputs that exercise overflow-sensitive coordinate size calculations and verified the parser rejects invalid input safely.

Are there any user-facing changes?

No.

This PR contains a "Critical Fix".

Malformed WKB input could cause coordinate sequence size calculations to overflow on 32-bit targets before bounds validation occurs, potentially resulting in out-of-bounds reads during parsing.

This change ensures bounds validation uses widened arithmetic before parsing coordinate sequences.

@kou
Copy link
Copy Markdown
Member

kou commented May 25, 2026

@wgtmac
Copy link
Copy Markdown
Member

wgtmac commented May 26, 2026

Instead of fixing this one by one, could you figure out all similar spots with the help of coding agents?

@jmestwa-coder jmestwa-coder force-pushed the wkb-readcoords-size-overflow branch from b008a04 to a4bbb84 Compare May 27, 2026 12:56
@jmestwa-coder
Copy link
Copy Markdown
Author

@wgtmac @kou

I opened #50051, restored the PR template, and audited the related WKB parsing paths in cpp/src/parquet/geospatial for similar overflow-before-bounds-check patterns on 32-bit targets.

During that audit, I did not identify additional high-confidence cases beyond the ReadCoords() computation addressed in this PR. I also added focused regression coverage for the malformed-input scenario exercised by this fix.

@wgtmac wgtmac changed the title MINOR: [C++][Parquet] Avoid size overflow in WKBBuffer::ReadCoords GH-50051: [C++][Parquet] Avoid size overflow in WKBBuffer::ReadCoords May 28, 2026
@github-actions
Copy link
Copy Markdown

⚠️ GitHub issue #50051 has been automatically assigned in GitHub to PR creator.

Copy link
Copy Markdown
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for creating the issue! I still have one minor question.

template <typename Coord, typename Visit>
void ReadCoords(uint32_t n_coords, bool swap, Visit&& visit) {
size_t total_bytes = n_coords * sizeof(Coord);
uint64_t total_bytes = static_cast<uint64_t>(n_coords) * sizeof(Coord);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use MultiplyWithOverflow instead? Current multiplication can still overflow in some extreme cases.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, switched to MultiplyWithOverflow on uint64_t so the overflow path is rejected explicitly.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Hardens Parquet’s geospatial WKB parsing on 32-bit targets by preventing overflow in coordinate sequence size calculations prior to bounds checks, ensuring malformed WKB cannot bypass validation.

Changes:

  • Widen n_coords * sizeof(Coord) computation in WKBBuffer::ReadCoords() to uint64_t before comparing against remaining buffer size.
  • Add a regression test covering an overflow-sensitive coordinate count that would truncate on 32-bit builds.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
cpp/src/parquet/geospatial/util_internal.cc Uses widened arithmetic for coordinate sequence byte-size calculation to prevent overflow-before-bounds-check.
cpp/src/parquet/geospatial/util_internal_test.cc Adds a malformed WKB test case that exercises the overflow scenario and expects rejection.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Member

@wgtmac wgtmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @jmestwa-coder!

@github-actions github-actions Bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels May 28, 2026
@wgtmac
Copy link
Copy Markdown
Member

wgtmac commented May 28, 2026

Could you please fix the linter error?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants