Skip to content

Implement memory handling correctness fixes#2016

Draft
alcaeus wants to merge 13 commits intomongodb:v2.3from
alcaeus:minor-fixes
Draft

Implement memory handling correctness fixes#2016
alcaeus wants to merge 13 commits intomongodb:v2.3from
alcaeus:minor-fixes

Conversation

@alcaeus
Copy link
Copy Markdown
Member

@alcaeus alcaeus commented May 5, 2026

This PR introduces a bunch of defensive hardening and consistency fixes across the extension. Each commit is scoped to a single change to keep review and bisecting straightforward.

  • Manager cache key — store a SHA-256 digest of the URI rather than the raw connection string in phongo_manager_make_client_hash.
  • Int64 shift operators — match PHP's native integer-shift handlers: throw ArithmeticError on negative shift counts and return 0/-1 at or above the operand width.
  • BSON traversal — halt the document and array visitors cleanly when bsonUnserialize throws, instead of inserting a partially-constructed object into the parent.
  • BSON value encoder — propagate failure from phongo_zval_to_bson_value_ex and validate UTF-8 in the scalar path so it matches phongo_bson_append.
  • Field path helpers — guard phongo_field_path_pop against an empty path and fix the trailing-separator overwrite in phongo_field_path_as_string when all entries are NULL.
  • Binary / Regex / Javascript — free pre-existing buffers before re-init, surface scope-encoding errors from phongo_javascript_init, route phongo_regex_new through phongo_regex_init so BSON-decoded Regex instances pick up the same flag canonicalisation as PHP-constructed ones.
  • Type cleanups — use size_t for BSON data lengths and for phongo_regex_t length fields; drop a tautological sparsity > INT64_MAX check in ClientEncryption.

alcaeus added 13 commits May 5, 2026 16:08
Add an early return in phongo_field_path_pop when the path has no
elements to decrement. The function previously assumed callers
maintained push/pop balance, which makes the helper fragile under
future error-recovery edits.
ZEND_SL and ZEND_SR previously delegated directly to the C shift
operators without bounding the shift amount. Match the behaviour of
PHP's native integer shift handlers: throw ArithmeticError on negative
shift counts, return 0 (or -1 for arithmetic right-shift of a negative
value) when the count is at or above the operand width, and use an
unsigned cast for the left-shift to avoid relying on signed shift
semantics.
Have the helper return whether the conversion populated the destination
bson_value_t and forward that result through phongo_zval_to_bson_value.
The IS_ARRAY/IS_OBJECT branch previously returned true unconditionally,
which is misleading when an exception was thrown inside the inner
encoder.
The document and array visitors call into userland bsonUnserialize() but
did not consult EG(exception) before inserting the resulting object
into the parent container. When bsonUnserialize() throws, abort the
traversal cleanly and propagate the exception instead of attaching a
partially-constructed object to the parent.
The cache key produced by phongo_manager_make_client_hash previously
contained the raw connection string. Replace it with a SHA-1 digest of
the URI so the key remains a stable per-URI identifier without
embedding the connection string verbatim into long-lived process
storage.
When all entries in field_path->elements are NULL, the loop appends
nothing and the trailing '.' overwrite would land before the start of
the allocated buffer. Branch on whether the loop wrote anything before
trimming the trailing separator.
The condition sparsity > INT64_MAX is always false because sparsity is
declared as int64_t. Remove the dead branch and keep the meaningful
sparsity < 0 guard.
Change phongo_bson_data_to_zval and phongo_bson_data_to_zval_ex to
accept size_t lengths instead of int, matching the unsigned width
expected by libbson's reader API. Existing callers pass uint32_t from
bson_iter_document, so the widening is implicit.
Bring phongo_regex_t in line with the other string-carrying structs in
this file, which already use size_t for length fields. Removes a
silent narrowing of the size_t parameter accepted by phongo_regex_init.
phongo_regex_new (used when decoding a Regex from BSON) previously
copied the pattern and flags directly without sorting the flags
alphabetically, while phongo_regex_init does. This caused two Regex
instances representing the same pattern and flags to compare unequal
depending on which path constructed them. Defer to phongo_regex_init
to keep the canonicalisation in one place.
The IS_STRING branch wrote the PHP string verbatim into the
bson_value_t without checking that it was valid UTF-8, while the
phongo_bson_append document-encoding path already validates. Match the
existing pattern so that invalid UTF-8 is rejected with an exception
in both code paths.
When phongo_zval_to_bson throws while encoding the scope, release the
already-allocated code buffer and the partially-written scope BSON,
and report failure to the caller. Previously the function returned
true even though intern was left in an inconsistent state.
phongo_binary_init, phongo_regex_init, and phongo_javascript_init
overwrote heap-owned struct members without freeing what was already
there. Re-init the slots cleanly so a second call (e.g. from a
subclass that invokes parent::__unserialize twice) does not leak the
prior allocation. As part of this, hoist the regex flags null-byte
check above the pattern allocation so a flag rejection can no longer
leave a stray pattern buffer behind.
Copilot AI review requested due to automatic review settings May 5, 2026 15:02
@alcaeus alcaeus requested a review from a team as a code owner May 5, 2026 15:03
@alcaeus alcaeus requested review from paulinevos and removed request for a team May 5, 2026 15:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens several low-level correctness paths in the PHP MongoDB extension, mainly around BSON decoding/encoding, BSON type initialization, and manager client hashing. It fits into the extension’s core runtime layer by tightening error propagation, reducing unsafe reinitialization behavior, and normalizing a few internal type/length handling paths.

Changes:

  • Reworks BSON traversal/encoding paths to better propagate failures, validate UTF-8 consistently, and fix field-path edge cases.
  • Adjusts BSON value object initialization for Binary, Regex, Javascript, and Int64 to improve reinit behavior and operator correctness.
  • Changes manager client hashing to use a SHA-256 digest of the URI and applies a few internal type cleanups.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/phongo_structs.h Updates internal Regex length fields to size_t.
src/phongo_client.c Switches manager client hash input from raw URI text to a SHA-256 digest.
src/phongo_bson.h Changes BSON data decode length parameters from int to size_t.
src/phongo_bson.c Fixes field-path edge cases and stops nested BSON traversal cleanly on bsonUnserialize exceptions.
src/phongo_bson_encode.c Makes BSON value encoding return failures and validates UTF-8 for scalar strings.
src/MongoDB/ClientEncryption.c Removes a redundant sparsity upper-bound check.
src/BSON/Regex.c Reuses common Regex init logic and frees prior buffers before reinit.
src/BSON/Javascript.c Frees prior code/scope before reinit and propagates scope-encoding failures.
src/BSON/Int64.c Aligns shift operator behavior with PHP’s arithmetic/width rules.
src/BSON/Binary.c Frees prior binary data before reinitialization.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/BSON/Javascript.c
Comment on lines +58 to +63
if (EG(exception)) {
efree(intern->code);
intern->code = NULL;
bson_destroy(intern->scope);
intern->scope = NULL;
return false;
Comment thread src/phongo_client.c

#include <php.h>
#include <ext/hash/php_hash.h>
#include <ext/hash/php_hash_sha.h>
Comment thread src/BSON/Int64.c
Comment on lines +446 to +450
if (value2 < 0) {
zend_throw_exception(zend_ce_arithmetic_error, "Bit shift by negative number", 0);
return FAILURE;
}
OPERATION_RESULT_INT64(value2 >= 64 ? 0 : (int64_t) ((uint64_t) value1 << value2));
Comment thread src/phongo_bson_encode.c
Comment on lines +646 to +649
if (!bson_utf8_validate(Z_STRVAL_P(data), Z_STRLEN_P(data), true)) {
phongo_throw_exception(PHONGO_ERROR_UNEXPECTED_VALUE, "Detected invalid UTF-8 in string value");
return false;
}
@alcaeus alcaeus marked this pull request as draft May 5, 2026 15:20
Comment thread src/phongo_bson.c
Comment on lines +208 to +210
if (field_path->size == 0) {
return false;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be covered by a test?

Comment thread src/BSON/Int64.c
PHONGO_GET_INT64(value2, op2);
OPERATION_RESULT_INT64(value1 << value2);
if (value2 < 0) {
zend_throw_exception(zend_ce_arithmetic_error, "Bit shift by negative number", 0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exception can also be tested.

Comment thread src/phongo_client.c

zval args;

PHP_SHA256Init(&sha_ctx);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirm sh256 is the best algo for this use-case: crypto-safe for credentials and fast in the execution hot path (Hardware Acceleration).

Comment thread src/phongo_client.c
PHP_SHA256Init(&sha_ctx);
PHP_SHA256Update(&sha_ctx, (const unsigned char*) uri_string, strlen(uri_string));
PHP_SHA256Final(sha_digest, &sha_ctx);
php_hash_bin2hex(sha_hex, sha_digest, sizeof(sha_digest));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to convert to hexadecimal? Is it displayed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants