feat(storage): log additional bytes received from GCS in read path#16152
feat(storage): log additional bytes received from GCS in read path#16152nidhiii-27 wants to merge 1 commit into
Conversation
When the GCS client requests a specific range of bytes but the transport provides more bytes than expected, we now emit a warning log. This matches the behavior of other GCS client libraries and aids in debugging. Decompressed objects are ignored because their wire size varies from the requested size. Fixes: #475824752 [Generated-by: AI]
There was a problem hiding this comment.
Code Review
This pull request introduces tracking of the total bytes received during GCS object reads in ObjectReadStreambuf to log a warning if more bytes are received than requested. The review feedback highlights three main areas for improvement: first, total_bytes_received_ is only updated in xsgetn() and must also be updated in underflow() to prevent under-counting; second, a defensive check is needed to ensure range.end > range.begin to avoid incorrect negative expected byte calculations; and finally, for consistency, the member variable transformation_ should be accessed directly instead of using the transformation() accessor.
| if (!size_) size_ = std::move(read->size); | ||
| if (!transformation_) transformation_ = std::move(read->transformation); | ||
|
|
||
| total_bytes_received_ += read->bytes_received; |
There was a problem hiding this comment.
The total_bytes_received_ counter is only updated here in xsgetn(). However, ObjectReadStreambuf also overrides underflow() to handle buffered/single-character reads (e.g., via std::getline or operator>>). If a user reads from the stream using those methods, underflow() will read from the source but total_bytes_received_ will not be incremented, leading to an under-count and missed warnings.
Please ensure total_bytes_received_ is also updated in underflow(), or refactor the byte-tracking logic into a common helper method called by both xsgetn() and underflow().
| if (request.HasOption<ReadRange>()) { | ||
| auto const& range = request.GetOption<ReadRange>().value(); | ||
| return range.end - range.begin; | ||
| } |
There was a problem hiding this comment.
If ReadRange is configured such that range.end <= range.begin (or if it represents an open-ended range where end is not set/defaults to 0), range.end - range.begin will be negative or zero. This would cause total_bytes_received_ > *expected_bytes_ to evaluate to true incorrectly, triggering false warning logs.
We should defensively check that range.end > range.begin before returning the difference.
if (request.HasOption<ReadRange>()) {
auto const& range = request.GetOption<ReadRange>().value();
if (range.end > range.begin) {
return range.end - range.begin;
}
return absl::nullopt;
}References
- Prefer defensive code, such as explicit checks, even if they seem redundant based on the current implementation of a framework, as the framework's contract may change in the future.
|
|
||
| void ObjectReadStreambuf::Close() { | ||
| if (expected_bytes_.has_value() && total_bytes_received_ > *expected_bytes_) { | ||
| if (!transformation().has_value()) { |
There was a problem hiding this comment.
For consistency with how other member variables (like expected_bytes_, total_bytes_received_, bucket_name_, and object_name_) are accessed in this function and in xsgetn(), we should access the member variable transformation_ directly instead of calling the accessor method transformation().
if (!transformation_.has_value()) {|
Addressed all PR review comments. |
When the GCS client requests a specific range of bytes but the transport provides more bytes than expected, we now emit a warning log. This matches the behavior of other GCS client libraries and aids in debugging. Decompressed objects are ignored because their wire size varies from the requested size.
Fixes: #475824752
[Generated-by: AI]