Use HeadObject instead of GetObject for S3 metadata#565
Conversation
pjbull
left a comment
There was a problem hiding this comment.
Generally looking good, a few quick comments in there.
Did you test against your s3 compatible server and real s3?
Also, did you make sure linting passes?
| data = self.client.head_object( | ||
| Bucket=cloud_path.bucket, | ||
| Key=cloud_path.key, | ||
| **self.boto3_dl_extra_args, |
There was a problem hiding this comment.
Does head_object respect what we collect in these kwargs?
There was a problem hiding this comment.
Yes, checked all 7 (see https://docs.aws.amazon.com/boto3/latest/reference/services/s3/client/head_object.html)
|
|
||
| ## UNRELEASED | ||
|
|
||
| - Changed `S3Client._get_metadata` to read object metadata with `HeadObject` instead of `GetObject`, so `stat`, `etag`, and `size` no longer open the object body. Also fixes a `KeyError` on `ContentLength` against S3-compatible gateways that drop `Content-Length` from `GetObject` responses. (Issue [#564](https://github.com/drivendataorg/cloudpathlib/issues/564), PR [#TODO]) |
|
Yes, linting passed and checked on real s3 bucket and s3 compatible server (lakefs). |
|
Code quality errors are unrelated to my changes I think. |
|
Looks like linter package version updates. I'll get it fixed on main so you can rebase. |
|
@mattijsdp fixed on the main branch if you want to rebase this. Thanks for bearing with. |
S3Client._get_metadata read object metadata via ObjectSummary.get() (a GetObject), which issues a full object read and leaves an unread response stream instead of returning the connection to the pool. Switch to head_object, which reads only the metadata and keeps Content-Length on S3-compatible gateways that drop it from GetObject responses, while bringing S3 in line with the other backends. Since head_object returns no body, a missing key surfaces as a generic 404 ClientError rather than NoSuchKey, so S3Path.stat now translates a 404 to NoStatError and lets other errors (e.g. 403) propagate. Updates the S3 test mock and adds regression tests. Closes drivendataorg#564 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
edce3fa to
8f0a536
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #565 +/- ##
========================================
- Coverage 93.9% 93.2% -0.8%
========================================
Files 28 28
Lines 2222 2226 +4
========================================
- Hits 2088 2075 -13
- Misses 134 151 +17
🚀 New features to boost your workflow:
|
|
live tests expected to fail on fork, moving to repo-local branch to test real backends. |
* Use HeadObject instead of GetObject for S3 metadata S3Client._get_metadata read object metadata via ObjectSummary.get() (a GetObject), which issues a full object read and leaves an unread response stream instead of returning the connection to the pool. Switch to head_object, which reads only the metadata and keeps Content-Length on S3-compatible gateways that drop it from GetObject responses, while bringing S3 in line with the other backends. Since head_object returns no body, a missing key surfaces as a generic 404 ClientError rather than NoSuchKey, so S3Path.stat now translates a 404 to NoStatError and lets other errors (e.g. 403) propagate. Updates the S3 test mock and adds regression tests. Closes #564 * add PR number --------- Co-authored-by: Mattijs De Paepe <33912270+mattijsdp@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Closes #564
S3Client._get_metadata read object metadata via ObjectSummary.get() (a GetObject), which issues a full object read and leaves an unread response stream instead of returning the connection to the pool. Switch to head_object, which reads only the metadata and keeps Content-Length on S3-compatible gateways that drop it from GetObject responses, while bringing S3 in line with the other backends.
Since head_object returns no body, a missing key surfaces as a generic 404 ClientError rather than NoSuchKey, so S3Path.stat now translates a 404 to NoStatError and lets other errors (e.g. 403) propagate. Updates the S3 test mock and adds regression tests.
Contributor checklist:
CONTRIBUTING.mdCloses #issueappears in the PR summary (e.g.,Closes #123).HISTORY.mdwith the issue that is addressed and the PR you are submitting. If the top section is not `## UNRELEASED``, then you need to add a new section to the top of the document for your change.