fix(workflow-core): paginate S3 deleteDirectory deletions#5569
Conversation
deleteDirectory issued a single listObjectsV2 call (capped at 1000 keys per page) and one deleteObjects batch, so any objects beyond the first 1000 under a prefix were silently orphaned. AWS DeleteObjects also accepts at most 1000 keys per request, and reports per-key failures in its response rather than throwing. List via listObjectsV2Paginator, which transparently follows the continuation token across all pages, deleting keys in batches of at most 1000. Inspect each DeleteObjects response and raise if any key failed, so partial deletions are no longer swallowed. Also remove the unused MD5 computation (digested over the request's toString and never used). Closes apache#5281
1b4b0c3 to
e3deaba
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #5569 +/- ##
============================================
- Coverage 52.42% 52.10% -0.32%
Complexity 2481 2481
============================================
Files 1070 1067 -3
Lines 41359 41262 -97
Branches 4441 4439 -2
============================================
- Hits 21682 21500 -182
- Misses 18406 18494 +88
+ Partials 1271 1268 -3
*This pull request uses carry forward flags. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
…ry docs - Cap throwOnDeleteErrors enumeration at MAX_LISTED_DELETE_ERRORS (10) and summarize the remainder, so a fully-failed batch can't produce an unbounded message. - Update the LargeBinaryManager doc that still described the pre-pagination single-page (<=1000) behavior of deleteDirectory.
|
/request-review @Xiao-zhen-Liu |
|
@xuang7 I think this may need to be back ported to v1.2 |
|
@kunwp1 if you agree this needs backport, it may need to be manually backported as there are commits missing between two branches. |
|
Needs to be backported because it's a bug fix. |
Sounds good! I'll create a PR that includes both #5249 and #5280 |
What changes were proposed in this PR?
S3StorageClient.deleteDirectorylisted objects with a singlelistObjectsV2call and issued onedeleteObjectsbatch. Both S3 APIs cap at 1000 keys per call, so for any prefix holding more than 1000 objects only the first 1000 were deleted and the rest causes a storage leak. This affects dataset deletion (DatasetResource) and per-execution cleanup (LargeBinaryManager), either of which can exceed 1000 objects under one prefix.This PR:
listObjectsV2Paginator, which follows the continuation token across all pages, and deletes in batches of at most 1000 keys. Keys are streamed so memory stays bounded to a single batch.DeleteObjectsresponse and throws if any key failed.Any related issues, documentation, discussions?
Closes #5281
How was this PR tested?
for i in {1..1100}; do printf 'x' > "file_$i.txt"; doneWas this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.8)