Lazy pagination iterator for catalog list methods #3416

Open
GayathriSrividya wants to merge 2 commits into apache:main from GayathriSrividya:fix/issue-3365-lazy-pagination-iterator

@GayathriSrividya GayathriSrividya commented May 26, 2026

Closes #3365

This PR builds on @jayceslesar's earlier work in #2172 and has been restructured to credit the original author.

Changes

Commit 1 — by @jayceslesar (rebased from #2172):

  • Changed the return types of the abstract methods list_tables, list_namespaces, and list_views from list[Identifier] to Iterator[Identifier]
  • All non-REST catalogs (SQL, DynamoDB, Glue, Hive, BigQuery, NoOp) wrap their existing results in iter()
  • CLI and output updated to work with iterators
  • Tests updated accordingly
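
To make the effect of the signature change concrete, here is a minimal sketch of the pattern described for the non-REST catalogs. It is not the PR's code: `Catalog` is reduced to a single method, `Identifier` is a local stand-in alias, and `InMemoryCatalog` is a hypothetical class used only for illustration.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterator

Identifier = tuple[str, ...]  # local stand-in for the library's Identifier alias


class Catalog(ABC):
    """Reduced to one method for illustration."""

    @abstractmethod
    def list_tables(self, namespace: str) -> Iterator[Identifier]:
        """Return an iterator over table identifiers in the namespace."""


class InMemoryCatalog(Catalog):
    """Hypothetical non-REST catalog whose results are already in memory."""

    def __init__(self, tables: dict[str, list[str]]) -> None:
        self._tables = tables

    def list_tables(self, namespace: str) -> Iterator[Identifier]:
        found = [(namespace, name) for name in self._tables.get(namespace, [])]
        # The lookup is still eager; only the return type becomes an iterator.
        return iter(found)


catalog = InMemoryCatalog({"db": ["events", "users"]})
tables = catalog.list_tables("db")
first_pass = list(tables)   # [("db", "events"), ("db", "users")]
second_pass = list(tables)  # [] because the iterator is already exhausted
```

The practical consequence for callers is that the result is single-pass and has no `len()`, so code that indexes, counts, or iterates twice needs to materialize it with `list(...)` first. That is likely why the CLI and output code had to change alongside the signatures.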

Commit 2 — by @GayathriSrividya:

  • REST catalog uses true generators with per-page HTTP fetch helpers (_fetch_tables_page, _fetch_namespaces_page, _fetch_views_page) decorated with @retry so auth retry logic works correctly per page
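
The shape of this change can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `FakeRestCatalog`, `AuthError`, and the in-memory page table are hypothetical, and `retry_once` is a hand-written stand-in for the `@retry` decorator mentioned above. Only the helper name `_fetch_tables_page` comes from the PR description.

```python
import functools
from collections.abc import Callable, Iterator
from typing import Optional, TypeVar

T = TypeVar("T")
Identifier = tuple[str, ...]
Page = tuple[list[Identifier], Optional[str]]  # (identifiers, next page token)


class AuthError(Exception):
    """Hypothetical stand-in for an expired-token response."""


def retry_once(fn: Callable[..., T]) -> Callable[..., T]:
    """Stand-in for the PR's @retry: refresh credentials, then retry once."""

    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        try:
            return fn(self, *args, **kwargs)
        except AuthError:
            self.refresh_token()
            return fn(self, *args, **kwargs)

    return wrapper


class FakeRestCatalog:
    def __init__(self) -> None:
        # Simulated server state: page token -> page contents.
        self._pages: dict[Optional[str], Page] = {
            None: ([("db", "a"), ("db", "b")], "p2"),
            "p2": ([("db", "c")], None),
        }
        self.token_valid = True
        self.requests = 0

    def refresh_token(self) -> None:
        self.token_valid = True

    @retry_once
    def _fetch_tables_page(self, page_token: Optional[str]) -> Page:
        # One simulated HTTP request per call; this is the unit that is retried.
        self.requests += 1
        if not self.token_valid:
            raise AuthError("token expired")
        return self._pages[page_token]

    def list_tables(self) -> Iterator[Identifier]:
        page_token: Optional[str] = None
        while True:
            identifiers, page_token = self._fetch_tables_page(page_token)
            yield from identifiers
            if page_token is None:
                return


catalog = FakeRestCatalog()
tables = catalog.list_tables()
assert catalog.requests == 0             # nothing is fetched until iteration starts
first = next(tables)                     # triggers the first page request only
requests_after_first = catalog.requests  # 1
catalog.token_valid = False              # simulate the token expiring mid-iteration
rest = list(tables)                      # second page: one failed attempt, one retry
```

The retry sits on the page helper rather than on `list_tables` because calling a generator function only creates a generator object and performs no request. A decorator around `list_tables` would therefore return before any page is fetched, and an error raised later during iteration would never reach it.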

Contributor

Fokko commented May 26, 2026

@GayathriSrividya To credit the original author, it would be good to cherry-pick his work

@GayathriSrividya GayathriSrividya force-pushed the fix/issue-3365-lazy-pagination-iterator branch from 7b6744e to 7e1538e Compare May 27, 2026 18:06
Author

GayathriSrividya commented May 27, 2026

Addressed the CI failures with follow-up commits and pushed fixes to this PR branch.
All required checks are now passing on the latest commit (aa88377).
Ready for review. Thanks!

@rambleraptor
Contributor

@GayathriSrividya was this based off of #2172? If it was, we should cherry-pick in the commits so that the original author shows up in this PR as well.

jayceslesar and others added 2 commits May 28, 2026 07:52
Replace the collect-then-return approach with proper generator functions
that yield results page by page. Extract per-page fetch logic into
dedicated helper methods (_fetch_tables_page, _fetch_views_page,
_fetch_namespaces_page) decorated with @retry so authentication retries
work correctly per page.

Co-authored-by: Yuya Ebihara <ebyhry@gmail.com>
@GayathriSrividya GayathriSrividya force-pushed the fix/issue-3365-lazy-pagination-iterator branch from aa88377 to 54d8f64 Compare May 28, 2026 02:38
@GayathriSrividya
Author

Hi @rambleraptor and @Fokko — yes, PR #3416 was based on @jayceslesar's work from #2172. I've restructured the branch to credit them properly:

  • Commit 1 (authored by @jayceslesar): Changes abstract method signatures and adds iter() wrapping for all non-REST catalogs (BigQuery, DynamoDB, Glue, Hive, NoOp, SQL)
  • Commit 2 (authored by me): Implements proper lazy pagination generators for the REST catalog with per-page @retry-decorated helpers

All checks are passing on the latest push. Ready for review. Thanks!

Development

Successfully merging this pull request may close these issues.

Implement iterator to lazily go through the paged response

4 participants