Fix ice-disk table scans#491

Open
aheev wants to merge 6 commits into
LadybugDB:mainfrom
aheev:fix-icedisk-scans
@aheev
Contributor

@aheev aheev commented May 15, 2026

  • Fixed the nodeID offset in node table scans by computing the global row offset from Parquet metadata.
  • Fixed the early-break issue in rel table scans by refactoring the ice-disk internal scan to operate on the full table rather than per row group.
  • Converted STORAGE_FORMAT to an enum.
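
For reviewers, here is a minimal sketch of the global row offset idea from the first bullet. It is not the code in this PR. It assumes the per-row-group row counts have already been read from the Parquet footer metadata. The function names `row_group_starts`, `global_row_offset`, and `locate` are hypothetical and used only for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def row_group_starts(row_counts):
    """Return the global offset of the first row of each row group.

    row_counts is assumed to hold the num_rows value of every row group,
    in file order, as reported by the Parquet footer metadata.
    """
    # Exclusive prefix sum: group i starts after all rows of groups 0..i-1.
    return [0] + list(accumulate(row_counts))[:-1]


def global_row_offset(row_counts, row_group_idx, local_row):
    """Translate (row group, row within group) into a table-wide offset."""
    if not 0 <= local_row < row_counts[row_group_idx]:
        raise IndexError("local_row is outside the row group")
    return row_group_starts(row_counts)[row_group_idx] + local_row


def locate(row_counts, global_offset):
    """Inverse mapping: table-wide offset to (row group, row within group)."""
    if not 0 <= global_offset < sum(row_counts):
        raise IndexError("global_offset is outside the table")
    starts = row_group_starts(row_counts)
    # The last row group whose start is <= global_offset owns that row.
    group = bisect_right(starts, global_offset) - 1
    return group, global_offset - starts[group]
```

With row groups of 3, 4, and 2 rows, the first row of the second group gets offset 3 rather than 0, which is the kind of per-row-group restart the fix is meant to avoid.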

context: #476 (review)

@aheev
Contributor Author

aheev commented May 15, 2026

@adsharma could you PTAL?

Re: duplicate boundNodes in unordered_map

Two cases:

 1. Source mode (MATCH (a:user)-[:follows]->(b), where the child is a direct node scan): fetchNextBoundNodeBatch generates unique sequential offsets [nextOffset, nextOffset+N), so there are no duplicates by construction.
 2. Non-source mode (multi-hop (a)-[r1]->(b)-[r2]->(c)):
 - r1's scan processes one source node a at a time (it breaks when boundOffset != activeBoundOffset).
 - So each call to r1.getNextTuple produces the neighbors of exactly one a.
 - A single source node's neighbor list has no duplicates in a well-formed CSR file.
 - The IceDisk node table emits one node per scan call. Even if it emitted a batch, the nodes would be distinct.
 - Therefore r2's bound node vector always has distinct b values in each batch.
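
The two cases above can be illustrated with a small sketch. It is not the actual scan code. It assumes a CSR layout given by an `indptr` array (per-source start positions) and an `indices` array (neighbor offsets). The helpers `source_mode_batch`, `neighbors`, and `index_bound_nodes` are hypothetical names chosen for this example.

```python
def source_mode_batch(next_offset, batch_size, num_nodes):
    """Case 1: sequential offsets [next_offset, next_offset + N), clipped
    to the table size. Distinct by construction."""
    end = min(next_offset + batch_size, num_nodes)
    return list(range(next_offset, end))


def neighbors(indptr, indices, src):
    """Case 2: the neighbor list of exactly one source node in a CSR layout."""
    return indices[indptr[src]:indptr[src + 1]]


def index_bound_nodes(batch):
    """Map each bound node offset to its position in the batch.

    Raises ValueError on a duplicate, which is the situation the argument
    above says cannot occur within a single batch.
    """
    positions = {}
    for pos, offset in enumerate(batch):
        if offset in positions:
            raise ValueError(f"duplicate bound node offset {offset}")
        positions[offset] = pos
    return positions
```

Note that the map only stays duplicate-free because each batch comes from a single source node. Concatenating the neighbor lists of two different sources could repeat a b value, which is why the break on boundOffset != activeBoundOffset matters for this argument.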

@aheev
Contributor Author

aheev commented May 15, 2026

dataset PR: LadybugDB/dataset#3

@aheev
Contributor Author

aheev commented May 16, 2026

@adsharma should we add a get_icebug_disk_supported_version CALL?
