Specialize SQLGraph.from_other for SQLite-to-SQLite (closes #285)#286
JoOkuma wants to merge 4 commits into main
Conversation
The generic `BaseGraph.from_other` materializes the source graph in Python and then calls `other.overlaps()`, which on a SQLGraph-backed GraphView issues a single `IN (...)` clause containing every selected node id. With non-trivial selections this hits SQLite's SQLITE_MAX_VARIABLE_NUMBER and raises "too many SQL variables". Per @JoOkuma's suggestion in royerlab#285, add a specialized fast path that bypasses the BaseGraph copy entirely when both sides are SQLite:

* Open the destination SQLGraph so it materializes its schema.
* Mirror any extra columns and metadata.
* Dispose the dst engine, ATTACH the dst database to the source's connection, and dump each table via `INSERT INTO ... SELECT FROM`.
* For GraphView sources, materialize the selected node ids in a temp table (chunked inserts) and JOIN against it instead of using a giant `IN (...)` clause.
* Re-open the dst engine and refresh derived caches.

For correctness in other code paths (summary, NN solver, direct callers), also chunk `SQLGraph.overlaps(node_ids)` so it never overflows the bound-parameter limit.

New tests cover:

* The ATTACH fast path is taken (`BaseGraph.from_other` not called).
* The exact issue royerlab#285 scenario: filter -> subgraph -> from_other with a selection > 999 nodes plus overlaps.
* Node ids are preserved on full copies.
* Fallback to the generic path for `:memory:` destinations and for non-SQL sources.
* `SQLGraph.overlaps` with > 999 node ids returns the right pairs.
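The temp-table JOIN trick from the bullet list above can be sketched standalone with plain `sqlite3` (this is an illustrative toy, not the library's actual code; table and column names are made up). Instead of binding every selected node id into one giant `IN (...)` clause, which can exceed SQLITE_MAX_VARIABLE_NUMBER (999 on older SQLite builds), the selection is materialized in a temp table and joined against:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO nodes VALUES (?)", [(i,) for i in range(2000)])

selected = list(range(1500))  # more ids than the historical 999-variable limit

# Materialize the selection in a temp table. Real code would batch multi-row
# INSERTs to stay under the bound-parameter limit; executemany here binds one
# parameter per row, so the loop is only illustrating the chunked shape.
conn.execute("CREATE TEMP TABLE sel (id INTEGER PRIMARY KEY)")
CHUNK = 500
for start in range(0, len(selected), CHUNK):
    chunk = selected[start : start + CHUNK]
    conn.executemany("INSERT INTO sel VALUES (?)", [(i,) for i in chunk])

# The final query carries zero per-id bound parameters, so it cannot
# overflow the variable limit no matter how large the selection is.
rows = conn.execute(
    "SELECT n.id FROM nodes AS n JOIN sel AS s ON n.id = s.id"
).fetchall()
print(len(rows))  # 1500
```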
Instead of instantiating the destination upfront and ALTER-ing it into shape, replay the source's own CREATE TABLE/INDEX DDL from sqlite_master against the attached destination, copy rows (including the Metadata table verbatim), then open the destination via cls(**kwargs). The constructor's normal reflection path rebuilds the in-memory state, so the manual engine dispose/reopen, cache invalidation, and _update_max_id_per_time call are no longer needed. Also tighten the eligibility gate in from_other: the destination file must start empty (DDL replay would clash with an existing schema), and overwrite=True is excluded (the dst constructor would drop the tables we just populated).
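The ATTACH + DDL-replay copy described above can be sketched as a minimal standalone function (hypothetical names and a deliberately naive schema-prefix rewrite; the real implementation would handle quoting, triggers, and views more carefully):

```python
import os
import sqlite3
import tempfile

def dump_sqlite(src_path: str, dst_path: str) -> None:
    """Copy an entire SQLite database into an empty destination file by
    replaying the source's own DDL from sqlite_master, then bulk-copying
    rows with INSERT INTO ... SELECT -- all on the source connection."""
    src = sqlite3.connect(src_path)
    try:
        src.execute("ATTACH DATABASE ? AS dst", (dst_path,))
        # Replay CREATE TABLE / CREATE INDEX statements against the attached
        # destination. Auto-indexes have NULL sql and are skipped.
        for (sql,) in src.execute(
            "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL"
        ).fetchall():
            src.execute(
                sql.replace("CREATE TABLE ", "CREATE TABLE dst.", 1)
                   .replace("CREATE INDEX ", "CREATE INDEX dst.", 1)
            )
        # Copy rows table by table with a single SQL statement each.
        for (name,) in src.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall():
            src.execute(f'INSERT INTO dst."{name}" SELECT * FROM main."{name}"')
        src.commit()
    finally:
        src.close()

# Usage on throwaway files:
tmp = tempfile.mkdtemp()
src_path = os.path.join(tmp, "src.db")
dst_path = os.path.join(tmp, "dst.db")
with sqlite3.connect(src_path) as c:
    c.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, t INTEGER)")
    c.executemany("INSERT INTO nodes VALUES (?, ?)", [(i, i % 3) for i in range(10)])
dump_sqlite(src_path, dst_path)
with sqlite3.connect(dst_path) as c:
    copied = c.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
print(copied)  # 10
```

This also shows why the eligibility gate matters: replaying DDL into a destination that already has tables would raise "table already exists", which is exactly why the fast path requires the destination file to start empty.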
Replace the hand-rolled chunk loop with the existing _chunked_sa_read helper. Only the source side is constrained per chunk via IN(...); the target side is filtered in Polars afterwards, which avoids the quadratic bound-parameter blow-up of having two IN clauses per chunk and lets us use the full _sql_chunk_size() budget. Also drop the seen-set deduplication, which was a behaviour change vs. the pre-fix code (the original query.all() preserved duplicates if any existed in the Overlap table).
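The one-sided chunking strategy can be sketched as follows (a toy stand-in for the library's `_chunked_sa_read`-based code; the table and helper names here are invented). Only the source side is constrained with `IN (...)` per chunk, the target side is filtered in Python afterwards, and no deduplication is applied, so duplicates in the overlap table are preserved:

```python
import sqlite3

def chunked_overlaps(conn, node_ids, chunk_size=900):
    """Fetch (source_id, target_id) pairs restricted to node_ids on both
    sides, without ever binding more than chunk_size parameters at once."""
    wanted = set(node_ids)
    ids = list(node_ids)
    pairs = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start : start + chunk_size]
        qs = ",".join("?" * len(chunk))
        rows = conn.execute(
            f"SELECT source_id, target_id FROM overlaps "
            f"WHERE source_id IN ({qs})",
            chunk,
        ).fetchall()
        # Target side filtered in Python: avoids a second IN clause per chunk
        # and deliberately keeps duplicates, matching the pre-fix behaviour.
        pairs.extend((s, t) for s, t in rows if t in wanted)
    return pairs

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE overlaps (source_id INTEGER, target_id INTEGER)")
conn.executemany(
    "INSERT INTO overlaps VALUES (?, ?)", [(i, i + 1) for i in range(2000)]
)
result = chunked_overlaps(conn, list(range(1500)))
print(len(result))  # 1499: pairs (i, i+1) with both ends in 0..1499
```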
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #286 +/- ##
==========================================
+ Coverage 87.70% 87.72% +0.02%
==========================================
Files 57 57
Lines 4879 4936 +57
Branches 858 868 +10
==========================================
+ Hits 4279 4330 +51
- Misses 378 381 +3
- Partials 222 225 +3
yfukai left a comment:
I'm sorry for the late response. I only have minor comments, and this looks good to me!
dst._engine.dispose()

def test_sql_from_other_full_copy_preserves_node_ids(tmp_path: Path) -> None:
I may be missing something, but this looks like it has already been tested by _assert_graphs_equivalent in test_sql_from_other_uses_attach_dump_for_sqlite?
dst._engine.dispose()

def test_sql_from_other_subgraph_view_with_overlaps(tmp_path: Path) -> None:
It may make sense to explicitly test this scenario for all combinations of SQL(disk), SQL(memory), and RustworkX.
Closes #285.

`BaseGraph.from_other` ends with `other.overlaps()`, which on a SQL-rooted `GraphView` issues a single `IN (...)` containing every selected node id and overflows SQLite's variable limit.

This PR adds a SQL-level fast path on `SQLGraph.from_other`: when both ends are on-disk SQLite, the source's schema and rows are dumped straight into the destination via `ATTACH DATABASE` + replayed `CREATE TABLE`/`INDEX` DDL + `INSERT INTO ... SELECT`, then the destination is opened from the populated file. Filtered selections (`GraphView`) are joined against a temp table to sidestep the bound-parameter limit. Other configurations fall back to the generic path.

Separately, `SQLGraph.overlaps(node_ids)` is now chunked via the existing `_chunked_sa_read` helper so direct callers don't hit the same limit.

Tests cover the fast path, the issue reproducer (filter → subgraph → from_other with > 999 nodes + overlaps), full-copy id preservation, and fallbacks for `:memory:` and non-SQL sources.