New CTable dictionary spec, nested columns and richer Arrow/Parquet interoperability #634
Merged
…ough import/export
…y usable across APIs
…l mapping, and roundtrip support
- Add dotted nested column support in CTable access and filters:
  - attribute namespaces (t.trip.begin.lon)
  - string where-expression rewriting for dotted operands
- Store dotted leaf columns hierarchically under _cols/...:
  - a.b.c -> /_cols/a/b/c(.b2nd/.b2b)
- Add schema metadata v2 for nested mappings:
  - logical/physical/storage path maps
  - root alias metadata support
- Preserve unnamed Arrow root ("") through Parquet/Arrow import-export:
  - normalize internally, restore on export via metadata
- Add logical->physical selector resolution across APIs:
  - __getitem__, __getattr__, select, Arrow export columns, index APIs
- Support struct-prefix expansion in selectors (select(["trip"]))
- Implement recursive Arrow struct flattening in from_arrow:
  - flatten to dotted physical leaves
- Implement row reconstruction for flattened structs on row materialization
- Add virtual struct-path column reads (t["props"][:]) from descendant leaves
  - include null-collapse behavior for fully-null structs
- Keep top-level struct schema compatibility in columns_by_name and Arrow/Parquet export
- Extend nested semantics for index lifecycle and sorting:
  - logical aliases in create/rebuild/drop/compact index paths
  - clear error for non-leaf sort keys resolving to multiple leaves
- Consolidate nested tests into two modules:
  - tests/ctable/test_nested_access_storage.py
  - tests/ctable/test_nested_metadata_root.py
- Add/adjust tests for nested access, storage paths, metadata v2, root alias,
  struct flatten/reconstruct, roundtrip, sort/index behavior, and compatibility
- Add benchmark utility:
  - bench/ctable/bench_nested_parquet_roundtrip.py
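
The nested-column items above (flattening to dotted leaves, hierarchical storage paths, struct-prefix selectors, and row reconstruction with null-collapse) can be illustrated with a small pure-Python sketch. This is not the CTable implementation: the helper names `flatten_struct`, `storage_path`, `expand_selector` and `rebuild_struct` are hypothetical, plain dicts stand in for Arrow structs, and the exact null-collapse rule used by CTable may differ from the one assumed here.

```python
from __future__ import annotations

from typing import Any


def flatten_struct(row: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Recursively flatten nested dicts into dotted leaf names."""
    leaves: dict[str, Any] = {}
    for key, value in row.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            leaves.update(flatten_struct(value, name))
        else:
            leaves[name] = value
    return leaves


def storage_path(dotted: str) -> str:
    """Map a dotted leaf name to a hierarchical path: a.b.c -> /_cols/a/b/c."""
    return "/_cols/" + "/".join(dotted.split("."))


def expand_selector(selector: str, leaves: list[str]) -> list[str]:
    """Resolve a selector to one leaf, or expand a struct prefix to its leaves."""
    if selector in leaves:
        return [selector]
    prefix = selector + "."
    matches = [leaf for leaf in leaves if leaf.startswith(prefix)]
    if not matches:
        raise KeyError(selector)
    return matches


def _collapse(node: Any) -> Any:
    """Assumed rule: a struct whose children are all None becomes None."""
    if not isinstance(node, dict):
        return node
    out = {key: _collapse(value) for key, value in node.items()}
    if out and all(value is None for value in out.values()):
        return None
    return out


def rebuild_struct(leaves: dict[str, Any]) -> dict[str, Any]:
    """Rebuild a nested row from dotted leaves, collapsing fully-null structs."""
    nested: dict[str, Any] = {}
    for name, value in leaves.items():
        *parents, leaf = name.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    # Collapse only below the root so the row itself stays a dict.
    return {key: _collapse(value) for key, value in nested.items()}


row = {
    "id": 7,
    "trip": {
        "begin": {"lon": 2.17, "lat": 41.38},
        "end": {"lon": None, "lat": None},
    },
}
leaves = flatten_struct(row)
paths = [storage_path(name) for name in leaves]
trip_leaves = expand_selector("trip", list(leaves))
rebuilt = rebuild_struct(leaves)
```

In this sketch, `expand_selector("trip", ...)` returns the four `trip.*` leaves, which mirrors why a non-leaf sort key is ambiguous, and `rebuilt["trip"]["end"]` comes back as `None` because every leaf under it is null.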
…o ctable-dict-spec
…lder required anymore
This PR extends CTable with richer Arrow/Parquet interoperability, nested-column support, dictionary-encoded string columns, faster table opening, and lower blosc2 import overhead.
Main additions: