* Add missing DataFrame methods for set operations and query
Expose upstream DataFusion DataFrame methods that were not yet
available in the Python API. Closes apache#1455.
Set operations:
- except_distinct: set difference with deduplication
- intersect_distinct: set intersection with deduplication
- union_by_name: union matching columns by name instead of position
- union_by_name_distinct: union by name with deduplication
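The semantics listed above can be sketched in plain Python. This does not call the DataFusion API: rows are modelled as tuples, the helper signatures are assumptions made for illustration, and both inputs are assumed to have the same set of column names.

```python
# Plain-Python model of the set-operation semantics (not the datafusion API).
# Rows are tuples; helper signatures are hypothetical.

def dedupe(rows):
    """Drop duplicate rows, keeping the first occurrence of each."""
    return list(dict.fromkeys(rows))

def except_distinct(left, right):
    """Rows of `left` that are absent from `right`, deduplicated."""
    right_set = set(right)
    return dedupe(r for r in left if r not in right_set)

def intersect_distinct(left, right):
    """Rows present in both inputs, deduplicated."""
    right_set = set(right)
    return dedupe(r for r in left if r in right_set)

def union_by_name(left_cols, left_rows, right_cols, right_rows):
    """Append right rows after reordering them to the left column order."""
    idx = [right_cols.index(c) for c in left_cols]
    return list(left_rows) + [tuple(row[i] for i in idx) for row in right_rows]

def union_by_name_distinct(left_cols, left_rows, right_cols, right_rows):
    """Same as union_by_name, followed by deduplication."""
    return dedupe(union_by_name(left_cols, left_rows, right_cols, right_rows))

left = [(1, "a"), (1, "a"), (2, "b")]
right = [(2, "b"), (3, "c")]
# Same data as `right`, but with the columns in the opposite order.
right_swapped = [("b", 2), ("c", 3)]

diff = except_distinct(left, right)        # [(1, "a")]
common = intersect_distinct(left, right)   # [(2, "b")]
by_name = union_by_name_distinct(
    ["id", "name"], left, ["name", "id"], right_swapped
)                                          # [(1, "a"), (2, "b"), (3, "c")]
```

A positional union of `left` and `right_swapped` would instead mix `id` and `name` values in the same column, which is the case union_by_name is meant to avoid.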
Query:
- distinct_on: deduplicate rows based on specific columns
- sort_by: sort by expressions with ascending order and nulls last
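As a rough model of the two query behaviours above, in plain Python rather than the DataFusion API: rows are dicts, the helper signatures are assumptions, and `distinct_on` here simply keeps the first row seen per key (which row the real method keeps may depend on its other arguments).

```python
# Plain-Python model of sort_by and distinct_on (not the datafusion API).
# Rows are dicts keyed by column name; helper signatures are hypothetical.

def sort_by(rows, *columns):
    """Sort ascending by `columns`, placing None (null) values last."""
    def key(row):
        # (is_null, value): False sorts before True, so nulls go last.
        # The placeholder 0 is only ever compared with other placeholders.
        return tuple(
            (row[c] is None, 0 if row[c] is None else row[c]) for c in columns
        )
    return sorted(rows, key=key)

def distinct_on(rows, *columns):
    """Keep the first row encountered for each distinct key."""
    first_seen = {}
    for row in rows:
        first_seen.setdefault(tuple(row[c] for c in columns), row)
    return list(first_seen.values())

rows = [
    {"a": 2, "b": "x"},
    {"a": None, "b": "y"},
    {"a": 1, "b": "z"},
    {"a": 2, "b": "w"},
]

ordered = sort_by(rows, "a")     # a = 1, 2, 2, None
unique = distinct_on(rows, "a")  # one row each for a = 2, None, 1
```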
Note: show_limit is already covered by the existing show(num) method.
explain_with_options and with_param_values are deferred as they require
exposing additional types (ExplainOption, ParamValues).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add ExplainFormat enum and format option to DataFrame.explain()
Extend the existing explain() method with an optional format parameter
instead of adding a separate explain_with_options() method. This keeps
the API simple while exposing all upstream ExplainOption functionality.
Available formats: indent (default), tree, pgjson, graphviz.
The ExplainFormat enum is exported from the top-level datafusion module.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add DataFrame.window() and unnest recursion options
Expose remaining DataFrame methods from upstream DataFusion.
Closes apache#1456.
- window(*exprs): apply window function expressions and append results
as new columns
- unnest_column/unnest_columns: add optional recursions parameter for
controlling unnest depth via (input_column, output_column, depth)
tuples
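One way to picture the depth component of a recursion tuple, sketched in plain Python under the assumption that each level of depth flattens one level of list nesting. The `unnest` helper below is hypothetical and is not the DataFusion implementation.

```python
# Plain-Python model of unnest depth (not the datafusion API).
# Assumption: each level of depth removes one level of list nesting.

def unnest(value, depth=1):
    """Flatten `value` by `depth` levels; each result element is one row."""
    values = [value]
    for _ in range(depth):
        flattened = []
        for v in values:
            if isinstance(v, list):
                flattened.extend(v)
            else:
                flattened.append(v)  # non-list values pass through unchanged
        values = flattened
    return values

nested = [[1, 2], [3]]
one_level = unnest(nested, depth=1)   # [[1, 2], [3]]
two_levels = unnest(nested, depth=2)  # [1, 2, 3]
```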
Note: drop_columns is already exposed as the existing drop() method.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update docstring
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Improve docstrings and test robustness for new DataFrame methods
Clarify except_distinct/intersect_distinct docstrings, add deterministic
sort to test_window, add sort_by ascending verification test, and add
smoke tests for PGJSON and GRAPHVIZ explain formats.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Consolidate new DataFrame tests into parametrized tests
Combine set operation tests (except_distinct, intersect_distinct,
union_by_name, union_by_name_distinct) into a single parametrized
test_set_operations_distinct. Merge sort_by tests and convert
explain format tests to parametrized form.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add doctest examples to new DataFrame method docstrings
Add >>> style usage examples for window, explain, except_distinct,
intersect_distinct, union_by_name, union_by_name_distinct, distinct_on,
sort_by, and unnest_columns to match existing docstring conventions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Improve error messages, tests, and API hygiene from PR review
- Provide actionable error message for invalid explain format strings
- Remove recursions param from deprecated unnest_column (use unnest_columns)
- Add null-handling test case for sort_by to verify nulls-last behavior
- Add format-specific assertions to explain tests (TREE, PGJSON, GRAPHVIZ)
- Add deep recursion test for unnest_columns with depth > 1
- Add multi-expression window test to verify variadic *exprs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Consolidate window and unnest tests into parametrized tests
Combine test_window and test_window_multiple_expressions into a single
parametrized test. Merge unnest recursion tests into one parametrized
test covering basic, explicit depth 1, and deep recursion cases.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Address PR review feedback for DataFrame operations
- Use upstream parse error for explain format instead of hardcoded options
- Fix sort_by to use column name resolution consistent with sort()
- Use ExplainFormat enum members directly in tests instead of string lookup
- Merge union_by_name_distinct into union_by_name(distinct=False) for a
more Pythonic API
- Update check-upstream skill to note union_by_name_distinct coverage
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add DataFrame.column(), col(), and find_qualified_columns() methods
Expose upstream find_qualified_columns to resolve unqualified column
names into fully qualified column expressions. This is especially
useful for disambiguating columns after joins.
- find_qualified_columns(*names) on Rust side calls upstream directly
- DataFrame.column(name) and col(name) alias on Python side
- Update join and join_on docstrings to reference DataFrame.col()
- Add "Disambiguating Columns with DataFrame.col()" section to joins docs
- Add tests for qualified column resolution, ambiguity, and join usage
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Merge union_by_name and union_by_name_distinct into a single method with distinct flag
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Converting into a Python dict loses a column when the names are identical
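The underlying behaviour is ordinary Python dict semantics. A minimal sketch with made-up column data, not tied to any particular DataFusion call:

```python
# Two columns share the name "id", as can happen after a join.
names = ["id", "id"]
columns = [[1, 2], [10, 20]]

# A dict holds one value per key, so the second "id" replaces the first.
as_dict = dict(zip(names, columns))

lost = len(names) - len(as_dict)  # number of columns dropped by the conversion
```

Here `as_dict` ends up as `{"id": [10, 20]}`, so results with duplicate column names need to be compared in a form that preserves both columns.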
* Consolidate except_all/except_distinct and intersect/intersect_distinct into single methods with distinct flag
Follows the same pattern as union(distinct=) and union_by_name(distinct=).
Also deprecates union_distinct() in favor of union(distinct=True).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>