AGENTS.md
Lines changed: 18 additions & 69 deletions
@@ -6,12 +6,19 @@ This repository contains Python bindings for Rust's DataFusion.
 - Root split: Rust implementation in `src/` and Python wrappers in `python/datafusion/`.
 - Examples live in `examples/`; use `examples/datafusion-ffi-example/` as a reference for FFI idioms and UDF/UDAF examples.

-## Development workflow
+## Build / dev / test workflow (essential)
 - Ensure git submodules are initialized: `git submodule update --init`.
-- Build the Rust extension before running tests:
-  - `uv run --no-project maturin develop --uv`
-- Run tests with pytest:
-  - `uv --no-project pytest .`
+- Build the Rust→Python extension with maturin (prefer `uv` tooling):
+  - Local dev build: `uv run --no-project maturin develop --uv` (or `maturin develop --uv` inside a venv)
+- Run tests after building:
+  - `uv run --no-project pytest .` or `python -m pytest`
+
+## Project-specific conventions & patterns
+- Use the `maturin` + `pyo3` workflow for building wheels/develop installs; repository `pyproject.toml` contains maturin configuration.
+- Many Python-only helpers and higher-level APIs live in `python/datafusion/` (for example `io.py`, `user_defined.py`, `dataframe_formatter.py`); prefer these helper modules when changing Python surface area.
+- For Rust ↔ Python interop, prefer Arrow C Data Interface / PyCapsule patterns (see `src/pyarrow_util.rs`, `python/datafusion/context.py`, and `docs/source/contributor-guide/ffi.rst`).
+- Place typing-only imports under `if TYPE_CHECKING:` guards (Ruff rule `TCH001` is enforced).
+- In Rust examples/interop glue, prefer raw C string literals like `cr"..."` for small constants over allocating a `CString`.

 ## Linting and formatting
 - Use pre-commit for linting/formatting.
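
Note on the `if TYPE_CHECKING:` convention added in the hunk above: a minimal sketch of the pattern, using only the standard library. The function name `first_column` is hypothetical, and `Sequence` stands in for typing-only names from this repository (the old text listed `DataFrame`, `ScalarUDF`, and others); nothing here is taken from the actual code base.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers; this import is skipped at runtime.
    # `Sequence` is a stand-in for a typing-only project import.
    from collections.abc import Sequence


def first_column(names: "Sequence[str]") -> str:
    """Return the first name in ``names``."""
    # The annotation is quoted, so `Sequence` is never looked up at runtime.
    return names[0]
```

Quoting the annotation (or adding `from __future__ import annotations` at the top of the module) is what makes the guarded import safe: the name only has to exist for the type checker.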
@@ -24,7 +31,6 @@ This repository contains Python bindings for Rust's DataFusion.
 - Rust linting via `cargo clippy`
 - Ruff rules that frequently fail in this repo:
   - **Import sorting (`I001`)**: Keep import blocks sorted/grouped. Running `ruff check --select I --fix <files>` will repair order.
-  - **Type-checking guards (`TCH001`)**: Place imports that are only needed for typing (e.g., `AggregateUDF`, `ScalarUDF`, `TableFunction`, `WindowUDF`, `NullTreatment`, `DataFrame`) inside a `if TYPE_CHECKING:` block.
   - **Docstring spacing (`D202`, `D205`)**: The summary line must be separated from the body with exactly one blank line, and there must be no blank line immediately after the closing triple quotes.
   - **Ternary suggestions (`SIM108`)**: Prefer single-line ternary expressions when Ruff requests them over multi-line `if`/`else` assignments.

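
Note on the docstring-spacing and ternary rules listed in the hunk above: the sketch below shows one function that should satisfy `D202`, `D205`, and `SIM108` as the text describes them. The function `row_label` is hypothetical and only illustrates the layout; it is not code from this repository.

```python
def row_label(count: int) -> str:
    """Return a human-readable label for a row count.

    Exactly one blank line separates the summary line from this body
    (D205), and the code starts directly after the closing quotes (D202).
    """
    # SIM108: a single-line conditional expression instead of a
    # multi-line if/else assignment.
    noun = "row" if count == 1 else "rows"
    return f"{count} {noun}"
```

For example, `row_label(3)` produces a plural label, while `row_label(1)` uses the singular noun.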
@@ -34,54 +40,13 @@ This repository contains Python bindings for Rust's DataFusion.

 ## Rust insights

-Below are a set of concise mental-model shifts and expert insights about Rust that are helpful when developing and reviewing the Rust parts of this repository. They emphasize how to think in terms of compile-time guarantees, capabilities, and algebraic composition rather than just language ergonomics.
+Use these as quick mental models when reviewing or editing Rust code in this repo:

-1. Ownership → Compile-Time Resource Graph
-
-> Stop seeing ownership as “who frees memory.”
-> See it as a **compile-time dataflow graph of resource control**.
-
-Every `let`, `move`, or `borrow` defines an edge in a graph the compiler statically verifies — ensuring linear usage of scarce resources (files, sockets, locks) **without a runtime GC**. Once you see lifetimes as edges, not annotations, you’re designing **proofs of safety**, not code that merely compiles.
-
-2. Borrowing → Capability Leasing
-
-> Stop thinking of borrowing as “taking a reference.”
-> It’s **temporary permission to mutate or observe**, granted by the compiler’s capability system.
-
-`&mut` isn’t a pointer — it’s a **lease with exclusive rights**, enforced at compile time. Expert code treats borrows as contracts:
-
-* If you can shorten them, you increase parallelism.
-* If you lengthen them, you increase safety scope.
-
-3. Traits → Behavioral Algebra
-
-> Stop viewing traits as “interfaces.”
-> They’re **algebraic building blocks** that define composable laws of behavior.
-
-A `Trait` isn't just a promise of methods; it’s a **contract that can be combined, derived, or blanket-implemented**. Once you realize traits form a behavioral lattice, you stop subclassing and start composing — expressing polymorphism as **capabilities, not hierarchies**.
-
-4. `Result` → Explicit Control Flow as Data
-
-> Stop using `Result` as an error type.
-> It’s **control flow reified as data**.
-
-The `?` operator turns sequential logic into a **monadic pipeline** — your `Result` chain isn’t linear code; it’s a dependency graph of partial successes. Experts design their APIs so every recoverable branch is an **encoded decision**, not a runtime exception.
-
-5. Lifetimes → Static Borrow Slices
-
-> Stop fearing lifetimes as compiler noise.
-> They’re **proofs of local consistency** — mini type-level theorems.
-
-Each `'a` parameter expresses that two pieces of data **coexist safely** within a bounded region of time. Experts deliberately model relationships through lifetime parameters to **eliminate entire classes of runtime checks**.
-
-6. Pattern Matching → Declarative Exhaustiveness
-
-> Stop thinking of `match` as a fancy switch.
-> It’s a **total function over variants**, verified at compile time.
-
-Once you realize `match` isn’t branching but **structural enumeration**, you start writing exhaustive domain models where every possible state is named, and every transition is **type-checked**.
-
-Stop seeing `Option` as “value or no value.” Instead, see it as a lazy computation pipeline that only executes when meaningful. These combinators turn error-handling into data flow: once you think of absence as a first-class transformation, you can write algorithms that never mention control flow explicitly—and yet, they’re 100% safe and analyzable by the compiler.
+- Ownership/borrowing: model values and references as compile-time capability flow; keep borrows as short as practical.
+- Traits: prefer composable capability contracts over inheritance-style thinking.
+- `Result`/`Option`: treat them as explicit control-flow data; use combinators and `?` for clear pipelines.
+- Lifetimes: express valid coexistence windows for references, not just syntax to satisfy the compiler.
+- Pattern matching: model state with enums and exhaustive `match` handling.


 ## Refactoring opportunities
@@ -103,22 +68,6 @@ Stop seeing `Option` as “value or no value.” Instead, see it as a lazy compu
 and prefer `from_stream(df)` instead. This improves readability and avoids
 relying on private PyArrow internals that may change.

-- Prefer Rust raw C-string literals for passing small string constants to
-Python's C-API or embedding contexts instead of allocating a `CString`.