Commit 00fd3c0

docs: update AGENTS.md to enhance build and development workflow instructions
1 parent 456409a commit 00fd3c0


1 file changed: +18, -69 lines

AGENTS.md

Lines changed: 18 additions & 69 deletions
@@ -6,12 +6,19 @@ This repository contains Python bindings for Rust's DataFusion.
 - Root split: Rust implementation in `src/` and Python wrappers in `python/datafusion/`.
 - Examples live in `examples/`; use `examples/datafusion-ffi-example/` as a reference for FFI idioms and UDF/UDAF examples.
 
-## Development workflow
+## Build / dev / test workflow (essential)
 - Ensure git submodules are initialized: `git submodule update --init`.
-- Build the Rust extension before running tests:
-  - `uv run --no-project maturin develop --uv`
-- Run tests with pytest:
-  - `uv --no-project pytest .`
+- Build the Rust→Python extension with maturin (prefer `uv` tooling):
+  - Local dev build: `uv run --no-project maturin develop --uv` (or `maturin develop --uv` inside a venv)
+- Run tests after building:
+  - `uv --no-project pytest .` or `python -m pytest`
+
+## Project-specific conventions & patterns
+- Use the `maturin` + `pyo3` workflow for building wheels/develop installs; repository `pyproject.toml` contains maturin configuration.
+- Many Python-only helpers and higher-level APIs live in `python/datafusion/` (for example `io.py`, `user_defined.py`, `dataframe_formatter.py`); prefer these helper modules when changing Python surface area.
+- For Rust ↔ Python interop, prefer Arrow C Data Interface / PyCapsule patterns (see `src/pyarrow_util.rs`, `python/datafusion/context.py`, and `docs/source/contributor-guide/ffi.rst`).
+- Place typing-only imports under `if TYPE_CHECKING:` guards (Ruff rule `TCH001` is enforced).
+- In Rust examples/interop glue, prefer raw C string literals like `cr"..."` for small constants over allocating a `CString`.
 
 ## Linting and formatting
 - Use pre-commit for linting/formatting.
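The new convention about raw C string literals can be illustrated without pyo3. The sketch below is a minimal, std-only comparison; the function names `allocated_code` and `literal_code` are hypothetical, not taken from the repository. It assumes Rust 1.77 or later with the 2021 edition, where C string literals such as `c"..."` and `cr"..."` are available and have type `&'static CStr`.

```rust
use std::ffi::{CStr, CString};

/// Before: builds an owned, heap-allocated `CString` at runtime.
/// `CString::new` returns a `Result` because the input could contain an
/// interior NUL byte, which is why callers end up writing `unwrap`.
fn allocated_code() -> CString {
    CString::new("pass").unwrap()
}

/// After: a raw C string literal is a `&'static CStr` whose NUL terminator
/// is added and checked at compile time, so there is no allocation and no
/// `unwrap`. (Hypothetical helper, for illustration only.)
fn literal_code() -> &'static CStr {
    cr"pass"
}
```

In pyo3 glue the literal can be passed wherever a `&CStr` is expected, as in the `py.run(cr"pass", None, None)?` line from the refactoring example this commit removes.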
@@ -24,7 +31,6 @@ This repository contains Python bindings for Rust's DataFusion.
 - Rust linting via `cargo clippy`
 - Ruff rules that frequently fail in this repo:
   - **Import sorting (`I001`)**: Keep import blocks sorted/grouped. Running `ruff check --select I --fix <files>` will repair order.
-  - **Type-checking guards (`TCH001`)**: Place imports that are only needed for typing (e.g., `AggregateUDF`, `ScalarUDF`, `TableFunction`, `WindowUDF`, `NullTreatment`, `DataFrame`) inside a `if TYPE_CHECKING:` block.
   - **Docstring spacing (`D202`, `D205`)**: The summary line must be separated from the body with exactly one blank line, and there must be no blank line immediately after the closing triple quotes.
   - **Ternary suggestions (`SIM108`)**: Prefer single-line ternary expressions when Ruff requests them over multi-line `if`/`else` assignments.
 
@@ -34,54 +40,13 @@ This repository contains Python bindings for Rust's DataFusion.
 
 ## Rust insights
 
-Below are a set of concise mental-model shifts and expert insights about Rust that are helpful when developing and reviewing the Rust parts of this repository. They emphasize how to think in terms of compile-time guarantees, capabilities, and algebraic composition rather than just language ergonomics.
+Use these as quick mental models when reviewing or editing Rust code in this repo:
 
-1. Ownership → Compile-Time Resource Graph
-
-> Stop seeing ownership as “who frees memory.”
-> See it as a **compile-time dataflow graph of resource control**.
-
-Every `let`, `move`, or `borrow` defines an edge in a graph the compiler statically verifies — ensuring linear usage of scarce resources (files, sockets, locks) **without a runtime GC**. Once you see lifetimes as edges, not annotations, you’re designing **proofs of safety**, not code that merely compiles.
-
-2. Borrowing → Capability Leasing
-
-> Stop thinking of borrowing as “taking a reference.”
-> It’s **temporary permission to mutate or observe**, granted by the compiler’s capability system.
-
-`&mut` isn’t a pointer — it’s a **lease with exclusive rights**, enforced at compile time. Expert code treats borrows as contracts:
-
-* If you can shorten them, you increase parallelism.
-* If you lengthen them, you increase safety scope.
-
-3. Traits → Behavioral Algebra
-
-> Stop viewing traits as “interfaces.”
-> They’re **algebraic building blocks** that define composable laws of behavior.
-
-A `Trait` isn't just a promise of methods; it’s a **contract that can be combined, derived, or blanket-implemented**. Once you realize traits form a behavioral lattice, you stop subclassing and start composing — expressing polymorphism as **capabilities, not hierarchies**.
-
-4. `Result` → Explicit Control Flow as Data
-
-> Stop using `Result` as an error type.
-> It’s **control flow reified as data**.
-
-The `?` operator turns sequential logic into a **monadic pipeline** — your `Result` chain isn’t linear code; it’s a dependency graph of partial successes. Experts design their APIs so every recoverable branch is an **encoded decision**, not a runtime exception.
-
-5. Lifetimes → Static Borrow Slices
-
-> Stop fearing lifetimes as compiler noise.
-> They’re **proofs of local consistency** — mini type-level theorems.
-
-Each `'a` parameter expresses that two pieces of data **coexist safely** within a bounded region of time. Experts deliberately model relationships through lifetime parameters to **eliminate entire classes of runtime checks**.
-
-6. Pattern Matching → Declarative Exhaustiveness
-
-> Stop thinking of `match` as a fancy switch.
-> It’s a **total function over variants**, verified at compile time.
-
-Once you realize `match` isn’t branching but **structural enumeration**, you start writing exhaustive domain models where every possible state is named, and every transition is **type-checked**.
-
-Stop seeing `Option` as “value or no value.” Instead, see it as a lazy computation pipeline that only executes when meaningful. These combinators turn error-handling into data flow: once you think of absence as a first-class transformation, you can write algorithms that never mention control flow explicitly—and yet, they’re 100% safe and analyzable by the compiler.
+- Ownership/borrowing: model values and references as compile-time capability flow; keep borrows as short as practical.
+- Traits: prefer composable capability contracts over inheritance-style thinking.
+- `Result`/`Option`: treat them as explicit control-flow data; use combinators and `?` for clear pipelines.
+- Lifetimes: express valid coexistence windows for references, not just syntax to satisfy the compiler.
+- Pattern matching: model state with enums and exhaustive `match` handling.
 
 
 ## Refactoring opportunities
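The ownership/borrowing and lifetime bullets in the new "Rust insights" list can be made concrete with a small sketch. The helpers `longer` and `append_first_len` are hypothetical illustrations, not code from this repository: the first uses a lifetime parameter to state that the returned slice is valid only while both inputs are, and the second copies a value out of a short shared borrow so that a later mutable borrow is accepted.

```rust
/// The lifetime `'a` ties the output to both inputs: the returned slice
/// is only usable while *both* `a` and `b` are still alive.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

/// Pushes the length of the first element (or 0 if empty) onto `v`.
fn append_first_len(v: &mut Vec<String>) {
    // Short shared borrow: `first()` borrows `v`, but a plain `usize` is
    // copied out immediately, so the borrow ends on this line.
    let first_len = v.first().map(String::len).unwrap_or(0);

    // The exclusive borrow needed by `push` is therefore allowed here.
    v.push(first_len.to_string());

    // Keeping a reference alive across the push would be rejected:
    //     let first = &v[0];
    //     v.push(String::new()); // error: `*v` is already borrowed as immutable
    //     println!("{first}");
}
```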
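For the traits bullet ("composable capability contracts over inheritance-style thinking"), a minimal sketch: a hypothetical `Describe` trait receives a blanket implementation for every `Display` type, and generic code requests only that one capability instead of a base type.

```rust
use std::fmt::Display;

/// A small capability: a value that can render a short description.
/// (Hypothetical trait, for illustration only.)
trait Describe {
    fn describe(&self) -> String;
}

/// Blanket implementation: every `Display` type gains `Describe` with no
/// inheritance; one capability is derived from another.
impl<T: Display> Describe for T {
    fn describe(&self) -> String {
        format!("<{}>", self)
    }
}

/// Generic code asks only for the capability it actually uses.
fn describe_all<T: Describe>(items: &[T]) -> Vec<String> {
    items.iter().map(|item| item.describe()).collect()
}
```

Integers, string slices, and floats all satisfy `describe_all` here without sharing any common base type.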
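The `Result`/`Option` and pattern-matching bullets fit together in one sketch. The `Batch` enum and the `parse_rows`, `first_token_len`, and `summarize` functions are hypothetical; they show `?` propagating a parse error to the caller, an `Option` combinator chain with no explicit branching, and an exhaustive `match` over an enum that models state.

```rust
use std::num::ParseIntError;

/// Domain state modelled as an enum (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
enum Batch {
    Empty,
    Rows(usize),
}

/// `?` returns the `ParseIntError` to the caller instead of panicking.
fn parse_rows(input: &str) -> Result<Batch, ParseIntError> {
    let n: usize = input.trim().parse()?;
    Ok(if n == 0 { Batch::Empty } else { Batch::Rows(n) })
}

/// `Option` combinators: absence flows through `map` without an `if`.
fn first_token_len(line: &str) -> Option<usize> {
    line.split_whitespace().next().map(str::len)
}

/// Exhaustive `match` with no wildcard arm: every variant is named.
fn summarize(batch: &Batch) -> String {
    match batch {
        Batch::Empty => "empty".to_string(),
        Batch::Rows(n) => format!("{n} rows"),
    }
}
```

Because `summarize` has no wildcard arm, adding a variant to `Batch` turns every such `match` into a compile error until the new case is handled, which is the property the pattern-matching bullet relies on.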
@@ -103,22 +68,6 @@ Stop seeing `Option` as “value or no value.” Instead, see it as a lazy compu
   and prefer `from_stream(df)` instead. This improves readability and avoids
   relying on private PyArrow internals that may change.
 
-- Prefer Rust raw C-string literals for passing small string constants to
-  Python's C-API or embedding contexts instead of allocating a `CString`.
-
-  Before (allocates a CString and unwraps):
-
-  ```rust
-  let code = CString::new("pass").unwrap();
-  py.run(code.as_c_str(), None, None)?;
-  ```
-
-  After (use a raw C string literal via `cr"..."`):
-
-  ```rust
-  py.run(cr"pass", None, None)?;
-  ```
-
 ## Helper Functions
 
 ## Commenting guidance
