Commit 336cb02

Fix/remove images field from pipe llm blueprint (Pipelex#105)
1 parent 08bbe33 commit 336cb02

28 files changed

Lines changed: 526 additions & 151 deletions

.cursor/rules/best_practices.mdc

Lines changed: 77 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,77 @@
1+
---
2+
description:
3+
globs:
4+
alwaysApply: false
5+
---
6+
# Best Practices Guide
7+
8+
This document outlines the core best practices and patterns used in our codebase.
9+
10+
## Type Hints
11+
12+
1. **Always Use Type Hints**
13+
- Every function parameter must be typed
14+
- Every function return must be typed
15+
- Use type hints for all variables where type is not obvious
16+
17+
2. **StrEnum**
18+
- Import StrEnum from pipelex.types
19+
```python
20+
from pipelex.types import StrEnum
21+
22+
class ModelType(StrEnum):
23+
GPT4 = "gpt-4"
24+
GPT35 = "gpt-3.5-turbo"
25+
```
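To illustrate why a string-valued enum is convenient, here is a minimal sketch. It assumes a standard-library stand-in for `pipelex.types.StrEnum` so that it runs without Pipelex installed; the project's own class may differ in detail. `parse_model_type` is a hypothetical helper.

```python
from enum import Enum


class StrEnum(str, Enum):
    """Stand-in for pipelex.types.StrEnum (assumption), so the sketch is self-contained."""

    def __str__(self) -> str:
        return str(self.value)


class ModelType(StrEnum):
    GPT4 = "gpt-4"
    GPT35 = "gpt-3.5-turbo"


def parse_model_type(raw: str) -> ModelType:
    """Look up a member by its string value; raises ValueError if the value is unknown."""
    return ModelType(raw)
```

Because members are also strings, they can be compared with plain strings and written into configuration or logs without an explicit `.value`.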
26+
27+
## Factory Pattern
28+
29+
1. **Use Factory Pattern for Object Creation**
30+
- Create factories when dealing with multiple implementations
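
One possible shape for such a factory is sketched below. Every name in it (`ImageWorker`, `LocalWorker`, `RemoteWorker`, `ImageWorkerFactory`) is hypothetical and chosen for illustration only; none is taken from the codebase.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class ImageWorker(ABC):
    """Hypothetical common interface shared by all implementations."""

    @abstractmethod
    def describe(self) -> str: ...


class LocalWorker(ImageWorker):
    def describe(self) -> str:
        return "local"


class RemoteWorker(ImageWorker):
    def describe(self) -> str:
        return "remote"


class ImageWorkerFactory:
    """Maps a kind name to a constructor, so callers never name a concrete class."""

    _registry: Dict[str, Callable[[], ImageWorker]] = {
        "local": LocalWorker,
        "remote": RemoteWorker,
    }

    @classmethod
    def make(cls, kind: str) -> ImageWorker:
        try:
            builder = cls._registry[kind]
        except KeyError as exc:
            raise ValueError(f"Unknown worker kind: {kind!r}") from exc
        return builder()
```

Callers then write `ImageWorkerFactory.make("local")`, and adding an implementation only requires a new registry entry.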
31+
32+
## Documentation
33+
34+
1. **Docstring Format**
35+
- Quick description of the function/class
36+
- List args and their types
37+
- Document return values
38+
- Example:
39+
```python
40+
def process_image(image_path: str, size: tuple[int, int]) -> bytes:
41+
"""Process and resize an image.
42+
43+
Args:
44+
image_path: Path to the source image
45+
size: Tuple of (width, height) for resizing
46+
47+
Returns:
48+
Processed image as bytes
49+
"""
50+
pass
51+
```
52+
53+
2. **Class Documentation**
54+
- Document class purpose and behavior
55+
- Include examples if complex
56+
```python
57+
class ImageProcessor:
58+
"""Handles image processing operations.
59+
60+
Provides methods for resizing, converting, and optimizing images.
61+
"""
62+
```
63+
64+
## Custom Exceptions
65+
66+
1. **Graceful Error Handling**
67+
- Use try/except blocks with specific exceptions
68+
- Convert third-party exceptions to custom ones
69+
```python
70+
try:
71+
from fal_client import AsyncClient as FalAsyncClient
72+
except ImportError as exc:
73+
raise MissingDependencyError(
74+
"fal-client", "fal",
75+
"The fal-client SDK is required to use FAL models."
76+
) from exc
77+
```
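The snippet above relies on a `MissingDependencyError` class that is not shown. A minimal sketch of what it could look like follows. The constructor signature is an assumption inferred from the three positional arguments in the call (package name, extra name, message), and `import_optional` is a hypothetical helper; the project's real class may differ.

```python
import importlib
from types import ModuleType


class MissingDependencyError(ImportError):
    """Hypothetical definition; the argument meanings are inferred, not confirmed."""

    def __init__(self, dependency_name: str, extra_name: str, message: str) -> None:
        self.dependency_name = dependency_name
        self.extra_name = extra_name
        super().__init__(f"{message} Missing package '{dependency_name}' (extra: '{extra_name}').")


def import_optional(module_name: str, dependency_name: str, extra_name: str, message: str) -> ModuleType:
    """Import a module, converting a generic ImportError into the custom exception."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # "from exc" keeps the original traceback attached as __cause__
        raise MissingDependencyError(dependency_name, extra_name, message) from exc
```

Chaining with `from exc` preserves the third-party error for debugging while callers only need to catch the project's own exception type.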

.cursor/rules/pipes.mdc

Lines changed: 0 additions & 12 deletions
@@ -5,20 +5,9 @@ alwaysApply: false
55
---
66
This rule explains how to build pipes.
77

8-
# File Naming & Structure
9-
10-
## The pipelines/ directory
11-
12-
- Pipelines and structures are defined in the `pipelex/libraries/pipelines` directory
13-
14-
pipelex/libraries
15-
└── pipelines
16-
178
## Pipeline file naming
189

19-
- Pipeline TOML files should be placed in the `pipelex/libraries/pipelines/` directory or a subdirectory of it
2010
- The file name should be descriptive, in snake_case and end with `.toml`
21-
- The file should contain both concepts and pipes definitions
2211

2312
## Pipeline file structure
2413

@@ -167,7 +156,6 @@ Your summary should not be longer than 2 sentences.
167156
PipeLLM = "Convert table screenshot to HTML"
168157
inputs = { table_screenshot = "TableScreenshot" }
169158
output = "HtmlTable"
170-
images = ["table_screenshot"]
171159
system_prompt = """
172160
You are a vision-based table extractor.
173161
"""

.cursor/rules/pytest.mdc

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,8 @@ These rules apply when writing unit tests.
1717
- More precisely, for `pipelex` and `pipelex.cogt` the async tests are placed inside subdirectories named `cogt_asynch` and `pipelex_asynch`
1818
- Fixtures are defined in conftest.py modules at different levels of the hierarchy, their scope is handled by pytest
1919
- Test data is placed inside test_data.py at different levels of the hierarchy, they must be imported with package paths from the root like `tests.pipelex.test_data`. Their content is all constants, regrouped inside classes to keep things tidy.
20+
- Always put tests inside Test classes.
21+
- Pipelex pipelines used in tests should be stored in `tests/test_pipelines`, along with the related structured output classes that inherit from `StructuredContent`
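
As a small sketch of these conventions, the example below groups constants in a class (in the way a `test_data.py` module would) and places the tests inside a `Test` class. The function under test, `to_snake_case`, and all other names are hypothetical.

```python
class NameTestData:
    """Constants grouped in a class, in the way a test_data.py module would hold them."""

    RAW = "  Table Screenshot "
    EXPECTED = "table_screenshot"


def to_snake_case(name: str) -> str:
    """Hypothetical function under test: trim, lowercase, join words with underscores."""
    return "_".join(name.strip().lower().split())


class TestToSnakeCase:
    def test_basic(self) -> None:
        assert to_snake_case(NameTestData.RAW) == NameTestData.EXPECTED

    def test_already_snake_case(self) -> None:
        assert to_snake_case("html_table") == "html_table"
```

In a real test module the constants would be imported from a package path such as `tests.pipelex.test_data` rather than defined alongside the tests.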
2022

2123
## Markers
2224

.cursor/rules/standards.mdc

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
1+
---
2+
description:
3+
globs:
4+
alwaysApply: true
5+
---
6+
# Coding Standards
7+
8+
This document outlines the coding standards and quality control procedures that must be followed when contributing to this project.
9+
10+
## Code Quality Checks
11+
12+
### Linting and Type Checking
13+
14+
Before finalizing a task, you must run the following command to check for linting issues, type errors, and code quality problems:
15+
16+
```bash
17+
make check
18+
```
19+
20+
This command runs multiple code quality tools:
21+
- Pyright: Static type checking
22+
- Ruff: Fast Python linter
23+
- Mypy: Static type checker
24+
25+
Always fix any issues reported by these tools before proceeding.
26+
27+
### Running Tests
28+
29+
We have several make commands for running tests:
30+
31+
1. `make tp`: Runs all tests with these markers:
32+
```
33+
(dry_runnable or not (inference or llm or imgg or ocr)) and not (needs_output or pipelex_api)
34+
```
35+
Use this for quick test runs that don't require LLM or image generation.
36+
37+
2. `make ti`: Runs all tests with these markers:
38+
```
39+
inference and not imgg
40+
```
41+
Use this for testing LLM functionality without image generation.
42+
43+
3. To run specific tests:
44+
```bash
45+
make tp TEST=TestClassName
46+
# or
47+
make tp TEST=test_function_name
48+
```
49+
It matches names, so `TEST=test_function_name` runs every test whose function name starts with `test_function_name`.
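
The marker expressions quoted above can be read as boolean predicates over the set of markers attached to a test. The sketch below mirrors the two expressions in plain Python. The function names are hypothetical, and how the make targets pass the expressions to pytest is not shown in this document.

```python
from typing import Set


def selected_by_make_tp(markers: Set[str]) -> bool:
    """(dry_runnable or not (inference or llm or imgg or ocr)) and not (needs_output or pipelex_api)"""
    needs_inference = bool(markers & {"inference", "llm", "imgg", "ocr"})
    excluded = bool(markers & {"needs_output", "pipelex_api"})
    return ("dry_runnable" in markers or not needs_inference) and not excluded


def selected_by_make_ti(markers: Set[str]) -> bool:
    """inference and not imgg"""
    return "inference" in markers and "imgg" not in markers
```

For example, an unmarked test is selected by `make tp`, a test marked only `llm` is not, and a test marked both `llm` and `dry_runnable` is selected again.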
50+
51+
## Important Project Directories
52+
53+
### Pipelines Directory
54+
- All pipeline definitions go in `pipelex/libraries/pipelines/`
55+
56+
### Tests Directory
57+
- All tests are located in the `tests/` directory
58+
59+
### Documentation Directory
60+
- All documentation is located in the `docs/` directory

.cursor/rules/structures.mdc

Lines changed: 1 addition & 0 deletions
@@ -59,3 +59,4 @@ Structure classes defined within `pipelex_libraries/pipelines/` are automaticall
5959
- Use Pydantic validators for data cleaning/validation
6060
- Remove timezone info from datetime fields
6161
- Respect rules of @base_models.
62+

.cursor/rules/tdd.mdc

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
1+
---
2+
description:
3+
globs:
4+
alwaysApply: false
5+
---
6+
# Test-Driven Development Guide
7+
8+
This document outlines our test-driven development (TDD) process and the tools available for testing.
9+
10+
## TDD Cycle
11+
12+
1. **Write a Test First**
13+
[pytest.mdc](mdc:.cursor/rules/pytest.mdc)
14+
15+
2. **Write the Code**
16+
- Implement the minimum amount of code needed to pass the test
17+
- Follow the project's coding standards
18+
- Keep it simple - don't write more than needed
19+
20+
3. **Run Linting and Type Checking**
21+
[standards.mdc](mdc:.cursor/rules/standards.mdc)
22+
23+
4. **Refactor if needed**
24+
If the code needs refactoring, follow the best practices in [best_practices.mdc](mdc:.cursor/rules/best_practices.mdc)
25+
26+
5. **Validate tests**
27+
28+
Remember: The key to TDD is writing the test first and letting it drive your implementation. Always run the full test suite and quality checks before considering a feature complete.

docs/pages/build-reliable-ai-workflows-with-pipelex/pipe-operators/PipeLLM.md

Lines changed: 95 additions & 5 deletions
@@ -13,6 +13,96 @@ For structured data output, `PipeLLM` employs two main strategies:
1313
a. First, the LLM generates a free-form text based on the initial prompt.
1414
b. Second, another LLM call is made with a specific prompt designed to extract and structure the information from the generated text into the target Pydantic model.
1515

16+
## Working with Images (Vision Language Models)
17+
18+
`PipeLLM` supports Vision Language Models (VLMs) that can process both text and images. To use images in your prompts:
19+
20+
### Basic Image Input
21+
22+
Images must be declared in the `inputs` section of your pipe definition. The image will be automatically passed to the VLM along with your text prompt.
23+
24+
```toml
25+
[pipe.describe_image]
26+
PipeLLM = "Describe an image"
27+
inputs = { image = "Image" }
28+
output = "VisualDescription"
29+
prompt_template = """
30+
Describe the provided image in great detail.
31+
"""
32+
```
33+
34+
**Important**: Do NOT reference image variables in your prompt template using `@image` or `$image`. Images are automatically passed to vision-enabled LLMs and should not be treated as text variables.
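
A quick way to catch this mistake before running a pipe is to scan the template for `@name` or `$name` references to image inputs. The helper below is a hypothetical sketch, not Pipelex's own validation. It assumes image inputs can be recognized by the concept string `"Image"` and does not model concepts that refine `Image`.

```python
import re
from typing import Dict, List

IMAGE_CONCEPTS = {"Image"}  # assumption: refined image concepts would need to be added here


def find_image_references(inputs: Dict[str, str], prompt_template: str) -> List[str]:
    """Return the image input names that the template references with @ or $."""
    offending: List[str] = []
    for name, concept in inputs.items():
        if concept not in IMAGE_CONCEPTS:
            continue
        # Match "@name" or "$name" as a whole word, so "$image_url" does not match "image"
        pattern = r"[@$]" + re.escape(name) + r"\b"
        if re.search(pattern, prompt_template):
            offending.append(name)
    return offending
```

An empty result means that no image input is being treated as a text variable.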
35+
36+
**Flexible Image Inputs**
37+
38+
You can use any concept that refines `Image` as an input, and choose descriptive variable names that fit your use case:
39+
40+
```toml
41+
[pipe.analyze_wedding]
42+
PipeLLM = "Analyze wedding photo"
43+
inputs = { wedding_photo = "images.Photo" }
44+
output = "PhotoAnalysis"
45+
prompt_template = """
46+
Analyze this wedding photo and describe the key moments captured.
47+
"""
48+
```
49+
50+
### Images as Sub-attributes of Structured Content
51+
52+
When working with structured content that contains image fields (like `PageContent` which has a `page_view` field), you need to specify the full path to the image attribute in the `inputs` section:
53+
54+
```toml
55+
[pipe.analyze_page_view]
56+
PipeLLM = "Analyze the visual layout of a page"
57+
inputs = { "page_content.page_view" = "Image" }
58+
output = "LayoutAnalysis"
59+
prompt_template = """
60+
Analyze the visual layout and design elements of this page.
61+
Focus on typography, spacing, and overall composition.
62+
"""
63+
```
64+
65+
In this example:
66+
- `page_content` is the input variable containing a `PageContent` object
67+
- `page_view` is the `ImageContent` field within the `PageContent` structure
68+
- The dot notation `page_content.page_view` tells Pipelex to extract the image from that specific field
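
Conceptually, resolving such a path means looking up the first segment as an input variable and then walking the remaining segments as attributes. The sketch below illustrates the idea with stand-in classes. `resolve_dotted_path` and the simplified `PageContent` and `ImageContent` dataclasses are hypothetical and are not Pipelex's implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ImageContent:
    """Simplified stand-in for the real ImageContent class."""
    url: str


@dataclass
class PageContent:
    """Simplified stand-in for the real PageContent class."""
    text: str
    page_view: ImageContent


def resolve_dotted_path(variables: Dict[str, Any], path: str) -> Any:
    """Look up the first segment as a variable, then follow the rest as attributes."""
    root_name, *attribute_names = path.split(".")
    value = variables[root_name]
    for attribute_name in attribute_names:
        value = getattr(value, attribute_name)
    return value
```

With `variables = {"page_content": PageContent(...)}`, the path `"page_content.page_view"` yields the nested `ImageContent`, while `"page_content"` alone yields the whole object.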
69+
70+
### Multiple Images
71+
72+
You can include multiple images in a single prompt by listing them in the inputs:
73+
74+
```toml
75+
[pipe.compare_images]
76+
PipeLLM = "Compare two images"
77+
inputs = { first_image = "Image", second_image = "Image" }
81+
output = "ImageComparison"
82+
prompt_template = """
83+
Compare these two images and describe their similarities and differences.
84+
"""
85+
```
86+
87+
### Combining Text and Image Inputs
88+
89+
You can mix image inputs with text or any other kind of input in the same pipe:
90+
91+
```toml
92+
[pipe.analyze_document_with_context]
93+
PipeLLM = "Analyze a document page with additional context"
94+
inputs = { context = "Text", "document.page_view" = "Image" }
98+
output = "DocumentAnalysis"
99+
prompt_template = """
100+
Given this context: $context
101+
102+
Analyze the document page shown in the image and explain how it relates to the provided context.
103+
"""
104+
```
105+
16106
## Configuration
17107

18108
`PipeLLM` is configured in your pipeline's `.toml` file.
@@ -22,17 +112,18 @@ For structured data output, `PipeLLM` employs two main strategies:
22112
| Parameter | Type | Description | Required |
23113
| --------------------------- | ------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- |
24114
| `PipeLLM` | string | A descriptive name for the LLM operation. | Yes |
25-
| `inputs` | dictionary | The input concept(s) for the LLM operation, as a dictionary mapping input names to concept codes. | Yes |
115+
| `inputs` | dictionary | The input concept(s) for the LLM operation, as a dictionary mapping input names to concept codes. For images within structured content, use dot notation (e.g., `"page.image"`). | Yes |
26116
| `output` | string | The output concept produced by the LLM operation. | Yes |
27117
| `llm` | string or table | Specifies the LLM preset(s) to use. Can be a single preset or a table mapping different presets for different generation modes (e.g., `main`, `object_direct`). | No |
28118
| `system_prompt` | string | A system-level prompt to guide the LLM's behavior (e.g., "You are a helpful assistant"). Can be inline text or a reference to a template file (`"file:path/to/prompt.md"`). | No |
29119
| `prompt` | string | A simple, static user prompt. Use this when you don't need to inject any variables. | No |
30-
| `prompt_template` | string | A template for the user prompt. Use `$` for inline variables (e.g., `$topic`) and `@` to insert the content of an entire input (e.g., `@text_to_summarize`). | No |
31-
| `images` | list of strings | For Vision Language Models (VLMs), specifies which input variables are images. | No |
120+
| `prompt_template` | string | A template for the user prompt. Use `$` for inline variables (e.g., `$topic`) and `@` to insert the content of an entire input (e.g., `@text_to_summarize`). **Note**: Do not use `@` or `$` for image variables. | No |
121+
| `images` | list of strings | **Deprecated**: Use the `inputs` section to declare image inputs instead. | No |
32122
| `structuring_method` | string | The method for generating structured output. Can be `direct` or `preliminary_text`. Defaults to the global configuration. | No |
33123
| `prompt_template_to_structure` | string | The prompt template for the second step in `preliminary_text` mode. | No |
34124
| `output_multiplicity` | string or integer | Defines the number of outputs. Use `"list"` for a variable-length list, or an integer (e.g., `3`) for a fixed-size list. | No |
35125

126+
## Examples
36127

37128
### Simple Text Generation Example
38129

@@ -75,7 +166,6 @@ This pipe takes an image of a table and uses a VLM to extract the content as an
75166
PipeLLM = "Extract table data from an image"
76167
inputs = { image = "TableScreenshot" }
77168
output = "TableData"
78-
images = ["image"]
79169
prompt_template = """
80170
Extract the table data from this image and format it as a structured table.
81171
"""
@@ -105,4 +195,4 @@ Analyze this expense report and extract the following information:
105195
"""
106196
```
107197

108-
In this example, `Pipelex` will instruct the LLM to return a list of objects that conform to the `Expense` structure.
198+
In this example, `Pipelex` will instruct the LLM to return a list of objects that conform to the `Expense` structure.

docs/pages/cookbook-examples/extract-dpe.md

Lines changed: 0 additions & 1 deletion
@@ -63,7 +63,6 @@ The pipeline uses a `PipeLLM` with a very specific prompt to extract the informa
6363
PipeLLM = "Write markdown from page content of a 'Diagnostic de Performance Energetique'"
6464
inputs = { page_content = "Page" }
6565
output = "Dpe" # The output is structured as a Dpe object
66-
images = ["page_content.page_view"]
6766
llm = "llm_for_img_to_text"
6867
structuring_method = "preliminary_text"
6968
system_prompt = """You are a multimodal LLM, expert in converting images into perfect markdown."""

docs/pages/cookbook-examples/extract-gantt.md

Lines changed: 0 additions & 1 deletion
@@ -79,7 +79,6 @@ PipeLLM = "Extract the precise dates of the task, start_date and end_date"
7979
inputs = { gantt_chart_image = "GanttChartImage", gantt_timescale = "GanttTimescaleDescription", gantt_task_name = "GanttTaskName" }
8080
output = "GanttTaskDetails" # The output is structured as a GanttTaskDetails object
8181
structuring_method = "preliminary_text"
82-
images = ["gantt_chart_image"]
8382
llm = "llm_to_extract_diagram"
8483
prompt_template = """
8584
I am sharing an image of a Gantt chart.

docs/pages/cookbook-examples/extract-proof-of-purchase.md

Lines changed: 1 addition & 2 deletions
@@ -57,9 +57,8 @@ The pipeline uses a powerful `PipeLLM` to extract the structured data from the d
5757
```toml
5858
[pipe.write_markdown_from_page_content_proof_of_purchase]
5959
PipeLLM = "Write markdown from page content"
60-
inputs = { page_content = "Page" }
60+
inputs = { "page_content.page_view" = "Page" } # The LLM receives the image of the page
6161
output = "ProofOfPurchase" # The LLM is forced to output a ProofOfPurchase object
62-
images = ["page_content.page_view"] # The LLM receives the image of the page
6362
llm = "llm_for_img_to_text"
6463
structuring_method = "preliminary_text"
6564
system_prompt = """You are a multimodal LLM, expert at converting images into perfect markdown."""
