
Commit cc35fcc

Merge pull request Pipelex#143 from Pipelex/release/v0.5.0
### Highlight: Vibe Coding an AI workflow becomes a reality

**Create AI workflows from natural language without writing code**

- The combination of Pipelex's declarative language, comprehensive Cursor rules, and robust validation tools enables AI assistants to autonomously iterate on pipelines until all errors are resolved and workflows are ready to run.

### Added

- **Complete Dry Run & Static Validation System** - A comprehensive validation framework that catches configuration and pipeline errors before any expensive inference operations.
- **WorkingMemoryFactory Enhancement**: New `make_for_dry_run()` method creates working memory with realistic mock objects for zero-cost pipeline testing
- **Enhanced Dry Run System**: Complete dry run support for all pipe controllers (`PipeCondition`, `PipeParallel`, `PipeBatch`) with mock data generation using `polyfactory`
- **Comprehensive Static Validation**: Enhanced static validation with configurable error handling for missing/extraneous input variables and domain validation
- **TOML File Validation**: Automatic detection and prevention of trailing whitespaces, formatting issues, and compilation blockers in pipeline files
- **Pipeline Testing Framework**: New `dry_run_all_pipes()` method enables comprehensive testing of entire pipeline libraries
- **Enhanced Library Loading**: Improved error handling and validation during TOML file loading with proper exception propagation

### Configuration

- **Dry Run Configuration**: New `allowed_to_fail_pipes` setting allows specific pipes (like infinite loop examples that fail on purpose) to be excluded from dry run validation
- **Static Validation Control**: Configurable error reactions (`raise`, `log`, `ignore`) for different validation error types

### Documentation & Development Experience

- **Cursor Rules Enhancement**: Comprehensive pipe controller documentation covering `PipeSequence`, `PipeCondition`, `PipeBatch`, and `PipeParallel`, improved PipeOperator documentation for `PipeLLM`, `PipeOCR`
- **Pipeline Validation CLI**: Enhanced `pipelex validate` command with better error reporting and validation coverage
- **Improved Error Messages**: Better formatting and context for pipeline configuration errors

### Changed

- **OCR Input Standardization**: Changed OCR pipe input parameter naming to consistently use `ocr_input` for both image and PDF inputs, improving consistency across the API
- **Error Message Improvements**: Updated PipeCondition error messages to reference `expression_template` instead of deprecated `expression_jinja2`
2 parents 78a0c0f + 888fa5a commit cc35fcc

149 files changed

Lines changed: 6216 additions & 910 deletions

.cursor/rules/best_practices.mdc

Lines changed: 5 additions & 5 deletions
@@ -15,13 +15,9 @@ This document outlines the core best practices and patterns used in our codebase
  - Use type hints for all variables where type is not obvious

  2. **StrEnum**
- - Import StrEnum from pipelex.types
+ If you want to use StrEnum, import it from `pipelex.types`
  ```python
  from pipelex.types import StrEnum
-
- class ModelType(StrEnum):
-     GPT4 = "gpt-4"
-     GPT35 = "gpt-3.5-turbo"
  ```

  ## Factory Pattern
@@ -75,3 +71,7 @@ This document outlines the core best practices and patterns used in our codebase
  "The fal-client SDK is required to use FAL models."
  ) from exc
  ```
+
+ ## Pipelines
+
+ Always run the cli `pipelex validate` when you are finished writing pipelines: This checks for errors. If there are errors, iterate.
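
To illustrate the StrEnum rule in the hunk above, here is a minimal sketch of what a string-valued enum provides. It deliberately does not import `pipelex.types`: it uses the standard library's `enum.StrEnum` (Python 3.11+) with a small fallback for older interpreters, on the assumption that `pipelex.types.StrEnum` behaves similarly. `ModelType` is the example class that this commit removed from the rule file.

```python
# Stand-in for `from pipelex.types import StrEnum`
# (assumption: the Pipelex export behaves like enum.StrEnum).
try:
    from enum import StrEnum  # available from Python 3.11
except ImportError:
    from enum import Enum

    class StrEnum(str, Enum):
        """Minimal fallback: members are also plain str instances."""

        def __str__(self) -> str:
            return str(self.value)


class ModelType(StrEnum):
    GPT4 = "gpt-4"
    GPT35 = "gpt-3.5-turbo"


# Members compare equal to their raw string values...
is_equal_to_raw_string = ModelType.GPT4 == "gpt-4"
# ...and can be looked up from a raw string, e.g. one read from a TOML file.
looked_up = ModelType("gpt-3.5-turbo")
print(is_equal_to_raw_string, looked_up is ModelType.GPT35)
```

Because members are real strings, they can be passed to code that expects `str` without calling `.value`.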

.cursor/rules/llms.mdc

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
  ---
- description: Use LLM models with appropriate settings. Define LLM handles. Define LLM parameters directly in PipeLLM or through presets.
+ description:
  globs:
  alwaysApply: false
  ---
@@ -41,7 +41,7 @@ Here is an example of using an llm_handle to specify which LLM to use in a PipeL
  PipeLLM = "Write text about Hello World."
  output = "Text"
  llm = { llm_handle = "gpt-4o-mini", temperature = 0.9, max_tokens = "auto" }
- prompt = """
+ prompt_template = """
  Write a haiku about Hello World.
  """
  ```

.cursor/rules/pipe-batch.mdc

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+ ---
+ description:
+ globs:
+ alwaysApply: false
+ ---
+ # PipeBatch Controller
+
+ The PipeBatch controller allows you to apply a pipe operation to each element in a list of inputs in parallel. It is created via a PipeSequence.
+
+ ## Usage in TOML Configuration
+
+ ```toml
+ [pipe.sequence_with_batch]
+ PipeSequence = "A Sequence of pipes"
+ inputs = { input_data = "ConceptName" }
+ output = "OutputConceptName"
+ steps = [
+ { pipe = "pipe_to_apply", batch_over = "input_list", batch_as = "current_item", result = "batch_results" }
+ ]
+ ```
+
+ ## Key Parameters
+
+ - `pipe`: The pipe operation to apply to each element in the batch
+ - `batch_over`: The name of the list in the context to iterate over
+ - `batch_as`: The name to use for the current element in the pipe's context
+ - `result`: Where to store the results of the batch operation
+
+ # Important tip
+
+ Always run the cli `pipelex validate` when you are finished writing pipelines: This checks for errors. If there are errors, iterate.
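
The four parameters described in `pipe-batch.mdc` can be pictured with a small, self-contained sketch. This is not Pipelex code: `run_batch` and the toy `pipe_to_apply` coroutine are hypothetical, and the use of `asyncio.gather` is only an assumption about how "in parallel" might be realised. The sketch shows the data flow only: each item of the `batch_over` list is bound to the `batch_as` name, and the outputs are collected under `result`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def run_batch(
    pipe: Callable[[Dict[str, Any]], Awaitable[Any]],
    context: Dict[str, Any],
    batch_over: str,
    batch_as: str,
    result: str,
) -> Dict[str, Any]:
    """Apply `pipe` to every item of context[batch_over] concurrently.

    Each call receives a copy of the context with the current item bound
    to `batch_as`. Outputs are stored, in input order, under `result`.
    """
    items: List[Any] = context[batch_over]
    calls = [pipe({**context, batch_as: item}) for item in items]
    outputs = await asyncio.gather(*calls)  # gather preserves input order
    return {**context, result: list(outputs)}


async def pipe_to_apply(ctx: Dict[str, Any]) -> str:
    # Hypothetical pipe: upper-cases the current item.
    await asyncio.sleep(0)
    return ctx["current_item"].upper()


context = {"input_list": ["a", "b", "c"]}
updated = asyncio.run(
    run_batch(
        pipe_to_apply,
        context,
        batch_over="input_list",
        batch_as="current_item",
        result="batch_results",
    )
)
print(updated["batch_results"])
```

The original context is left untouched; the batch results appear only in the returned mapping.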

.cursor/rules/pipe-condition.mdc

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+ ---
+ description:
+ globs:
+ alwaysApply: false
+ ---
+ # PipeCondition Controller
+
+ The PipeCondition controller allows you to implement conditional logic in your pipeline, choosing which pipe to execute based on an evaluated expression. It supports both direct expressions and expression templates.
+
+ ## Usage in TOML Configuration
+
+ ### Basic Usage with Direct Expression
+
+ ```toml
+ [pipe.conditional_operation]
+ PipeCondition = "A conditional pipe to decide whether..."
+ inputs = { input_data = "CategoryInput" }
+ output = "native.Text"
+ expression = "input_data.category"
+
+ [pipe.conditional_operation.pipe_map]
+ small = "process_small"
+ medium = "process_medium"
+ large = "process_large"
+ ```
+ or
+ ```toml
+ [pipe.conditional_operation]
+ PipeCondition = "A conditional pipe to decide whether..."
+ inputs = { input_data = "CategoryInput" }
+ output = "native.Text"
+ expression_template = "{{ input_data.category }}" # Jinja2 code
+
+ [pipe.conditional_operation.pipe_map]
+ small = "process_small"
+ medium = "process_medium"
+ large = "process_large"
+ ```
+
+ ## Key Parameters
+
+ - `expression`: Direct boolean or string expression (mutually exclusive with expression_template)
+ - `expression_template`: Jinja2 template for more complex conditional logic (mutually exclusive with expression)
+ - `pipe_map`: Dictionary mapping expression results to pipe codes:
+ 1 - The key on the left (`small`, `medium`) is the result of `expression` or `expression_template`.
+ 2 - The value on the right (`process_small`, `process_medium`, ...) is the name of the pipe to trigger
+
+ # Important tip
+
+ Always run the cli `pipelex validate` when you are finished writing pipelines: This checks for errors. If there are errors, iterate.
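
The `pipe_map` lookup described in `pipe-condition.mdc` amounts to a dictionary dispatch. The sketch below is illustrative only: `has_exactly_one_expression` and `select_pipe` are hypothetical helpers, not Pipelex functions. Two behaviours are assumptions rather than anything stated in the rule file: that at least one of the two expression fields is required, and that an unmapped result raises an error.

```python
from typing import Dict, Optional


def has_exactly_one_expression(
    expression: Optional[str],
    expression_template: Optional[str],
) -> bool:
    """The two fields are mutually exclusive; this sketch also requires one of them."""
    return (expression is None) != (expression_template is None)


def select_pipe(pipe_map: Dict[str, str], expression_result: str) -> str:
    """Return the pipe name mapped to the evaluated expression result."""
    if expression_result not in pipe_map:
        # Assumption: an unmapped result is treated as a configuration error.
        raise KeyError(f"No pipe mapped for result {expression_result!r}")
    return pipe_map[expression_result]


pipe_map = {
    "small": "process_small",
    "medium": "process_medium",
    "large": "process_large",
}

# Suppose `input_data.category` evaluated to "medium".
chosen = select_pipe(pipe_map, "medium")
print(chosen)
```

Keys are the possible expression results and values are pipe names, matching the left and right sides of the `pipe_map` table in the TOML example.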

.cursor/rules/pipe-func.mdc

Whitespace-only changes.

.cursor/rules/pipe-imgg.mdc

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+ ---
+ description:
+ globs:
+ alwaysApply: false
+ ---

.cursor/rules/pipe-llm.mdc

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
+ ---
+ description:
+ globs:
+ alwaysApply: false
+ ---
+ # PipeLLM Guide
+
+ ## Purpose
+
+ PipeLLM is used to:
+ 1. Generate text or objects with LLMs
+ 2. Process images with Vision LLMs
+
+ ## Basic Usage
+
+ ### Simple Text Generation
+ ```toml
+ [pipe.write_story]
+ PipeLLM = "Write a short story"
+ output = "Text"
+ prompt_template = """
+ Write a short story about a programmer.
+ """
+ ```
+
+ ### Structured Data Extraction
+ ```toml
+ [pipe.extract_info]
+ PipeLLM = "Extract information"
+ inputs = { text = "Text" }
+ output = "PersonInfo"
+ prompt_template = """
+ Extract person information from this text:
+ @text
+ """
+ ```
+
+ ### Where to Put Structured Objects
+ Place your Pydantic models in `pipelex_libraries/pipelines/your_models.py`:
+
+ ```python
+ from pipelex.core.stuff_content import StructuredContent
+
+ class PersonInfo(StructuredContent):  # The output models always have to be a subclass of StructuredContent
+     name: str
+     age: int
+     email: str
+ ```
+
+ ## Advanced Features
+
+ ### LLM Settings
+
+ You can specify LLM settings in two ways:
+
+ 1. **Direct in the pipe**:
+ ```toml
+ [pipe.analyze]
+ PipeLLM = "Analyze text"
+ output = "Analysis"
+ llm = { llm_handle = "gpt-4", temperature = 0.7 }
+ prompt_template = "Analyze this text"
+ ```
+
+ 2. **Using predefined settings** from `pipelex_libraries/llm_deck/base_llm_deck.toml`:
+ ```toml
+ [pipe.analyze]
+ PipeLLM = "Analyze text"
+ output = "Analysis"
+ llm = "llm_for_analysis" # References a preset from llm_deck
+ prompt_template = "Analyze this text"
+ ```
+
+ ### System Prompts
+ Add system-level instructions:
+ ```toml
+ [pipe.expert_analysis]
+ PipeLLM = "Expert analysis"
+ output = "Analysis"
+ system_prompt = "You are a data analysis expert"
+ prompt_template = "Analyze this data"
+ ```
+
+ ### Multiple Outputs
+ Generate multiple results:
+ ```toml
+ [pipe.generate_ideas]
+ PipeLLM = "Generate ideas"
+ output = "Idea"
+ nb_output = 3 # Generate exactly 3 ideas
+ # OR
+ multiple_output = true # Let the LLM decide how many to generate
+ ```
+
+ ### Vision Tasks
+ Process images with VLMs:
+ ```toml
+ [pipe.analyze_image]
+ PipeLLM = "Analyze image"
+ inputs = { image = "Image" } # `image` is the name of the stuff that contains the Image. If it's in a stuff, you can add something like `{ "page.image": "Image" }`
+ output = "ImageAnalysis"
+ prompt_template = "Describe what you see in this image"
+ ```
+
+ # Important tip
+
+ Always run the cli `pipelex validate` when you are finished writing pipelines: This checks for errors. If there are errors, iterate.
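
The "LLM Settings" section of `pipe-llm.mdc` shows two shapes for the `llm` field: an inline table or the name of a preset. The sketch below shows one way a loader could normalise both shapes into a single settings dictionary. Everything in it is hypothetical: `resolve_llm`, the `presets` mapping, and the values stored under `llm_for_analysis` are illustrative, and this commit does not show how Pipelex resolves presets internally.

```python
from typing import Any, Dict, Union

LLMSetting = Dict[str, Any]


def resolve_llm(value: Union[str, LLMSetting], presets: Dict[str, LLMSetting]) -> LLMSetting:
    """Return a settings dict whether `value` is a preset name or an inline table."""
    if isinstance(value, str):
        if value not in presets:
            raise KeyError(f"Unknown LLM preset: {value!r}")
        return dict(presets[value])  # copy so callers cannot mutate the preset
    return dict(value)


# Hypothetical preset contents, standing in for entries of an llm_deck file.
presets = {"llm_for_analysis": {"llm_handle": "gpt-4", "temperature": 0.1}}

inline = resolve_llm({"llm_handle": "gpt-4", "temperature": 0.7}, presets)
from_preset = resolve_llm("llm_for_analysis", presets)
print(inline["temperature"], from_preset["temperature"])
```

Either form ends up as the same kind of mapping, which is why the two TOML variants in the guide can be used interchangeably in a pipe definition.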

.cursor/rules/pipe-ocr.mdc

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+ ---
+ description:
+ globs:
+ alwaysApply: false
+ ---
+ # PipeOCR Guide
+
+ ## Purpose
+
+ Extract text and images from an image or a PDF
+
+ ## Basic Usage
+
+ ### Simple Text Generation
+ ```toml
+ [pipe.extract_info]
+ PipeOcr = "extract the information"
+ inputs = { ocr_input = "PDF" } # or { ocr_input = "Image" } if it's an image. This is the only input
+ output = "Page"
+ ```
+
+ The output concept `Page` is a native concept, with the structure `PageContent`:
+ It corresponds to 1 page. Therefore, the PipeOcr is outputting a `ListContent` of `Page`
+
+ ```python
+ class TextAndImagesContent(StuffContent):
+     text: Optional[TextContent]
+     images: Optional[List[ImageContent]]
+
+ class PageContent(StructuredContent):
+     text_and_images: TextAndImagesContent
+     page_view: Optional[ImageContent] = None
+ ```
+ - `text_and_images` are the text, and the related images found in the input image or PDF.
+ - `page_view` is the screenshot of the whole pdf page/image.
+
+ # Important tip
+
+ Always run the cli `pipelex validate` when you are finished writing pipelines: This checks for errors. If there are errors, iterate.
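
Since PipeOcr returns one `Page` per page, downstream code typically walks a list and has to cope with the `Optional` fields. The sketch below mirrors the shape of `PageContent` with plain dataclasses so that it runs without Pipelex installed. The stand-in classes are not the real ones: the `text` attribute on `TextContent` and the `url` attribute on `ImageContent` are assumptions, and `collect_text` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextContent:  # stand-in; the `text` attribute is an assumption
    text: str


@dataclass
class ImageContent:  # stand-in; the `url` attribute is an assumption
    url: str


@dataclass
class TextAndImagesContent:
    text: Optional[TextContent] = None
    images: Optional[List[ImageContent]] = None


@dataclass
class PageContent:
    text_and_images: TextAndImagesContent
    page_view: Optional[ImageContent] = None


def collect_text(pages: List[PageContent]) -> str:
    """Join the text of every page, skipping pages where no text was found."""
    parts = [
        page.text_and_images.text.text
        for page in pages
        if page.text_and_images.text is not None
    ]
    return "\n\n".join(parts)


pages = [
    PageContent(TextAndImagesContent(text=TextContent("Page one"))),
    PageContent(TextAndImagesContent(images=[ImageContent("figure.png")])),  # image-only page
    PageContent(TextAndImagesContent(text=TextContent("Page three"))),
]
full_text = collect_text(pages)
print(full_text)
```

The image-only page is skipped rather than contributing an empty string, which keeps the joined text free of stray separators.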

.cursor/rules/pipe-parallel.mdc

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+ ---
+ description:
+ globs:
+ alwaysApply: false
+ ---

.cursor/rules/pipe-sequence.mdc

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+ ---
+ description:
+ globs:
+ alwaysApply: false
+ ---
+ # PipeSequence Guide
+
+ ## Purpose
+ PipeSequence executes multiple pipes in a defined order, where each step can use results from previous steps.
+
+ ## Basic Structure
+ ```toml
+ [pipe.your_sequence_name]
+ PipeSequence = "Description of what this sequence does"
+ inputs = { input_name = "InputType" } # All the inputs of the sub pipes, except the ones generated by intermediate steps
+ output = "OutputType"
+ steps = [
+ { pipe = "first_pipe", result = "first_result" },
+ { pipe = "second_pipe", result = "second_result" },
+ { pipe = "final_pipe", result = "final_result" }
+ ]
+ ```
+
+ ## Key Components
+
+ 1. **Steps Array**: List of pipes to execute in sequence
+ - `pipe`: Name of the pipe to execute
+ - `result`: Name to assign to the pipe's output that will be in the working memory
+
+ 2. **Working Memory**: Each step can access:
+ - Original sequence inputs
+ - Results from previous steps
+ - Use the result names in subsequent steps
+
+ ## Using PipeBatch in Steps
+
+ You can use PipeBatch functionality within steps using `batch_over` and `batch_as`:
+
+ ```toml
+ steps = [
+ { pipe = "process_items", batch_over = "input_list", batch_as = "current_item", result = "processed_items"
+ }
+ ]
+ ```
+
+ 1. **batch_over**: Specifies a `ListContent` field to iterate over. Each item in the list will be processed individually and IN PARALLEL by the pipe.
+ - Must be a `ListContent` type containing the items to process
+ - Can reference inputs or results from previous steps
+
+ 2. **batch_as**: Defines the name that will be used to reference the current item being processed
+ - This name can be used in the pipe's input mappings
+ - Makes each item from the batch available as a single element
+
+ The result of a batched step will be a `ListContent` containing the outputs from processing each item.
+
+ # Important tip
+
+ Always run the cli `pipelex validate` when you are finished writing pipelines: This checks for errors. If there are errors, iterate.
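
The working-memory behaviour described in `pipe-sequence.mdc` can be sketched as a dictionary that grows by one named entry per step. This is a simplified model, not the Pipelex implementation: `run_sequence` and the three toy pipes are hypothetical, and real pipes are declared in TOML rather than as Python callables.

```python
from typing import Any, Callable, Dict, List

Pipe = Callable[[Dict[str, Any]], Any]


def run_sequence(
    pipes: Dict[str, Pipe],
    steps: List[Dict[str, str]],
    inputs: Dict[str, Any],
) -> Dict[str, Any]:
    """Run each step in order, storing its output under the step's `result` name."""
    working_memory = dict(inputs)  # starts with the original sequence inputs
    for step in steps:
        pipe = pipes[step["pipe"]]
        # Each pipe sees the inputs plus every earlier step's result.
        working_memory[step["result"]] = pipe(working_memory)
    return working_memory


# Hypothetical pipes: each one reads names produced earlier in the sequence.
pipes: Dict[str, Pipe] = {
    "first_pipe": lambda wm: wm["input_name"].strip(),
    "second_pipe": lambda wm: wm["first_result"].split(),
    "final_pipe": lambda wm: len(wm["second_result"]),
}
steps = [
    {"pipe": "first_pipe", "result": "first_result"},
    {"pipe": "second_pipe", "result": "second_result"},
    {"pipe": "final_pipe", "result": "final_result"},
]

memory = run_sequence(pipes, steps, {"input_name": "  hello pipelex world  "})
print(memory["final_result"])
```

Reordering the steps so that `second_pipe` runs first would fail with a `KeyError`, which is the kind of missing-input problem the rule file asks you to catch with `pipelex validate`.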
