Commit 62064af

feature/dry-run (Pipelex#135)

Co-authored-by: Louis Choquel <lchoquel@users.noreply.github.com>
Co-authored-by: Louis Choquel <louis@pipelex.com>
1 parent 9835dba commit 62064af

144 files changed

Lines changed: 5560 additions & 905 deletions

.cursor/rules/best_practices.mdc

Lines changed: 5 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -15,13 +15,9 @@ This document outlines the core best practices and patterns used in our codebase
1515
- Use type hints for all variables where type is not obvious
1616

1717
2. **StrEnum**
18-
- Import StrEnum from pipelex.types
18+
If you want to use StrEnum, import it from `pipelex.types`
1919
```python
2020
from pipelex.types import StrEnum
21-
22-
class ModelType(StrEnum):
23-
GPT4 = "gpt-4"
24-
GPT35 = "gpt-3.5-turbo"
2521
```
2622

2723
## Factory Pattern
@@ -75,3 +71,7 @@ This document outlines the core best practices and patterns used in our codebase
7571
"The fal-client SDK is required to use FAL models."
7672
) from exc
7773
```
74+
75+
## Pipelines
76+
77+
Always run the CLI command `pipelex validate` when you have finished writing pipelines: it checks them for errors. If it reports errors, fix them and run it again.

.cursor/rules/llms.mdc

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
11
---
2-
description: Use LLM models with approrpiate settings. Define LLM handles. Define LLM parameters directly in PipeLLM or through presets.
2+
description:
33
globs:
44
alwaysApply: false
55
---
@@ -41,7 +41,7 @@ Here is an example of using an llm_handle to specify which LLM to use in a PipeL
4141
PipeLLM = "Write text about Hello World."
4242
output = "Text"
4343
llm = { llm_handle = "gpt-4o-mini", temperature = 0.9, max_tokens = "auto" }
44-
prompt = """
44+
prompt_template = """
4545
Write a haiku about Hello World.
4646
"""
4747
```

.cursor/rules/pipe-batch.mdc

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
---
2+
description:
3+
globs:
4+
alwaysApply: false
5+
---
6+
# PipeBatch Controller
7+
8+
The PipeBatch controller allows you to apply a pipe operation to each element in a list of inputs in parallel. It is created via a PipeSequence.
9+
10+
## Usage in TOML Configuration
11+
12+
```toml
13+
[pipe.sequence_with_batch]
14+
PipeSequence = "A Sequence of pipes"
15+
inputs = { input_data = "ConceptName" }
16+
output = "OutputConceptName"
17+
steps = [
18+
{ pipe = "pipe_to_apply", batch_over = "input_list", batch_as = "current_item", result = "batch_results" }
19+
]
20+
```
21+
22+
## Key Parameters
23+
24+
- `pipe`: The pipe operation to apply to each element in the batch
25+
- `batch_over`: The name of the list in the context to iterate over
26+
- `batch_as`: The name to use for the current element in the pipe's context
27+
- `result`: Where to store the results of the batch operation
28+
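The four parameters above can be pictured with a small, purely conceptual sketch. It does not use Pipelex itself: `run_batch` and the toy `pipe_to_apply` coroutine are hypothetical stand-ins, and the working context is modelled as a plain dict. It only illustrates the idea of iterating over the `batch_over` list, exposing each element under the `batch_as` name, running the branches concurrently, and storing the outputs under `result`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def pipe_to_apply(context: Dict[str, Any]) -> str:
    # Hypothetical pipe: reads the current element under its `batch_as` name.
    return context["current_item"].upper()


async def run_batch(
    context: Dict[str, Any],
    pipe: Callable[[Dict[str, Any]], Awaitable[Any]],
    batch_over: str,
    batch_as: str,
    result: str,
) -> Dict[str, Any]:
    items = context[batch_over]
    # One branch per element; each branch sees the shared context plus its own item.
    branches = (pipe({**context, batch_as: item}) for item in items)
    # asyncio.gather runs the branches concurrently and keeps the input order.
    outputs = await asyncio.gather(*branches)
    return {**context, result: list(outputs)}


context = {"input_list": ["a", "b", "c"]}
final = asyncio.run(
    run_batch(context, pipe_to_apply, "input_list", "current_item", "batch_results")
)
print(final["batch_results"])
```

In this sketch the per-item name (`current_item`) exists only inside each branch, while the collected list is written back under the `result` name.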
29+
# Important tip
30+
31+
Always run the CLI command `pipelex validate` when you have finished writing pipelines: it checks them for errors. If it reports errors, fix them and run it again.

.cursor/rules/pipe-condition.mdc

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
1+
---
2+
description:
3+
globs:
4+
alwaysApply: false
5+
---
6+
# PipeCondition Controller
7+
8+
The PipeCondition controller allows you to implement conditional logic in your pipeline, choosing which pipe to execute based on an evaluated expression. It supports both direct expressions and expression templates.
9+
10+
## Usage in TOML Configuration
11+
12+
### Basic Usage with Direct Expression
13+
14+
```toml
15+
[pipe.conditional_operation]
16+
PipeCondition = "A conditional pipe to decide whether..."
17+
inputs = { input_data = "CategoryInput" }
18+
output = "native.Text"
19+
expression = "input_data.category"
20+
21+
[pipe.conditional_operation.pipe_map]
22+
small = "process_small"
23+
medium = "process_medium"
24+
large = "process_large"
25+
```
26+
or
27+
```toml
28+
[pipe.conditional_operation]
29+
PipeCondition = "A conditional pipe to decide whether..."
30+
inputs = { input_data = "CategoryInput" }
31+
output = "native.Text"
32+
expression_template = "{{ input_data.category }}" # Jinja2 code
33+
34+
[pipe.conditional_operation.pipe_map]
35+
small = "process_small"
36+
medium = "process_medium"
37+
large = "process_large"
38+
```
39+
40+
## Key Parameters
41+
42+
- `expression`: Direct boolean or string expression (mutually exclusive with expression_template)
43+
- `expression_template`: Jinja2 template for more complex conditional logic (mutually exclusive with expression)
44+
- `pipe_map`: Dictionary mapping expression results to pipe codes:
45+
1 - The key on the left (`small`, `medium`, ...) is the result of `expression` or `expression_template`.
46+
2 - The value on the right (`process_small`, `process_medium`, ...) is the name of the pipe to trigger.
47+
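The mapping described above amounts to a lookup: evaluate the expression, then use the result as a key into `pipe_map`. The sketch below is a conceptual illustration only, not Pipelex's implementation. The helper `choose_pipe` is hypothetical, the working memory is modelled as nested dicts, and the expression is treated as a simple dotted path (the real controller may evaluate expressions differently, for example through Jinja2).

```python
from typing import Any, Dict


def choose_pipe(working_memory: Dict[str, Any], expression: str, pipe_map: Dict[str, str]) -> str:
    # Resolve a dotted path such as "input_data.category" against nested dicts.
    value: Any = working_memory
    for part in expression.split("."):
        value = value[part]
    key = str(value)
    if key not in pipe_map:
        raise KeyError(f"No pipe mapped for expression result {key!r}")
    return pipe_map[key]


pipe_map = {
    "small": "process_small",
    "medium": "process_medium",
    "large": "process_large",
}
working_memory = {"input_data": {"category": "medium"}}

chosen = choose_pipe(working_memory, "input_data.category", pipe_map)
print(chosen)
```

Raising on an unmapped key is a design choice of this sketch; it makes a missing `pipe_map` entry visible rather than silently skipping the step.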
48+
# Important tip
49+
50+
Always run the CLI command `pipelex validate` when you have finished writing pipelines: it checks them for errors. If it reports errors, fix them and run it again.

.cursor/rules/pipe-func.mdc

Whitespace-only changes.

.cursor/rules/pipe-imgg.mdc

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
---
2+
description:
3+
globs:
4+
alwaysApply: false
5+
---

.cursor/rules/pipe-llm.mdc

Lines changed: 107 additions & 0 deletions
@@ -0,0 +1,107 @@
1+
---
2+
description:
3+
globs:
4+
alwaysApply: false
5+
---
6+
# PipeLLM Guide
7+
8+
## Purpose
9+
10+
PipeLLM is used to:
11+
1. Generate text or objects with LLMs
12+
2. Process images with Vision LLMs
13+
14+
## Basic Usage
15+
16+
### Simple Text Generation
17+
```toml
18+
[pipe.write_story]
19+
PipeLLM = "Write a short story"
20+
output = "Text"
21+
prompt_template = """
22+
Write a short story about a programmer.
23+
"""
24+
```
25+
26+
### Structured Data Extraction
27+
```toml
28+
[pipe.extract_info]
29+
PipeLLM = "Extract information"
30+
inputs = { text = "Text" }
31+
output = "PersonInfo"
32+
prompt_template = """
33+
Extract person information from this text:
34+
@text
35+
"""
36+
```
37+
38+
### Where to Put Structured Objects
39+
Place your Pydantic models in `pipelex_libraries/pipelines/your_models.py`:
40+
41+
```python
42+
from pipelex.core.stuff_content import StructuredContent
43+
44+
class PersonInfo(StructuredContent):  # Output models must always be subclasses of StructuredContent
45+
name: str
46+
age: int
47+
email: str
48+
```
49+
50+
## Advanced Features
51+
52+
### LLM Settings
53+
54+
You can specify LLM settings in two ways:
55+
56+
1. **Direct in the pipe**:
57+
```toml
58+
[pipe.analyze]
59+
PipeLLM = "Analyze text"
60+
output = "Analysis"
61+
llm = { llm_handle = "gpt-4", temperature = 0.7 }
62+
prompt_template = "Analyze this text"
63+
```
64+
65+
2. **Using predefined settings** from `pipelex_libraries/llm_deck/base_llm_deck.toml`:
66+
```toml
67+
[pipe.analyze]
68+
PipeLLM = "Analyze text"
69+
output = "Analysis"
70+
llm = "llm_for_analysis" # References a preset from llm_deck
71+
prompt_template = "Analyze this text"
72+
```
73+
74+
### System Prompts
75+
Add system-level instructions:
76+
```toml
77+
[pipe.expert_analysis]
78+
PipeLLM = "Expert analysis"
79+
output = "Analysis"
80+
system_prompt = "You are a data analysis expert"
81+
prompt_template = "Analyze this data"
82+
```
83+
84+
### Multiple Outputs
85+
Generate multiple results:
86+
```toml
87+
[pipe.generate_ideas]
88+
PipeLLM = "Generate ideas"
89+
output = "Idea"
90+
nb_output = 3 # Generate exactly 3 ideas
91+
# OR
92+
multiple_output = true # Let the LLM decide how many to generate
93+
```
94+
95+
### Vision Tasks
96+
Process images with VLMs:
97+
```toml
98+
[pipe.analyze_image]
99+
PipeLLM = "Analyze image"
100+
inputs = { image = "Image" } # `image` is the name of the stuff that contains the Image. If the image is nested inside a stuff, you can add something like `{ "page.image": "Image" }`
101+
output = "ImageAnalysis"
102+
prompt_template = "Describe what you see in this image"
103+
```
104+
105+
# Important tip
106+
107+
Always run the CLI command `pipelex validate` when you have finished writing pipelines: it checks them for errors. If it reports errors, fix them and run it again.

.cursor/rules/pipe-ocr.mdc

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
---
2+
description:
3+
globs:
4+
alwaysApply: false
5+
---
6+
# PipeOCR Guide
7+
8+
## Purpose
9+
10+
Extract text and images from an image or a PDF
11+
12+
## Basic Usage
13+
14+
### Simple Text Extraction
15+
```toml
16+
[pipe.extract_info]
17+
PipeOcr = "extract the information"
18+
inputs = { ocr_input = "PDF" } # or { ocr_input = "Image" } if it's an image. This is the only input
19+
output = "Page"
20+
```
21+
22+
The output concept `Page` is a native concept with the structure `PageContent`.
23+
It corresponds to one page, so PipeOcr outputs a `ListContent` of `Page`:
24+
25+
```python
26+
class TextAndImagesContent(StuffContent):
27+
text: Optional[TextContent]
28+
images: Optional[List[ImageContent]]
29+
30+
class PageContent(StructuredContent):
31+
text_and_images: TextAndImagesContent
32+
page_view: Optional[ImageContent] = None
33+
```
34+
- `text_and_images` holds the text and the related images found in the input image or PDF.
35+
- `page_view` is a screenshot of the whole PDF page or image.
36+
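To make the shape of the output concrete, here is a self-contained sketch that mirrors the two classes above with plain dataclasses and walks a list of pages. These are stand-ins, not the Pipelex classes: the `text` field on `TextContent`, the `url` field on `ImageContent`, and the helper `collect_text` are assumptions made for illustration, and a plain Python list plays the role of `ListContent`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextContent:
    text: str  # assumed field name


@dataclass
class ImageContent:
    url: str  # assumed field name


@dataclass
class TextAndImagesContent:
    text: Optional[TextContent] = None
    images: Optional[List[ImageContent]] = None


@dataclass
class PageContent:
    text_and_images: TextAndImagesContent
    page_view: Optional[ImageContent] = None


def collect_text(pages: List[PageContent]) -> str:
    # Join the text of every page, skipping pages where no text was found.
    return "\n".join(
        page.text_and_images.text.text
        for page in pages
        if page.text_and_images.text is not None
    )


pages = [
    PageContent(TextAndImagesContent(text=TextContent("Page one"))),
    PageContent(TextAndImagesContent(images=[ImageContent("figure.png")])),
    PageContent(TextAndImagesContent(text=TextContent("Page three"))),
]
full_text = collect_text(pages)
print(full_text)
```

Because both `text` and `images` are optional, code that consumes OCR output should check for `None` before reading either one, as `collect_text` does here.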
37+
# Important tip
38+
39+
Always run the CLI command `pipelex validate` when you have finished writing pipelines: it checks them for errors. If it reports errors, fix them and run it again.

.cursor/rules/pipe-parallel.mdc

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
---
2+
description:
3+
globs:
4+
alwaysApply: false
5+
---

.cursor/rules/pipe-sequence.mdc

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
1+
---
2+
description:
3+
globs:
4+
alwaysApply: false
5+
---
6+
# PipeSequence Guide
7+
8+
## Purpose
9+
PipeSequence executes multiple pipes in a defined order, where each step can use results from previous steps.
10+
11+
## Basic Structure
12+
```toml
13+
[pipe.your_sequence_name]
14+
PipeSequence = "Description of what this sequence does"
15+
inputs = { input_name = "InputType" } # All the inputs of the sub pipes, except the ones generated by intermediate steps
16+
output = "OutputType"
17+
steps = [
18+
{ pipe = "first_pipe", result = "first_result" },
19+
{ pipe = "second_pipe", result = "second_result" },
20+
{ pipe = "final_pipe", result = "final_result" }
21+
]
22+
```
23+
24+
## Key Components
25+
26+
1. **Steps Array**: List of pipes to execute in sequence
27+
- `pipe`: Name of the pipe to execute
28+
- `result`: Name to assign to the pipe's output that will be in the working memory
29+
30+
2. **Working Memory**: Each step can access:
31+
- Original sequence inputs
32+
- Results from previous steps
33+
- Use the result names in subsequent steps
34+
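The relationship between steps and working memory can be sketched in a few lines. This is a conceptual model only: `run_sequence` and the two toy pipes are hypothetical, and working memory is represented as a plain dict rather than Pipelex's own type. Each step reads from working memory and writes its output back under its `result` name, which later steps can then use.

```python
from typing import Any, Callable, Dict, List

Pipe = Callable[[Dict[str, Any]], Any]


def run_sequence(inputs: Dict[str, Any], steps: List[Dict[str, str]], pipes: Dict[str, Pipe]) -> Dict[str, Any]:
    # Working memory starts with the sequence inputs.
    working_memory = dict(inputs)
    for step in steps:
        output = pipes[step["pipe"]](working_memory)
        # Store the output under the step's `result` name for later steps.
        working_memory[step["result"]] = output
    return working_memory


# Hypothetical pipes: the second one depends on the first one's result.
pipes: Dict[str, Pipe] = {
    "first_pipe": lambda memory: memory["input_name"].strip(),
    "second_pipe": lambda memory: len(memory["first_result"].split()),
}
steps = [
    {"pipe": "first_pipe", "result": "first_result"},
    {"pipe": "second_pipe", "result": "second_result"},
]

memory = run_sequence({"input_name": "  hello working memory  "}, steps, pipes)
print(memory["first_result"], memory["second_result"])
```

The original inputs stay available throughout, which matches the point above that each step can access both the sequence inputs and earlier results.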
35+
## Using PipeBatch in Steps
36+
37+
You can use PipeBatch functionality within steps using `batch_over` and `batch_as`:
38+
39+
```toml
40+
steps = [
41+
{ pipe = "process_items", batch_over = "input_list", batch_as = "current_item", result = "processed_items"
42+
}
43+
]
44+
```
45+
46+
1. **batch_over**: Specifies a `ListContent` field to iterate over. Each item in the list will be processed individually and IN PARALLEL by the pipe.
47+
- Must be a `ListContent` type containing the items to process
48+
- Can reference inputs or results from previous steps
49+
50+
2. **batch_as**: Defines the name that will be used to reference the current item being processed
51+
- This name can be used in the pipe's input mappings
52+
- Makes each item from the batch available as a single element
53+
54+
The result of a batched step will be a `ListContent` containing the outputs from processing each item.
55+
56+
# Important tip
57+
58+
Always run the CLI command `pipelex validate` when you have finished writing pipelines: it checks them for errors. If it reports errors, fix them and run it again.
