### Highlight: Complete documentation overhaul
- **MkDocs** setup for static web docs generation
- **Material** for MkDocs theme, custom styling and navigation
- Other plugins: meta-manager, glightbox
- **GitHub Pages** deployment, mapped to [docs.pipelex.com](http://docs.pipelex.com/)
- Added GHA workflows for documentation deployment and validation
- **Added to docs:**
  - [**Manifesto**](https://docs.pipelex.com/manifesto/) explaining the Pipelex viewpoint
  - [**The Pipelex Paradigm**](https://docs.pipelex.com/pages/pipelex-paradigm-for-repeatable-ai-workflows/) explaining the fundamentals of Pipelex’s solution
  - [**Cookbook examples**](https://docs.pipelex.com/pages/cookbook-examples/) presented and explained with commented code, some even with [mermaid](https://docs.pipelex.com/pages/cookbook-examples/invoice-extractor/) [flow](https://docs.pipelex.com/pages/cookbook-examples/extract-gantt/) [charts](https://docs.pipelex.com/pages/cookbook-examples/write-tweet/)
  - And plenty of details about **using Pipelex** and **developing for Pipelex**, from **structured generation** to PipeOperators (**LLM**, **Image generation**, **OCR**…) to PipeControllers (**Sequence**, **Parallel**, **Batch**, **Condition**…), workflow **optimization**, workflow static **validation** and dry run… there’s still work to do, but we move fast!
- **Also a major update of Cursor rules**
### Tooling Improvements
- Pipeline tracking: restored **visual flowchart generation using Mermaid**
- Enhanced dry run configuration: added more granular control with `nb_list_items`, `nb_ocr_pages`, and `image_urls`
- New feature flags: better control over pipeline tracking, activity tracking, and reporting
- Improved OCR configuration: handle image file type for Mistral-OCR, added `default_page_views_dpi` setting
- Enhanced LLM configuration: **better prompting for structured generation, with automatic schema insertion** for two-step structuring: generate plain text first, then structure it as JSON
- Better logging: Enhanced log truncation and display for large objects like image bytes (there are still cases to deal with)
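The two-step structuring described above (plain text first, then JSON guided by an inserted schema) can be pictured with a minimal sketch. This is not Pipelex's implementation. The helper names, the hand-written `INVOICE_SCHEMA` and the prompt wording are all hypothetical, and fixed strings stand in for both LLM calls.

```python
import json
from typing import Any

# Hypothetical target schema, written by hand for this sketch.
INVOICE_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
    "required": ["invoice_number", "total_amount"],
}


def build_structuring_prompt(plain_text: str, schema: dict[str, Any]) -> str:
    """Second step: ask for JSON, with the schema inserted into the prompt."""
    return (
        "Convert the following text into JSON matching this schema:\n"
        f"{json.dumps(schema, indent=2)}\n\n"
        f"Text:\n{plain_text}\n\n"
        "Answer with JSON only."
    )


def parse_structured_output(raw: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Parse the JSON answer and check that every required key is present."""
    data = json.loads(raw)
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data


# First step (plain-text generation) is simulated with a fixed string.
draft = "Invoice INV-042, total due: 1250.50 EUR."
prompt = build_structuring_prompt(draft, INVOICE_SCHEMA)
# The model's JSON answer to the second prompt is simulated too.
invoice = parse_structured_output('{"invoice_number": "INV-042", "total_amount": 1250.5}', INVOICE_SCHEMA)
```

Splitting the steps lets the first generation focus on content. The schema only has to be honored in the second, narrower task.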
### Refactor
**Concept system refactoring**
- Improved concept code factory with better domain handling: you no longer need the `native` domain prefix for native concepts and can just call them by their names: `Text`, `Image`, `PDF`, `Page`, `Number`…
- Concept `refines` attribute can now be a string for single refined concepts (the most common case)
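Here is a minimal sketch of the two concept changes above. It assumes a concept string is either `domain.ConceptName` or a bare name. `resolve_concept`, `normalize_refines` and the `NATIVE_CONCEPTS` subset are hypothetical names. Treating a bare non-native name as part of the current domain is an assumption, not documented behavior.

```python
from typing import Optional, Union

# Hypothetical subset of native concept names, taken from the list above.
NATIVE_CONCEPTS = {"Text", "Image", "PDF", "Page", "Number"}


def resolve_concept(concept_str: str, current_domain: str) -> str:
    """Return a fully qualified 'domain.ConceptName' string."""
    if "." in concept_str:
        return concept_str  # already qualified, e.g. "native.Text"
    if concept_str in NATIVE_CONCEPTS:
        return f"native.{concept_str}"  # bare native name, prefix implied
    return f"{current_domain}.{concept_str}"  # assumption: local to the current domain


def normalize_refines(refines: Optional[Union[str, list[str]]]) -> list[str]:
    """Accept both refines = "Parent" and refines = ["Parent", ...]."""
    if refines is None:
        return []
    if isinstance(refines, str):
        return [refines]  # the common single-parent case
    return list(refines)
```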
### Breaking Changes
- File structure changes: documentation moved from `doc/` to `docs/`
- Configuration changes: some configuration keys have been renamed or restructured
- `StuffFactory.make_stuff()`: argument `concept_code` renamed to `concept_str` to explicitly support concepts without a fully qualified domain (e.g., `Text` or `PDF`, which are implicitly `native`)
- Some method signatures have been updated
### Tests
- **Added Concept refinement validation:** `TestConceptRefinesValidationFunction` and `TestConceptPydanticFieldValidation` ensure proper concept inheritance and field validation

Write docs and answer questions about writing docs.

We use Material for MkDocs. All markdown in our docs must be compatible with Material for MkDocs and follow its best practices to get the best results.
In most cases, we only want to use version "latest" and llm_platform_choice "default", in which case the declaration simply maps the llm_handle to the llm_name, like this:
```toml
best-claude = "claude-4-opus"
best-gemini = "gemini-2.5-pro"
best-mistral = "mistral-large"
```
And of course, llm_handles are automatically assigned for all models by their name, with version "latest" and llm_platform_choice "default".
## Using an LLM Handle in a PipeLLM
Here is an example of using an llm_handle to specify which LLM to use in a PipeLLM:
…

```toml
[pipe.extract_invoice]
PipeLLM = "Extract invoice information from an invoice text transcript"
inputs = { invoice_text = "InvoiceText" }
output = "Invoice"
llm = "llm_to_extract_invoice"
prompt_template = """
…
The category of this invoice is: $invoice_details.category.
"""
```

The setting here `llm = "llm_to_extract_invoice"` works because "llm_to_extract_invoice" has been declared as an llm_preset in the deck.
You must not use an LLM preset in a PipeLLM that does not exist in the deck. If needed, you can add llm presets.

You can override the predefined llm presets in [overrides.toml](mdc:pipelex/libraries/llm_deck/overrides.toml).

This rule explains how to build pipes.

## The pipelines/ directory

- Pipelines and structures are defined in the `pipelex/libraries/pipelines` directory

```
pipelex/libraries
└── pipelines
…
```

…

- Never include the usage context in the concept. The concept indicates what the stuff is in itself: it's not "for something", it just is something. e.g. don't define "TextToSummarize": it's just "Text". If you want to refine the concept, for instance, you can define "Essay".
- In particular, never define a concept as a plural form: if the context of the pipeline execution makes it multiple, it will just be handled using a ListContent (see below). e.g. don't define a concept for "Stories", just define "Story".
- Also, avoid including adjectives in concepts, e.g. don't define "LargeText", it's just "Text".
- Don't redefine the native concepts from [concept_native.py](mdc:pipelex/core/concept_native.py)

⚠️ Important ⚠️

A Concept MUST NEVER be a plural noun and you should never create a SomeConceptList: lists and arrays are implicitly handled by Pipelex according to the context. Just define SomeConcept.

- Define concepts in one of two ways:

…

```toml
[concept.ConceptName]
Concept = "Description of the concept"
structure = "StructureName"
refines = "ParentConcept" # Optional, for concept inheritance
```

About the `structure` field:
- It's optional
- It's the name of the Python BaseModel class used for the concept
- The class must be a subclass of StuffContent
- The class must be defined in a Python module placed inside the `pipelex/libraries/pipelines/` directory
- If the `structure` field is omitted but a class with the same name as the concept is defined in that directory, then it's implicitly applied to the concept

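The `structure` rules above amount to a small lookup: an explicit `structure` name wins, otherwise a class with the concept's own name is used if one exists. Here is a minimal sketch. A stand-in `StuffContent` base class and a hypothetical `CLASS_REGISTRY` dict replace Pipelex's real registry. What Pipelex does when no class is found is not covered here, so the sketch simply returns `None`.

```python
from typing import Optional


class StuffContent:  # stand-in for Pipelex's base class
    pass


class Invoice(StuffContent):  # hypothetical structure class
    pass


# Hypothetical registry of structure classes, keyed by class name.
CLASS_REGISTRY: dict[str, type] = {"Invoice": Invoice}


def resolve_structure(concept_name: str, structure: Optional[str] = None) -> Optional[type]:
    """Return the structure class for a concept, or None if there is none."""
    # An explicit `structure` field wins; otherwise look for a class named like the concept.
    class_name = structure or concept_name
    cls = CLASS_REGISTRY.get(class_name)
    if cls is None:
        if structure is not None:
            raise LookupError(f"structure class {structure!r} is not registered")
        return None
    if not issubclass(cls, StuffContent):
        raise TypeError(f"{class_name} must be a subclass of StuffContent")
    return cls
```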
About the `refines` field:
…

Pipelex provides the following structures natively:
- …
- MermaidContent
- LLMPromptContent

The native structures are implied when using the native concepts ("Text", "Number", "Image", "PDF", …), so you don't have to state them.

Some subclasses of StuffContent exist which you should never use directly:
- ListContent: this is used internally to manipulate a list of stuff, but you should NEVER define a plural concept
…

```toml
# Example of a PipeLLM that uses @ prefix to insert a block of text
[pipe.process_text]
PipeLLM = "Process input text using an LLM"
inputs = { text = "Text" }
output = "Text"
llm = "llm_to_summarize_text"
system_prompt = """
…
Summarize the following text:
…
"""

# Example of a PipeLLM that uses '@' prefix to insert a block of text but also a '$' prefix to insert text inline in a sentence (as is the case for $topic)
[pipe.summarize_topic]
PipeLLM = "Summarize a dense text with a focus on a specific topic."
inputs = { topic = "Topic", text = "Text" }
output = "Summary"
prompt_template = """
Your goal is to summarize everything related to $topic in the provided text:
…
Your summary should not be longer than 2 sentences.
"""

# Example of a PipeLLM with image vision processing by a VLM
[pipe.get_html_table_from_image]
PipeLLM = "Convert table screenshot to HTML"
inputs = { table_screenshot = "TableScreenshot" }
output = "HtmlTable"
images = ["table_screenshot"]
system_prompt = """
…
"""
llm = "llm_to_extract_tables"

# Example of a PipeSequence
[pipe.answer_question_with_instructions]
PipeSequence = "Answer a question with instructions"
inputs = { question = "Question" }
output = "FormattedAnswer"
steps = [
    { pipe = "enrich_instructions", result = "instructions" },
    # …
]

# Example of a PipeParallel
[pipe.extract_expense_report]
PipeParallel = "Extract useful information from an expense report"
inputs = { expense_report = "ExpenseReportText" }
output = "Composite"
parallels = [
    { pipe = "extract_employee_from_expense_report", result = "employee" },
    # …
]
```
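As a mental model for the `steps` and `parallels` lists above, here is a toy sketch with `asyncio`. A sequence runs pipes in order and stores each output under its `result` name, so later steps can read it. A parallel block runs pipes concurrently on the same inputs and gathers their outputs by name. The pipe functions are hypothetical stand-ins, and the returned dict only approximates a `Composite` output. This is not how Pipelex is implemented.

```python
import asyncio
from typing import Awaitable, Callable

# In this toy model a pipe is an async function reading from working memory.
Pipe = Callable[[dict], Awaitable[str]]


async def run_sequence(steps: list[tuple[Pipe, str]], memory: dict) -> dict:
    """Run pipes in order, storing each output under its `result` name."""
    for pipe, result_name in steps:
        memory[result_name] = await pipe(memory)  # later steps see earlier results
    return memory


async def run_parallel(parallels: list[tuple[Pipe, str]], memory: dict) -> dict:
    """Run pipes concurrently on the same memory and collect outputs by name."""
    outputs = await asyncio.gather(*(pipe(memory) for pipe, _ in parallels))
    return {name: output for (_, name), output in zip(parallels, outputs)}


# Hypothetical stand-in pipes (no LLM involved).
async def enrich_instructions(memory: dict) -> str:
    return f"Instructions for: {memory['question']}"


async def answer_question(memory: dict) -> str:
    return f"Answer following '{memory['instructions']}'"


async def extract_employee(memory: dict) -> str:
    return "Jane Doe"


async def extract_total(memory: dict) -> str:
    return "42.00"
```

For example, `asyncio.run(run_sequence([(enrich_instructions, "instructions"), (answer_question, "answer")], {"question": "Why?"}))` leaves both intermediate and final results in the returned memory.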
Rules for writing structure classes for concepts used in pipes with structured generation.

…

## Model Location and Registration

- Create models for structured generations related to "some_domain" in `pipelex_libraries/pipelines/<some_domain>.py`
- Models must inherit from `StructuredContent` or an appropriate content type

## Model Structure

Concepts and their structure classes are meant to indicate an idea.
A Concept MUST NEVER be a plural noun and you should never create a SomeConceptList: lists and arrays are implicitly handled by Pipelex according to the context. Just define SomeConcept.

```python
from datetime import datetime
from typing import List, Optional

# …

class YourModel(StructuredContent):
    # …
            return v.replace(tzinfo=None)
        return v
```
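The fragment above ends with the usual timezone-stripping validator. Below is a dependency-free sketch of that logic as a plain function (`remove_tzinfo` is a hypothetical name), so it can be tested without Pydantic. The comment shows one way it could be wired into a Pydantic v2 model with `field_validator`, using a hypothetical field name.

```python
from datetime import datetime, timedelta, timezone
from typing import Any


def remove_tzinfo(v: Any) -> Any:
    """Drop timezone info from a datetime; return any other value unchanged."""
    if isinstance(v, datetime) and v.tzinfo is not None:
        # replace() keeps the wall-clock time: it does NOT convert to UTC first
        return v.replace(tzinfo=None)
    return v


# In a Pydantic v2 model, this could be wired up roughly like so
# (field name hypothetical):
#
#     @field_validator("issue_date")
#     @classmethod
#     def strip_timezone(cls, v):
#         return remove_tzinfo(v)

aware = datetime(2024, 5, 1, 9, 30, tzinfo=timezone(timedelta(hours=2)))
naive = remove_tzinfo(aware)  # datetime(2024, 5, 1, 9, 30), same wall-clock time
```

Because `replace(tzinfo=None)` keeps the time as written, values from different timezones stay comparable only if they are first converted with `astimezone(timezone.utc)` before the timezone is dropped.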
## Usage

Structures are meant to indicate what class to use for a particular Concept. In general, they use the same name as the concept.

Structure classes defined within `pipelex_libraries/pipelines/` are automatically loaded into the class_registry when setting up Pipelex, so there is no need to register them manually.

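The automatic loading described above can be pictured as a scan that registers every structure class a module defines. Here is a minimal sketch using `inspect.getmembers`. A stand-in `StructuredContent` base class and a plain dict as the registry (both hypothetical) replace Pipelex's real ones. The real loader may work quite differently.

```python
import inspect
import types


class StructuredContent:  # stand-in for Pipelex's base class
    pass


def register_structures(module: types.ModuleType, registry: dict[str, type]) -> None:
    """Add every StructuredContent subclass defined in `module` to `registry`."""
    for name, obj in inspect.getmembers(module, inspect.isclass):
        is_structure = issubclass(obj, StructuredContent) and obj is not StructuredContent
        defined_here = obj.__module__ == module.__name__  # skip re-exported imports
        if is_structure and defined_here:
            registry[name] = obj
```

To cover a whole directory, a loader could iterate over its modules with `pkgutil.iter_modules`, import each one with `importlib.import_module`, and call `register_structures` on the result.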
## Best Practices for structures

…

- Use `Field` declarations and write their descriptions
- Use Pydantic validators for data cleaning/validation
- Remove timezone info from datetime fields
- Respect the rules of [pydantic.mdc](mdc:.cursor/rules/pydantic.mdc)