Commit df80052

Release/v0.4.0 (Pipelex#88)
### Highlight: Complete documentation overhaul

- **MkDocs** setup for static web docs generation
- **Material** for MkDocs theme, custom styling and navigation
- Other plugins: meta-manager, glightbox
- **GitHub Pages** deployment, mapped to [docs.pipelex.com](http://docs.pipelex.com/)
- Added GHA workflows for documentation deployment and validation
- **Added to docs:**
    - [**Manifesto**](https://docs.pipelex.com/manifesto/) explaining the Pipelex viewpoint
    - [**The Pipelex Paradigm**](https://docs.pipelex.com/pages/pipelex-paradigm-for-repeatable-ai-workflows/) explaining the fundamentals of Pipelex’s solution
    - [**Cookbook examples**](https://docs.pipelex.com/pages/cookbook-examples/) presented and explained, with commented code, some even with [mermaid](https://docs.pipelex.com/pages/cookbook-examples/invoice-extractor/) [flow](https://docs.pipelex.com/pages/cookbook-examples/extract-gantt/) [charts](https://docs.pipelex.com/pages/cookbook-examples/write-tweet/)
    - And plenty of details about **using Pipelex** and **developing for Pipelex**, from **structured generation** to PipeOperators (**LLM**, **Image generation**, **OCR**…) to PipeControllers (**Sequence**, **Parallel**, **Batch**, **Condition**…), workflow **optimization**, workflow static **validation** and dry run… there’s still work to do, but we move fast!
- **Also a major update of Cursor rules**

### Tooling Improvements

- Pipeline tracking: restored **visual flowchart generation using Mermaid**
- Enhanced dry run configuration: added more granular control with `nb_list_items`, `nb_ocr_pages`, and `image_urls`
- New feature flags: better control over pipeline tracking, activity tracking, and reporting
- Improved OCR configuration: handle image file type for Mistral-OCR, added `default_page_views_dpi` setting
- Enhanced LLM configuration: **better prompting for structured generation with automatic schema insertion** for two-step structuring: generate plain text and then structure via JSON
- Better logging: enhanced log truncation and display for large objects like image bytes (there are still cases to deal with)

### Refactor

**Concept system refactoring**

- Improved concept code factory with better domain handling, so you no longer need the `native` domain prefix for native concepts; you can just call them by their names: `Text`, `Image`, `PDF`, `Page`, `Number`…
- Concept `refines` attribute can now be a string for single refined concepts (the most common case)

### Breaking Changes

- File structure changes: documentation moved from `doc/` to `docs/`
- Configuration changes: some configuration keys have been renamed or restructured
- `StuffFactory.make_stuff()` argument `concept_code` renamed to `concept_str` to explicitly support concepts without fully qualified domains (e.g., `Text` or `PDF`, implicitly `native`)
- Some method signatures have been updated

### Tests

- **Added Concept refinement validation:** `TestConceptRefinesValidationFunction` and `TestConceptPydanticFieldValidation` ensure proper concept inheritance and field validation
1 parent 9ca4e6b commit df80052

162 files changed

Lines changed: 6335 additions & 1373 deletions

.cursor/rules/docs.mdc

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+---
+description:
+globs: docs/**/*.md
+alwaysApply: false
+---
+Write docs and answer questions about writing docs.
+
+We use Material for MkDocs. All markdown in our docs must be compatible with Material for MkDocs and done using best practices to get the best results with Material for MkDocs.

.cursor/rules/llms.mdc

Lines changed: 15 additions & 8 deletions
@@ -5,8 +5,10 @@ alwaysApply: false
 ---
 # Rules to choose LLM models used in PipeLLMs.

+## LLM Handles
+
 In order to use it in a pipe, an LLM is referenced by its llm_handle and possibly by an llm_preset.
-Both llm_handles and llm_presets are defined in this toml config file: [config_pipelex_llm_deck.toml](mdc:pipelex/config_pipelex_llm_deck.toml)
+Both llm_handles and llm_presets are defined in this toml config file: [base_llm_deck.toml](mdc:pipelex/libraries/llm_deck/base_llm_deck.toml)

 ## LLM Handles

@@ -15,19 +17,21 @@ An llm_handle matches the handle (an id of sorts) with the full specification of
 - llm_version
 - llm_platform_choice

-The declaration looks like this in toml syntax:
+The declaration of llm_handles looks like this in toml syntax:
 ```toml
-[cogt.llm_config.llm_deck.llm_handle_to_llm_engine_blueprint.gpt-4o-2024-05-13]
-llm_name = "gpt-4o"
-llm_version = "2024-05-13"
-llm_platform_choice = "openai"
+[llm_handles]
+gpt-4o-2024-08-06 = { llm_name = "gpt-4o", llm_version = "2024-08-06" }
 ```

 In most cases, we only want to use version "latest" and llm_platform_choice "default" in which case the declaration is simply a match of the llm_handle to the llm_name, like this:
 ```toml
-gemini-2-5-pro = "gemini-2.5-pro"
+best-claude = "claude-4-opus"
+best-gemini = "gemini-2.5-pro"
+best-mistral = "mistral-large"
 ```

+And of course, llm_handles are automatically assigned for all models by their name, with version "latest" and llm_platform_choice "default".
+
 ## Using an LLM Handle in a PipeLLM

 Here is an example of using an llm_handle to specify which LLM to use in a PipeLLM:
@@ -59,7 +63,7 @@ The interest is that these presets can be used to set the LLM choice in a PipeLL
 ```toml
 [pipe.extract_invoice]
 PipeLLM = "Extract invoice information from an invoice text transcript"
-input = "InvoiceText"
+inputs = { invoice_text = "InvoiceText" }
 output = "Invoice"
 llm = "llm_to_extract_invoice"
 prompt_template = """
@@ -73,3 +77,6 @@ The category of this invoice is: $invoice_details.category.

 The setting here `llm = "llm_to_extract_invoice"` works because "llm_to_extract_invoice" has been declared as an llm_preset in the deck.
 You must not use an LLM preset in a PipeLLM that does not exist in the deck. If needed, you can add llm presets.
+
+
+You can override the predefined llm presets in [overrides.toml](mdc:pipelex/libraries/llm_deck/overrides.toml).
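
The handle rules in this file can be sketched in Python. This is only an illustration under stated assumptions, not Pipelex's actual loader: `LLMEngineSpec` and `resolve_llm_handle` are hypothetical names, and the deck is represented as a plain dict that mirrors the `[llm_handles]` table shown in the diff.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class LLMEngineSpec:
    """Hypothetical container for the three fields an llm_handle maps to."""

    llm_name: str
    llm_version: str = "latest"
    llm_platform_choice: str = "default"


# Mirrors the TOML in the rule: a handle maps either to an inline table
# or to a bare model name (shorthand for "latest" / "default").
LLM_HANDLES: Dict[str, Union[str, Dict[str, str]]] = {
    "gpt-4o-2024-08-06": {"llm_name": "gpt-4o", "llm_version": "2024-08-06"},
    "best-claude": "claude-4-opus",
}


def resolve_llm_handle(
    handle: str,
    handles: Dict[str, Union[str, Dict[str, str]]] = LLM_HANDLES,
) -> LLMEngineSpec:
    """Resolve a handle to a full spec, filling in the documented defaults."""
    entry = handles.get(handle)
    if entry is None:
        # Assumption: a model name acts as its own handle. A real loader would
        # check the name against the list of known models before accepting it.
        return LLMEngineSpec(llm_name=handle)
    if isinstance(entry, str):
        return LLMEngineSpec(llm_name=entry)
    return LLMEngineSpec(**entry)
```

For example, `resolve_llm_handle("best-claude")` yields a spec with `llm_version="latest"` and `llm_platform_choice="default"`, while the explicit `gpt-4o-2024-08-06` entry keeps its pinned version.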

.cursor/rules/pipes.mdc

Lines changed: 17 additions & 12 deletions
@@ -1,6 +1,6 @@
 ---
 description:
-globs: **/pipelex_libraries/pipelines/toml/**/*.toml
+globs: **/pipelex/libraries/pipelines/**/*.toml
 alwaysApply: false
 ---
 This rule explains how to build pipes.
@@ -9,7 +9,7 @@ This rule explains how to build pipes.

 ## The pipelines/ directory

-- Pipelines and structures are defined in the `pipelex/libraries/` directory
+- Pipelines and structures are defined in the `pipelex/libraries/pipelines` directory

 pipelex/libraries
 └── pipelines
@@ -40,6 +40,11 @@ pipelex/libraries
 - Never include the usage context in the concept. The concept indicates what the stuff is in itself, it's not "for something", it just is something. e.g. don't define "TextToSummarize": it's just "Text". If you want to refine the concept for instance you can define "Essay".
 - In particular, never define a concept as a plural form: if the context of the pipeline execution makes it multiple, it will just be handled using a ListContent (see below). e.g. don't define a concept for "Stories", just define "Story".
 - Also, avoid including adjectives in concepts, e.g. don't define "LargeText", it's just "Text".
+- Don't redefine the native concepts from [concept_native.py](mdc:pipelex/core/concept_native.py)
+
+⚠️ Important ⚠️
+
+A Concept MUST NEVER be a plural noun and you should never create a SomeConceptList: lists and arrays are implicitly handled by Pipelex according to the context. Just define SomeConcept.

 - Define concepts in one of two ways:

@@ -56,14 +61,14 @@ ConceptName = "Description of the concept"
 [concept.ConceptName]
 Concept = "Description of the concept"
 structure = "StructureName"
-refines = ["ParentConcept"] # Optional, for concept inheritance
+refines = "ParentConcept" # Optional, for concept inheritance
 ```

 About the `structure` field:
 - It's Optional
 - It's the name of the Python BaseModel class used for the concept
 - The class must be a subclass of StuffContent
-- The class must be defined in a python module placed inside the `pipelex/pipelex_libraries/pipelines/structures/` directory
+- The class must be defined in a python module placed inside the `pipelex/libraries/pipelines/` directory
 - If the `structure` field is omitted but a class with the same name as the concept is defined in the structures directory, then it's implicitly applied to the concept

 About the `refines` field:
@@ -83,7 +88,7 @@ Pipelex provides the following structures natively:
 - MermaidContent
 - LLMPromptContent

-The native structures are implied when using the native concepts: "native.Text", "native.Number", "native.Image"... so you don't have to state it.
+The native structures are implied when using the native concepts: "Text", "Number", "Image", "PDF",... so you don't have to state it.

 Some subclasses of StuffContent exist which you should never use directly:
 - ListContent: this is used internally to manipulate a list of stuff, but you should NEVER define a plural concept
@@ -130,7 +135,7 @@ Write a poem about an AI that meets a Software and they fall in l0ve.
 # Example of a PipeLLM that uses @ prefix to insert a block of text
 [pipe.process_text]
 PipeLLM = "Process input text using an LLM"
-input = "Text"
+inputs = { text = "Text" }
 output = "Text"
 llm = "llm_to_summarize_text"
 system_prompt = """
@@ -143,10 +148,10 @@ Summarize the following text:

 """

-# Example of a PipeLLM that uses @ prefix to insert a block of text but also a $ prefix to insert text inline in a sentence (that is the case for the $topic)
+# Example of a PipeLLM that uses '@' prefix to insert a block of text but also a '$' prefix to insert text inline in a sentence (that is the case for the $topic)
 [pipe.summarize_topic]
 PipeLLM = "Summarize a dense text with a focus on a specific topic."
-input = "Topic"
+inputs = { topic = "Topic", text = "Text" }
 output = "Summary"
 prompt_template = """
 Your goal is to summarize everything related to $topic in the provided text:
@@ -160,7 +165,7 @@ Your summary should not be longer than 2 sentences.
 # Example of a PipeLLM with image vision processing by a VLM
 [pipe.get_html_table_from_image]
 PipeLLM = "Convert table screenshot to HTML"
-input = "TableScreenshot"
+inputs = { table_screenshot = "TableScreenshot" }
 output = "HtmlTable"
 images = ["table_screenshot"]
 system_prompt = """
@@ -178,7 +183,7 @@ llm = "llm_to_extract_tables"
 # Example of a PipeSequence
 [pipe.answer_question_with_instructions]
 PipeSequence = "Answer a question with instructions"
-input = "Question"
+inputs = { question = "Question" }
 output = "FormattedAnswer"
 steps = [
 { pipe = "enrich_instructions", result = "instructions", },
@@ -189,7 +194,7 @@ steps = [
 # Example of a PipeParallel
 [pipe.extract_expense_report]
 PipeParallel = "Extract useful information from an expense report"
-input = "ExpenseReportText"
+inputs = { expense_report = "ExpenseReportText" }
 output = "Composite"
 parallels = [
 { pipe = "extract_employee_from_expense_report", result = "employee" },
@@ -199,7 +204,7 @@ parallels = [
 # Example of a PipeCondition
 [pipe.expense_conditional_validation]
 PipeCondition = "Choose the rules to apply"
-input = "Expense"
+inputs = { expense = "Expense" }
 output = "RulesToApply"
 expression = "expense_category.category"
 ```
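
The concept-naming rules in this file (native concepts usable without the `native` prefix, no `SomeConceptList` concepts) can be sketched as a small Python helper. This is a hedged illustration only: `qualify_concept` is a hypothetical function, the native set contains only the names listed in the release notes (the authoritative list is in `concept_native.py`), and resolving other bare names against the current domain is an assumption.

```python
from typing import FrozenSet

# Only the native concept names listed in the release notes; the full list
# lives in concept_native.py and is not reproduced here.
NATIVE_CONCEPTS: FrozenSet[str] = frozenset({"Text", "Image", "PDF", "Page", "Number"})
NATIVE_DOMAIN = "native"


def qualify_concept(concept_str: str, current_domain: str) -> str:
    """Turn a concept string into a 'domain.Concept' code (hypothetical helper)."""
    if "." in concept_str:
        # Already fully qualified, e.g. "native.Text".
        return concept_str
    if concept_str.endswith("List"):
        # Lists are handled implicitly by Pipelex, so a *List concept is rejected.
        raise ValueError(f"Define the singular concept instead of '{concept_str}'")
    if concept_str in NATIVE_CONCEPTS:
        return f"{NATIVE_DOMAIN}.{concept_str}"
    # Assumption: any other bare name belongs to the domain being defined.
    return f"{current_domain}.{concept_str}"
```

The suffix check only catches the `SomeConceptList` pattern; detecting plural nouns in general would need more than a string test, so that part of the rule stays a matter of review.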

.cursor/rules/structures.mdc

Lines changed: 12 additions & 3 deletions
@@ -1,6 +1,6 @@
 ---
 description:
-globs: **/pipelex_libraries/pipelines/structures/**/*.py
+globs: pipelex/libraries/pipelines/**/*.py
 alwaysApply: false
 ---
 Rules to write structure classes for concepts used in pipes with structured generations.
@@ -10,11 +10,14 @@ In particular, these structures are generated by PipeLLM using structured genera

 ## Model Location and Registration

-- Create models for structured generations related to "some_domain" in `pipelex/pipelex_libraries/pipelines/structures/<some_domain>.py`
+- Create models for structured generations related to "some_domain" in `pipelex_libraries/pipelines/<some_domain>.py`
 - Models must inherit from `StructuredContent` or appropriate content type

 ## Model Structure

+Concepts and their structure classes are meant to indicate an idea.
+A Concept MUST NEVER be a plural noun and you should never create a SomeConceptList: lists and arrays are implicitly handled by Pipelex according to the context. Just define SomeConcept.
+
 ```python
 from datetime import datetime
 from typing import List, Optional
@@ -41,6 +44,12 @@ class YourModel(StructuredContent):
             return v.replace(tzinfo=None)
         return v
 ```
+## Usage
+
+Structures are meant to indicate what class to use for a particular Concept. In general they use the same name as the concept.
+
+Structure classes defined within `pipelex_libraries/pipelines/` are automatically loaded into the class_registry when setting up Pipelex, no need to do it manually.
+

 ## Best Practices for structures

@@ -49,4 +58,4 @@ class YourModel(StructuredContent):
 - Use `Field` declaration and write the description
 - Use Pydantic validators for data cleaning/validation
 - Remove timezone info from datetime fields
-- Respect rules of [pydantic.mdc](mdc:.cursor/rules/pydantic.mdc)
+- Respect rules of @base_models.
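
The "remove timezone info from datetime fields" practice can be illustrated with the standard library alone. The sketch below assumes the validator body partly visible in the diff (`return v.replace(tzinfo=None)` / `return v`) is guarded by a `tzinfo` check; `strip_timezone` is a hypothetical standalone function, whereas in a structure class this logic would sit inside a Pydantic field validator.

```python
from datetime import datetime, timezone


def strip_timezone(value: datetime) -> datetime:
    """Return a naive datetime, dropping tzinfo without shifting the clock time."""
    if value.tzinfo is not None:
        # replace() keeps the wall-clock fields and only removes the tzinfo marker;
        # it does not convert to UTC or local time first.
        return value.replace(tzinfo=None)
    return value


aware = datetime(2025, 1, 2, 3, 4, tzinfo=timezone.utc)
naive = strip_timezone(aware)
```

Because `replace(tzinfo=None)` does not convert between zones, values arriving in different timezones keep their local clock readings; converting to a common zone before stripping is a separate design decision.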

.github/workflows/deploy-doc.yml

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+# .github/workflows/docs-preview.yml
+name: Preview MkDocs on GitHub Pages
+
+on:
+  push:
+    branches: [main]
+
+jobs:
+  preview:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.x'
+
+      - name: Build docs
+        run: |
+          pip install mkdocs==1.6.1 mkdocs-material==9.6.14 mkdocs-glightbox==0.4.0 mkdocs-meta-manager==1.1.0
+          mkdocs build --strict # generates ./site
+
+      - name: Publish to *gh-pages-preview*
+        uses: peaceiris/actions-gh-pages@v4
+        with:
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          publish_dir: ./site
+          publish_branch: gh-pages-preview # <─ test branch
+          # put each branch in its own folder so multiple previews can coexist
+          destination_dir: ${{ github.ref_name }}
+          force_orphan: true

.github/workflows/doc-check.yml

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+name: Documentation Check
+
+on:
+  pull_request:
+    branches:
+      - main
+      - dev
+      - "release/v[0-9]+.[0-9]+.[0-9]+"
+    paths:
+      - 'docs/**'
+      - 'mkdocs.yml'
+
+jobs:
+  doc-check:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+          cache: 'pip'
+
+      - name: Install mkdocs
+        run: |
+          python -m pip install --upgrade pip
+          pip install mkdocs==1.6.1 mkdocs-material==9.6.14 mkdocs-glightbox==0.4.0 mkdocs-meta-manager==1.1.0
+
+      - name: Check documentation build
+        run: mkdocs build --strict

.github/workflows/guard-branches.yml

Lines changed: 2 additions & 2 deletions
@@ -36,8 +36,8 @@ jobs:
           if [[ "$HEAD" == "dev" ]]; then
             exit 0
           fi
-          if [[ ! "$HEAD" =~ ^(fix|feature|refactor|chore|doc|ci-cd|changelog)\/[A-Za-z0-9._\/\<\>\=\-]+$ ]]; then
-            echo "::error::Branch must start with fix/, feature/, refactor/, chore/, doc/, or ci-cd/."
+          if [[ ! "$HEAD" =~ ^(fix|feature|refactor|chore|doc|ci-cd|changelog|codex)\/[A-Za-z0-9._\/\<\>\=\-]+$ ]]; then
+            echo "::error::Branch must start with fix/, feature/, refactor/, chore/, docs/, or ci-cd/."
             exit 1
           fi
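
The visible part of this branch guard can be mirrored in Python to see which head branch names it accepts. This is a sketch of the two checks shown in the hunk only: `is_allowed_head` is a hypothetical name, and the workflow may apply other rules in lines that the diff does not show.

```python
import re

# Same prefixes and character class as the updated bash regex in the hunk.
BRANCH_PATTERN = re.compile(
    r"(fix|feature|refactor|chore|doc|ci-cd|changelog|codex)/[A-Za-z0-9._/<>=-]+"
)


def is_allowed_head(head: str) -> bool:
    """Return True if the head branch passes the checks visible in the hunk."""
    if head == "dev":
        return True
    # fullmatch anchors at both ends, like ^...$ in the bash test.
    return BRANCH_PATTERN.fullmatch(head) is not None
```

Under this pattern `doc/update-readme` and `codex/task-1` pass, while `docs/update-readme` does not, even though the updated error message mentions `docs/`.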

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -10,6 +10,9 @@ dist/
 build/
 *.egg-info/

+# mkdocs
+site/
+
 # Unit test / coverage reports
 htmlcov/
 .pytest_cache/

.vscode/settings.json

Lines changed: 2 additions & 1 deletion
@@ -19,5 +19,6 @@
         "tests"
     ],
     "python.testing.unittestEnabled": false,
-    "python.testing.pytestEnabled": true
+    "python.testing.pytestEnabled": true,
+    "djlint.showInstallError": false
 }
