Commit d155426

Release/v0.3.0 (Pipelex#71)
### Highlights

- **Structured Input Specifications**: Pipe inputs are now defined as a dictionary mapping a required variable name to a concept code (`required_variable` -> `concept_code`). This replaces the previous single `input` field and allows for multiple, named inputs, making pipes more powerful and explicit. This is a **breaking change**.
- **Static Validation for Inference Pipes**: You can now catch configuration and input mistakes in your pipelines *before* running any operations. This static validation checks `PipeLLM`, `PipeOcr`, and `PipeImgGen`. Static validation for controller pipes (PipeSequence, PipeParallel…) will come in a future release.
  - Configure the behavior for different error types using the `static_validation_config` section in your settings. For each error type, choose to `raise`, `log`, or `ignore`.
- **Dry Run Mode for Zero-Cost Pipeline Validation**: A powerful dry-run mode allows you to test entire pipelines without making any actual inference calls. It's fast, costs nothing, works offline, and is perfect for linting and validating pipeline logic.
  - The new `dry_run_config` lets you control settings, like disabling Jinja2 rendering during a dry run.
  - This feature leverages `polyfactory` to generate mock Pydantic models for simulated outputs.
  - Error handling for bad inputs during `run_pipe` has been improved and is fully effective in dry-run mode.
  - One limitation: currently, dry running doesn't work when the pipeline uses a PipeCondition. This will be fixed in a future release.

### Added

- **`native.Anything` Concept**: A new flexible native concept that is compatible with any other concept, simplifying pipe definitions where input types can vary.
- Added dependency on `polyfactory` for mock Pydantic model generation in dry-run mode.

### Changed

- **Refactored Cognitive Workers**: The abstraction for `LLM`, `Imgg`, and `Ocr` workers has been elegantly simplified. The old decorator-based approach (`..._job_func`) has been replaced with a more robust pattern: a public base method now handles pre- and post-execution logic while calling a private abstract method that each worker implements.
- The `b64_image_bytes` field in `PromptImageBytes` was renamed to `base_64` for better consistency.

### Fixed

- Resolved a logged error related to the pipe stack when using `PipeParallel`.
- The pipe tracker functionality has been restored. It no longer crashes when using nested object attributes (e.g., `my_object.attribute`) as pipe inputs.

### Tests

- A new pytest command-line option `--pipe-run-mode` has been added to switch between `live` and `dry` runs (default is `dry`). All pipe tests now respect this mode.
- Introduced the `pipelex_api` pytest marker for tests related to the Pipelex API client, separating them from general `inference` or `llm` tests.
- Added a `make test-pipelex-api` target (shorthand: `make ta`) to exclusively run these new API client tests.

### Removed

- The `llm_job_func.py` file and the associated decorators have been removed as part of the cognitive worker refactoring.

---------

Co-authored-by: Louis Choquel <louis@pipelex.com>
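The breaking change to input specifications can be pictured with plain dictionaries; note that the variable names and concept codes below are invented for illustration, not taken from the Pipelex docs:

```python
# Hypothetical sketch of the breaking change: pre-v0.3.0 pipes declared one
# anonymous input; v0.3.0 maps each required variable name to a concept code.
# All names here are invented for illustration.
old_style = {"input": "legal.Contract"}

new_style = {
    "inputs": {  # required_variable -> concept_code
        "contract": "legal.Contract",
        "review_criteria": "legal.ReviewCriteria",
    }
}

# Multiple named inputs are now possible, and each one is explicit.
assert list(new_style["inputs"]) == ["contract", "review_criteria"]
```

Existing pipelines that used the single `input` field must be migrated to the new `inputs` mapping.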
1 parent e4b61e7 commit d155426

117 files changed

Lines changed: 2303 additions & 1932 deletions

CHANGELOG.md

Lines changed: 38 additions & 0 deletions
@@ -1,5 +1,43 @@
 # Changelog
 
+## [v0.3.0] - 2025-06-10
+
+### Highlights
+
+- **Structured Input Specifications**: Pipe inputs are now defined as a dictionary mapping a required variable name to a concept code (`required_variable` -> `concept_code`). This replaces the previous single `input` field and allows for multiple, named inputs, making pipes more powerful and explicit. This is a **breaking change**.
+- **Static Validation for Inference Pipes**: You can now catch configuration and input mistakes in your pipelines *before* running any operations. This static validation checks `PipeLLM`, `PipeOcr`, and `PipeImgGen`. Static validation for controller pipes (PipeSequence, PipeParallel…) will come in a future release.
+  - Configure the behavior for different error types using the `static_validation_config` section in your settings. For each error type, choose to `raise`, `log`, or `ignore`.
+- **Dry Run Mode for Zero-Cost Pipeline Validation**: A powerful dry-run mode allows you to test entire pipelines without making any actual inference calls. It's fast, costs nothing, works offline, and is perfect for linting and validating pipeline logic.
+  - The new `dry_run_config` lets you control settings, like disabling Jinja2 rendering during a dry run.
+  - This feature leverages `polyfactory` to generate mock Pydantic models for simulated outputs.
+  - Error handling for bad inputs during `run_pipe` has been improved and is fully effective in dry-run mode.
+  - One limitation: currently, dry running doesn't work when the pipeline uses a PipeCondition. This will be fixed in a future release.
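The dry-run highlight amounts to swapping the inference-backed content generator for a mock implementation behind a shared interface. A minimal stdlib-only sketch of that design, with class and method names that are simplified stand-ins for Pipelex's own:

```python
import asyncio
from abc import ABC, abstractmethod


class ContentGenerator(ABC):
    """Simplified stand-in for the content-generation interface."""

    @abstractmethod
    async def make_text(self, prompt: str) -> str: ...


class LiveGenerator(ContentGenerator):
    async def make_text(self, prompt: str) -> str:
        raise RuntimeError("would call a paid inference API")


class DryGenerator(ContentGenerator):
    """Returns canned output: fast, free, and works offline."""

    async def make_text(self, prompt: str) -> str:
        return f"DRY RUN: make_text • prompt={prompt[:40]}"


async def run_pipe(generator: ContentGenerator) -> str:
    # The pipeline logic is exercised identically in both modes.
    return await generator.make_text("Summarize the contract")


print(asyncio.run(run_pipe(DryGenerator())))  # no inference call is made
```

Because both generators satisfy the same interface, the pipeline code under test is exactly the code that runs live.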
+
+### Added
+
+- **`native.Anything` Concept**: A new flexible native concept that is compatible with any other concept, simplifying pipe definitions where input types can vary.
+- Added dependency on `polyfactory` for mock Pydantic model generation in dry-run mode.
+
+### Changed
+
+- **Refactored Cognitive Workers**: The abstraction for `LLM`, `Imgg`, and `Ocr` workers has been elegantly simplified. The old decorator-based approach (`..._job_func`) has been replaced with a more robust pattern: a public base method now handles pre- and post-execution logic while calling a private abstract method that each worker implements.
+- The `b64_image_bytes` field in `PromptImageBytes` was renamed to `base_64` for better consistency.
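The worker refactor described above is the classic template-method shape: the public method owns the shared pre/post logic and delegates the core work to a private abstract method. A minimal sketch (names are illustrative, not Pipelex's actual classes):

```python
import asyncio
from abc import ABC, abstractmethod


class WorkerAbstract(ABC):
    """Public entry point owns pre/post logic; subclasses fill in the core."""

    async def gen_text(self, prompt: str) -> str:
        self._before(prompt)                    # pre-execution logic (shared)
        result = await self._gen_text(prompt)   # worker-specific implementation
        self._after(result)                     # post-execution logic (shared)
        return result

    def _before(self, prompt: str) -> None:
        print(f"starting job for prompt of {len(prompt)} chars")

    def _after(self, result: str) -> None:
        print(f"job done, {len(result)} chars generated")

    @abstractmethod
    async def _gen_text(self, prompt: str) -> str: ...


class EchoWorker(WorkerAbstract):
    async def _gen_text(self, prompt: str) -> str:
        return prompt.upper()


print(asyncio.run(EchoWorker().gen_text("hello")))  # prints HELLO after the hooks
```

Compared with the old decorator approach, the shared behavior lives in one readable method instead of being woven around each worker by a `..._job_func` wrapper.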
+
+### Fixed
+
+- Resolved a logged error related to the pipe stack when using `PipeParallel`.
+- The pipe tracker functionality has been restored. It no longer crashes when using nested object attributes (e.g., `my_object.attribute`) as pipe inputs.
+
+### Tests
+
+- A new pytest command-line option `--pipe-run-mode` has been added to switch between `live` and `dry` runs (default is `dry`). All pipe tests now respect this mode.
+- Introduced the `pipelex_api` pytest marker for tests related to the Pipelex API client, separating them from general `inference` or `llm` tests.
+- Added a `make test-pipelex-api` target (shorthand: `make ta`) to exclusively run these new API client tests.
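A pytest option like `--pipe-run-mode` is typically declared in a `conftest.py` hook. The sketch below shows one plausible shape; the real Pipelex implementation may differ:

```python
# Hypothetical conftest.py sketch of a --pipe-run-mode option like the one
# described in the changelog (details are assumptions, not Pipelex's code).
def pytest_addoption(parser):
    parser.addoption(
        "--pipe-run-mode",
        action="store",
        default="dry",  # default is dry: tests cost nothing and work offline
        choices=("live", "dry"),
        help="Run pipe tests against real inference ('live') or mocks ('dry').",
    )

# A fixture (declared with @pytest.fixture in a real conftest) would then read
# request.config.getoption("--pipe-run-mode") and hand the mode to each pipe test.
```

Making `dry` the default means a plain `pytest` invocation never triggers paid inference calls.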
+
+### Removed
+
+- The `llm_job_func.py` file and the associated decorators have been removed as part of the cognitive worker refactoring.
+
 ## [v0.2.14] - 2025-06-06
 
 - Added a feature flag for the `ReportingManager` in the config:
Makefile

Lines changed: 14 additions & 3 deletions
@@ -203,12 +203,12 @@ cleanall: cleanderived cleanenv cleanlibraries
 codex-tests: env
 	$(call PRINT_TITLE,"Unit testing for Codex")
 	@echo "• Running unit tests for Codex (excluding inference and codex_disabled)"
-	$(VENV_PYTEST) --exitfirst --quiet -m "not inference and not codex_disabled" || [ $$? = 5 ]
+	$(VENV_PYTEST) --exitfirst --quiet -m "not (inference or codex_disabled or pipelex_api)" || [ $$? = 5 ]
 
 gha-tests: env
 	$(call PRINT_TITLE,"Unit testing for github actions")
 	@echo "• Running unit tests for github actions (excluding inference and gha_disabled)"
-	$(VENV_PYTEST) --exitfirst --quiet -m "not inference and not gha_disabled" || [ $$? = 5 ]
+	$(VENV_PYTEST) --exitfirst --quiet -m "not (inference or gha_disabled or pipelex_api)" || [ $$? = 5 ]
 
 run-all-tests: env
 	$(call PRINT_TITLE,"Running all unit tests")
@@ -218,7 +218,7 @@ run-all-tests: env
 run-manual-trigger-gha-tests: env
 	$(call PRINT_TITLE,"Running GHA tests")
 	@echo "• Running GHA unit tests for inference, llm, and not gha_disabled"
-	$(VENV_PYTEST) --exitfirst --quiet -m "not gha_disabled and (inference or llm)" || [ $$? = 5 ]
+	$(VENV_PYTEST) --exitfirst --quiet -m "not (gha_disabled or pipelex_api) and (inference or llm)" || [ $$? = 5 ]
 
 run-gha_disabled-tests: env
 	$(call PRINT_TITLE,"Running GHA disabled tests")
@@ -303,6 +303,17 @@ test-imgg: env
 tg: test-imgg
 	@echo "> done: tg = test-imgg"
 
+test-pipelex-api: env
+	$(call PRINT_TITLE,"Unit testing")
+	@if [ -n "$(TEST)" ]; then \
+		$(VENV_PYTEST) --exitfirst -m "pipelex_api" -s -k "$(TEST)" $(if $(filter 1,$(VERBOSE)),-v,$(if $(filter 2,$(VERBOSE)),-vv,$(if $(filter 3,$(VERBOSE)),-vvv,))); \
+	else \
+		$(VENV_PYTEST) --exitfirst -m "pipelex_api" -s $(if $(filter 1,$(VERBOSE)),-v,$(if $(filter 2,$(VERBOSE)),-vv,$(if $(filter 3,$(VERBOSE)),-vvv,))); \
+	fi
+
+ta: test-pipelex-api
+	@echo "> done: ta = test-pipelex-api"
+
 ############################################################################################
 ############################ Linting ############################
 ############################################################################################
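The Makefile edits rewrite marker expressions such as `"not inference and not codex_disabled"` into the grouped form `"not (inference or codex_disabled or pipelex_api)"`. For the two original markers the forms are logically equivalent (De Morgan's law); the grouped form simply adds the new `pipelex_api` exclusion in one place. A quick exhaustive check:

```python
# Verify that "not a and not b" equals "not (a or b)" for every combination,
# which is why grouping the markers is a safe refactor of the -m expression.
from itertools import product

for inference, codex_disabled in product((False, True), repeat=2):
    old_form = not inference and not codex_disabled
    grouped_form = not (inference or codex_disabled)
    assert old_form == grouped_form

print("marker expression forms agree for all combinations")
```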

pipelex/cli/_cli.py

Lines changed: 17 additions & 16 deletions
@@ -137,22 +137,23 @@ def _format_concept_code(concept_code: Optional[str], current_domain: str) -> str
         pipes_dict[domain] = {}
 
         for pipe in domain_pipes:
-            if pipe.code:
-                input_code = _format_concept_code(pipe.input_concept_code, domain)
-                output_code = _format_concept_code(pipe.output_concept_code, domain)
-
-                table.add_row(
-                    pipe.code,
-                    pipe.definition or "",
-                    input_code,
-                    output_code,
-                )
-
-                pipes_dict[domain][pipe.code] = {
-                    "definition": pipe.definition or "",
-                    "input": pipe.input_concept_code or "",
-                    "output": pipe.output_concept_code or "",
-                }
+            inputs = pipe.inputs
+            formatted_inputs = [f"{name}: {_format_concept_code(concept_code, domain)}" for name, concept_code in inputs.items]
+            formatted_inputs_str = ", ".join(formatted_inputs)
+            output_code = _format_concept_code(pipe.output_concept_code, domain)
+
+            table.add_row(
+                pipe.code,
+                pipe.definition or "",
+                formatted_inputs_str,
+                output_code,
+            )
+
+            pipes_dict[domain][pipe.code] = {
+                "definition": pipe.definition or "",
+                "inputs": formatted_inputs_str,
+                "output": pipe.output_concept_code,
+            }
 
     pretty_print(table)
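The CLI change above renders the new multi-input mapping as `name: Concept` pairs in one table cell. This standalone sketch re-implements that formatting with a simplified stand-in for `_format_concept_code` (whose real logic may differ):

```python
from typing import Dict


def format_concept_code(concept_code: str, current_domain: str) -> str:
    # Simplified assumption: drop the domain prefix for concepts that belong
    # to the domain currently being listed.
    prefix = f"{current_domain}."
    return concept_code[len(prefix):] if concept_code.startswith(prefix) else concept_code


def format_inputs(inputs: Dict[str, str], domain: str) -> str:
    # Mirrors the list comprehension in the CLI: one "name: Concept" pair per
    # required variable, joined into a single table cell.
    return ", ".join(f"{name}: {format_concept_code(code, domain)}" for name, code in inputs.items())


print(format_inputs({"contract": "legal.Contract", "notes": "native.Text"}, "legal"))
# → contract: Contract, notes: native.Text
```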

Lines changed: 249 additions & 0 deletions
@@ -0,0 +1,249 @@
from typing import Any, Dict, List, Optional, Type

from polyfactory.factories.pydantic_factory import ModelFactory
from typing_extensions import override

from pipelex import log
from pipelex.cogt.content_generation.content_generator_protocol import ContentGeneratorProtocol, update_job_metadata
from pipelex.cogt.image.generated_image import GeneratedImage
from pipelex.cogt.imgg.imgg_handle import ImggHandle
from pipelex.cogt.imgg.imgg_job_components import ImggJobConfig, ImggJobParams
from pipelex.cogt.imgg.imgg_prompt import ImggPrompt
from pipelex.cogt.llm.llm_models.llm_setting import LLMSetting
from pipelex.cogt.llm.llm_prompt import LLMPrompt
from pipelex.cogt.llm.llm_prompt_factory_abstract import LLMPromptFactoryAbstract
from pipelex.cogt.ocr.ocr_handle import OcrHandle
from pipelex.cogt.ocr.ocr_input import OcrInput
from pipelex.cogt.ocr.ocr_job_components import OcrJobConfig, OcrJobParams
from pipelex.cogt.ocr.ocr_output import ExtractedImageFromPage, OcrOutput, Page
from pipelex.config import get_config
from pipelex.pipeline.job_metadata import JobMetadata
from pipelex.tools.templating.jinja2_environment import Jinja2TemplateCategory
from pipelex.tools.templating.templating_models import PromptingStyle
from pipelex.tools.typing.pydantic_utils import BaseModelTypeVar


class ContentGeneratorDry(ContentGeneratorProtocol):
    """
    This class is used to generate mock content for testing purposes.
    It does not use any inference.
    """

    @property
    def _text_gen_truncate_length(self) -> int:
        return get_config().pipelex.dry_run_config.text_gen_truncate_length

    @override
    @update_job_metadata
    async def make_llm_text(  # pyright: ignore[reportIncompatibleMethodOverride]
        self,
        job_metadata: JobMetadata,
        llm_setting_main: LLMSetting,
        llm_prompt_for_text: LLMPrompt,
        wfid: Optional[str] = None,
    ) -> str:
        func_name = "make_llm_text"
        log.dev(f"🤡 DRY RUN: {self.__class__.__name__}.{func_name}")
        prompt_truncated = llm_prompt_for_text.desc(truncate_text_length=self._text_gen_truncate_length)
        generated_text = f"DRY RUN: {func_name} • llm_setting={llm_setting_main.desc()} • prompt={prompt_truncated}"
        return generated_text

    @override
    @update_job_metadata
    async def make_object_direct(  # pyright: ignore[reportIncompatibleMethodOverride]
        self,
        job_metadata: JobMetadata,
        object_class: Type[BaseModelTypeVar],
        llm_setting_for_object: LLMSetting,
        llm_prompt_for_object: LLMPrompt,
        wfid: Optional[str] = None,
    ) -> BaseModelTypeVar:
        func_name = "make_object_direct"
        log.dev(f"🤡 DRY RUN: {self.__class__.__name__}.{func_name}")

        class ObjectFactory(ModelFactory[object_class]):  # type: ignore
            __model__ = object_class

        obj = ObjectFactory.build()
        return obj

    @override
    @update_job_metadata
    async def make_text_then_object(  # pyright: ignore[reportIncompatibleMethodOverride]
        self,
        job_metadata: JobMetadata,
        object_class: Type[BaseModelTypeVar],
        llm_setting_main: LLMSetting,
        llm_setting_for_object: LLMSetting,
        llm_prompt_for_text: LLMPrompt,
        llm_prompt_factory_for_object: Optional[LLMPromptFactoryAbstract] = None,
        wfid: Optional[str] = None,
    ) -> BaseModelTypeVar:
        func_name = "make_text_then_object"
        log.dev(f"🤡 DRY RUN: {self.__class__.__name__}.{func_name}")
        return await self.make_object_direct(
            job_metadata=job_metadata,
            object_class=object_class,
            llm_setting_for_object=llm_setting_for_object,
            llm_prompt_for_object=llm_prompt_for_text,
        )

    @override
    @update_job_metadata
    async def make_object_list_direct(  # pyright: ignore[reportIncompatibleMethodOverride]
        self,
        job_metadata: JobMetadata,
        object_class: Type[BaseModelTypeVar],
        llm_setting_for_object_list: LLMSetting,
        llm_prompt_for_object_list: LLMPrompt,
        wfid: Optional[str] = None,
    ) -> List[BaseModelTypeVar]:
        func_name = "make_object_list_direct"
        log.dev(f"🤡 DRY RUN: {self.__class__.__name__}.{func_name}")
        object_1 = await self.make_object_direct(
            job_metadata=job_metadata,
            object_class=object_class,
            llm_setting_for_object=llm_setting_for_object_list,
            llm_prompt_for_object=llm_prompt_for_object_list,
        )
        object_2 = await self.make_object_direct(
            job_metadata=job_metadata,
            object_class=object_class,
            llm_setting_for_object=llm_setting_for_object_list,
            llm_prompt_for_object=llm_prompt_for_object_list,
        )
        two_objects = [object_1, object_2]
        return two_objects

    @override
    @update_job_metadata
    async def make_text_then_object_list(  # pyright: ignore[reportIncompatibleMethodOverride]
        self,
        job_metadata: JobMetadata,
        object_class: Type[BaseModelTypeVar],
        llm_setting_main: LLMSetting,
        llm_setting_for_object_list: LLMSetting,
        llm_prompt_for_text: LLMPrompt,
        llm_prompt_factory_for_object_list: Optional[LLMPromptFactoryAbstract] = None,
        wfid: Optional[str] = None,
    ) -> List[BaseModelTypeVar]:
        func_name = "make_text_then_object_list"
        log.dev(f"🤡 DRY RUN: {self.__class__.__name__}.{func_name}")
        return await self.make_object_list_direct(
            job_metadata=job_metadata,
            object_class=object_class,
            llm_setting_for_object_list=llm_setting_for_object_list,
            llm_prompt_for_object_list=llm_prompt_for_text,
        )

    @override
    @update_job_metadata
    async def make_single_image(  # pyright: ignore[reportIncompatibleMethodOverride]
        self,
        job_metadata: JobMetadata,
        imgg_handle: ImggHandle,
        imgg_prompt: ImggPrompt,
        imgg_job_params: Optional[ImggJobParams] = None,
        imgg_job_config: Optional[ImggJobConfig] = None,
        wfid: Optional[str] = None,
    ) -> GeneratedImage:
        func_name = "make_single_image"
        log.dev(f"🤡 DRY RUN: {self.__class__.__name__}.{func_name}")
        generated_image = GeneratedImage(
            url="https://storage.googleapis.com/public_test_files_7fa6_4277_9ab/fashion/fashion_photo_1.jpg",
            width=1536,
            height=2752,
        )
        return generated_image

    @override
    @update_job_metadata
    async def make_image_list(  # pyright: ignore[reportIncompatibleMethodOverride]
        self,
        job_metadata: JobMetadata,
        imgg_handle: ImggHandle,
        imgg_prompt: ImggPrompt,
        nb_images: int,
        imgg_job_params: Optional[ImggJobParams] = None,
        imgg_job_config: Optional[ImggJobConfig] = None,
        wfid: Optional[str] = None,
    ) -> List[GeneratedImage]:
        func_name = "make_image_list"
        log.dev(f"🤡 DRY RUN: {self.__class__.__name__}.{func_name}")
        generated_image_list = [
            GeneratedImage(
                url="https://storage.googleapis.com/public_test_files_7fa6_4277_9ab/fashion/fashion_photo_1.jpg",
                width=1536,
                height=2752,
            ),
            GeneratedImage(
                url="https://storage.googleapis.com/public_test_files_7fa6_4277_9ab/fashion/fashion_photo_2.png",
                width=1024,
                height=1536,
            ),
        ]
        return generated_image_list

    @override
    async def make_jinja2_text(
        self,
        context: Dict[str, Any],
        jinja2_name: Optional[str] = None,
        jinja2: Optional[str] = None,
        prompting_style: Optional[PromptingStyle] = None,
        template_category: Jinja2TemplateCategory = Jinja2TemplateCategory.LLM_PROMPT,
        wfid: Optional[str] = None,
    ) -> str:
        func_name = "make_jinja2_text"
        log.dev(f"🤡 DRY RUN: {self.__class__.__name__}.{func_name}")
        jinja2_truncated = jinja2[: self._text_gen_truncate_length] if jinja2 else None
        jinja2_text = (
            f"DRY RUN: {func_name} • context={context} • jinja2_name={jinja2_name} • "
            f"jinja2={jinja2_truncated} • prompting_style={prompting_style} • template_category={template_category}"
        )
        return jinja2_text

    @override
    async def make_ocr_extract_pages(
        self,
        job_metadata: JobMetadata,
        ocr_input: OcrInput,
        ocr_handle: OcrHandle,
        ocr_job_params: Optional[OcrJobParams] = None,
        ocr_job_config: Optional[OcrJobConfig] = None,
        wfid: Optional[str] = None,
    ) -> OcrOutput:
        func_name = "make_ocr_extract_pages"
        log.dev(f"🤡 DRY RUN: {self.__class__.__name__}.{func_name}")
        if ocr_input.image_uri:
            ocr_image_as_page = Page(
                text="DRY RUN: OCR text",
                extracted_images=[],
                page_view=None,
            )
            ocr_output = OcrOutput(
                pages={1: ocr_image_as_page},
            )
        else:
            ocr_page_1 = Page(
                text="DRY RUN: OCR text",
                extracted_images=[],
                page_view=ExtractedImageFromPage(
                    image_id="page_view_1",
                    base_64="",
                    caption="DRY RUN: OCR text",
                ),
            )
            ocr_page_2 = Page(
                text="DRY RUN: OCR text",
                extracted_images=[],
                page_view=ExtractedImageFromPage(
                    image_id="page_view_2",
                    base_64="",
                    caption="DRY RUN: OCR text",
                ),
            )
            ocr_output = OcrOutput(
                pages={1: ocr_page_1, 2: ocr_page_2},
            )
        return ocr_output
