GenAI servables - refactor input processing by mzegla · Pull Request #4318 · openvinotoolkit/model_server

mzegla · 2026-06-23T09:54:10Z

No description provided.

Copilot

Pull request overview

This PR refactors GenAI servable input handling to use a unified InputRequest + InputProcessor chain, moving generation-config extraction into the API handler and deferring multimodal image decoding (and related validation) out of the OpenAI request parsers.

Changes:

Introduces InputRequest and an InputProcessor pipeline (raw prompt extraction, chat template application, tokenization, deferred image decoding, text-content normalization).
Updates LM/VLM servables and executors to consume executionContext->inputRequest (and removes legacy prepareInputs overrides in VLM servables).
Updates OpenAI handlers/tests to preserve multimodal content arrays in ChatHistory, removes processedJson/imageHistory, and adjusts tools parsing assertions.
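
The processor chain described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ChatHistory`, `Status`, `ToyTokenizationProcessor`, and `runChain` are hypothetical stand-ins (the real code uses `ov::genai::ChatHistory`, `absl::Status`, and a real tokenizer), and the byte-per-token "tokenizer" exists only to keep the example self-contained.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the real ov::genai / absl types.
struct ChatHistory { std::vector<std::string> messages; };
using InputPayload = std::variant<std::string, ChatHistory>;

struct InputRequest {
    InputPayload input;         // raw prompt or chat history
    std::string promptText;     // filled by an extraction/template step
    std::vector<int> inputIds;  // filled by the tokenization step
};

// Empty optional means success; otherwise it carries an error message.
using Status = std::optional<std::string>;

struct BaseInputProcessor {
    virtual ~BaseInputProcessor() = default;
    virtual Status process(InputRequest& req) = 0;
};

// Copies a raw string payload into promptText; rejects other payload types.
struct RawPromptExtractor : BaseInputProcessor {
    Status process(InputRequest& req) override {
        const auto* prompt = std::get_if<std::string>(&req.input);
        if (prompt == nullptr) return std::string("expected a raw prompt payload");
        req.promptText = *prompt;
        return std::nullopt;
    }
};

// Toy tokenizer: one "token" per byte. A real step would call the model tokenizer.
struct ToyTokenizationProcessor : BaseInputProcessor {
    Status process(InputRequest& req) override {
        if (req.promptText.empty()) return std::string("prompt is empty");
        req.inputIds.clear();
        for (unsigned char c : req.promptText) req.inputIds.push_back(c);
        return std::nullopt;
    }
};

// Runs each step in order and stops at the first error.
inline Status runChain(const std::vector<std::unique_ptr<BaseInputProcessor>>& chain,
                       InputRequest& req) {
    for (const auto& step : chain) {
        assert(step != nullptr);
        if (Status err = step->process(req)) return err;
    }
    return std::nullopt;
}
```

Each step mutates the same `InputRequest` in place, so later steps only need the fields earlier steps populated, and a failure in one step short-circuits the rest.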

Reviewed changes

Copilot reviewed 36 out of 36 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
src/test/llm/llmnode_test.cpp	Updates tests to use `executionContext.inputRequest.inputIds`.
src/test/llm/input_processing/raw_prompt_extractor_test.cpp	Adds unit tests for `RawPromptExtractor`.
src/test/llm/input_processing/image_decoding_processor_test.cpp	Adds unit tests for `ImageDecodingProcessor` behavior without actual decoding.
src/test/http_openai_handler_test.cpp	Updates parsing tests to assert `ChatHistory` preservation and tool map contents (no `processedJson`/`imageHistory`).
src/llm/visual_language_model/legacy/servable.hpp	Removes VLM legacy `inputText/inputImages` fields and sets `isVLM` in processor context.
src/llm/visual_language_model/legacy/servable.cpp	Switches to `extractInputRequest()`; removes legacy VLM `prepareInputs` implementation.
src/llm/visual_language_model/legacy/legacy_executor.cpp	Uses `inputRequest.promptText/inputImages/generationConfig` for generation.
src/llm/visual_language_model/continuous_batching/servable.hpp	Removes VLM CB `inputText/inputImages` fields and sets `isVLM` in processor context.
src/llm/visual_language_model/continuous_batching/servable.cpp	Uses `inputRequest.*` when adding requests; removes legacy VLM `prepareInputs`.
src/llm/servable.hpp	Replaces `inputIds`/`GenerationConfigBuilder` in execution context with `InputRequest`; adds `InputProcessorContext`.
src/llm/servable.cpp	Refactors base `parseRequest/prepareInputs` to build and process `InputRequest`.
src/llm/servable_initializer.cpp	Populates `InputProcessorContext` (tokenizer + optional Python template processor).
src/llm/language_model/legacy/servable.cpp	Uses `inputRequest` for generation config and NPU input-length validation.
src/llm/language_model/legacy/legacy_executor.cpp	Uses `inputRequest.inputIds/generationConfig` for generation.
src/llm/language_model/continuous_batching/servable.cpp	Uses `inputRequest` for scheduler limits and pipeline add_request.
src/llm/io_processing/input_request.hpp	Adds `InputRequest` and `InputPayload` variant.
src/llm/io_processing/input_processors/tokenization_processor.hpp	Adds tokenization processor definition.
src/llm/io_processing/input_processors/tokenization_processor.cpp	Implements tokenization into `req.inputIds`.
src/llm/io_processing/input_processors/text_content_normalization_processor.hpp	Adds text-only content-array normalizer (LM paths).
src/llm/io_processing/input_processors/text_content_normalization_processor.cpp	Implements content-array flattening to string with `\n` joins.
src/llm/io_processing/input_processors/raw_prompt_extractor.hpp	Adds raw prompt extractor (COMPLETIONS path).
src/llm/io_processing/input_processors/image_decoding_processor.hpp	Adds deferred image decoding processor (VLM paths).
src/llm/io_processing/input_processors/image_decoding_processor.cpp	Implements image decoding + `<ov_genai_image_N>` injection into message content.
src/llm/io_processing/input_processors/chat_template_processor.hpp	Adds chat template processor (Python and native paths).
src/llm/io_processing/input_processors/chat_template_processor.cpp	Implements prompt building from `ChatHistory`.
src/llm/io_processing/input_processor.hpp	Adds orchestrator selecting processors based on config + payload variant.
src/llm/io_processing/input_processor.cpp	Builds and executes the processor chain.
src/llm/io_processing/input_processor_context.hpp	Adds per-deployment resources for input processing.
src/llm/io_processing/input_processing_config.hpp	Adds deployment-level processing config (`isVLM`).
src/llm/io_processing/base_input_processor.hpp	Adds base interface for processing steps.
src/llm/BUILD	Adds Bazel targets/deps for new IO processing components.
src/llm/apis/openai_responses.cpp	Preserves content arrays in `ChatHistory` and removes Python `processedJson` path + eager image decoding.
src/llm/apis/openai_request.hpp	Removes `processedJson` and `imageHistory` from `OpenAIRequest`.
src/llm/apis/openai_completions.cpp	Preserves multimodal `content` arrays in `ChatHistory` and removes eager image decoding + `processedJson` rebuild.
src/llm/apis/openai_api_handler.hpp	Removes `getProcessedJson/getImageHistory`; adds `extractInputRequest()`.
src/llm/apis/openai_api_handler.cpp	Implements `extractInputRequest()` and removes `processedJson` mutations from tools parsing.
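
As an illustration of the "content-array flattening" row above, here is a minimal sketch of joining text parts with newlines. `ContentPart` and `flattenTextParts` are hypothetical names, and rejecting non-text parts is an assumption; the summary does not say how the real processor treats them.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of one entry of an OpenAI-style "content" array.
struct ContentPart {
    std::string type;  // e.g. "text" or "image_url"
    std::string text;  // only meaningful when type == "text"
};

// Joins all text parts with '\n'. Returns nullopt if a non-text part is present
// (an assumption made for this sketch, not confirmed by the PR).
inline std::optional<std::string> flattenTextParts(const std::vector<ContentPart>& parts) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (parts[i].type != "text") return std::nullopt;
        if (i > 0) out += '\n';
        out += parts[i].text;
    }
    return out;
}

// Small usage example.
inline void demoFlatten() {
    const std::vector<ContentPart> parts{{"text", "Hello"}, {"text", "world"}};
    assert(flattenTextParts(parts).value() == "Hello\nworld");
}
```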

Copilot

Pull request overview

Copilot reviewed 36 out of 36 changed files in this pull request and generated 7 comments.

Copilot

Pull request overview

Copilot reviewed 41 out of 41 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 41 out of 42 changed files in this pull request and generated 1 comment.

dkalinowski · 2026-06-25T12:23:55Z

+        req.input = request.chatHistory;
+        // Populate tools and chat_template_kwargs on the copied ChatHistory so
+        // ChatTemplateProcessor can access them via get_tools()/get_extra_context().
+        auto& chatHistory = std::get<ov::genai::ChatHistory>(req.input);


`std::get` throws an exception if `req.input` is not of type `ChatHistory`. Can we try/catch that and return an error if it happens?
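
One way to address this concern without a try/catch is `std::get_if`, which returns a null pointer instead of throwing `std::bad_variant_access`. The sketch below is only an illustration under assumptions: `ToyChatHistory`, `SketchStatus`, and `attachToolsNote` are hypothetical stand-ins, and the real code would return an `absl::Status`.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

struct ToyChatHistory { std::vector<std::string> messages; };  // hypothetical stand-in
using Payload = std::variant<std::string, ToyChatHistory>;

struct SketchStatus {
    bool ok;
    std::string message;
};

// Returns an error status when the payload holds the wrong alternative,
// rather than letting std::bad_variant_access propagate.
inline SketchStatus attachToolsNote(Payload& input) {
    auto* chatHistory = std::get_if<ToyChatHistory>(&input);
    if (chatHistory == nullptr) {
        return {false, "input payload is not a ChatHistory"};
    }
    chatHistory->messages.push_back("tools attached");
    return {true, ""};
}

// Small usage example.
inline void demoGuard() {
    Payload raw = std::string("raw prompt");
    assert(!attachToolsNote(raw).ok);  // wrong alternative: error, no exception
}
```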

dkalinowski · 2026-06-25T12:40:21Z

+    absl::Status process(InputRequest& req) override;
+
+private:
+    ov::genai::Tokenizer* tokenizer;  // non-owning; lifetime tied to InputProcessorContext


Nitpick: this could be a reference, since all constructors take the tokenizer by reference.

Suggested change

-    ov::genai::Tokenizer* tokenizer;  // non-owning; lifetime tied to InputProcessorContext
+    ov::genai::Tokenizer& tokenizer;  // non-owning; lifetime tied to InputProcessorContext
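
A trade-off worth keeping in mind with this suggestion: a reference member cannot be null or reseated, but it also implicitly deletes copy assignment for the class. The sketch below demonstrates that with hypothetical `PointerHolder` / `ReferenceHolder` classes and a stand-in `Tokenizer`; whether assignability matters for the PR's processors is not stated in the thread.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct Tokenizer { std::string name; };  // hypothetical stand-in for ov::genai::Tokenizer

class PointerHolder {
public:
    explicit PointerHolder(Tokenizer& t) : tokenizer(&t) {}
    const std::string& name() const { return tokenizer->name; }

private:
    Tokenizer* tokenizer;  // non-owning; could be null or reseated
};

class ReferenceHolder {
public:
    explicit ReferenceHolder(Tokenizer& t) : tokenizer(t) {}
    const std::string& name() const { return tokenizer.name; }

private:
    Tokenizer& tokenizer;  // non-owning; never null, never reseated
};

// The pointer version stays copy-assignable; the reference version does not.
static_assert(std::is_copy_assignable_v<PointerHolder>);
static_assert(!std::is_copy_assignable_v<ReferenceHolder>);
static_assert(std::is_copy_constructible_v<ReferenceHolder>);

// Small usage example: both holders observe the same tokenizer object.
inline void demoHolders() {
    Tokenizer t{"tok"};
    PointerHolder p(t);
    ReferenceHolder r(t);
    assert(p.name() == r.name());
}
```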

dkalinowski · 2026-06-25T12:41:07Z

+    allowedMediaDomains(std::move(allowedMediaDomains)) {}
+
+absl::Status ImageDecodingProcessor::process(InputRequest& req) {
+    ov::genai::ChatHistory& chatHistory = std::get<ov::genai::ChatHistory>(req.input);


Same concern here: `std::get` can throw if `req.input` is not a `ChatHistory`. Do we need a try/catch that returns an error?

dkalinowski · 2026-06-25T12:43:46Z

+    absl::Status process(InputRequest& req) override;
+
+private:
+    ov::genai::Tokenizer* tokenizer;  // non-owning; lifetime tied to InputProcessorContext


Suggested change

-    ov::genai::Tokenizer* tokenizer;  // non-owning; lifetime tied to InputProcessorContext
+    ov::genai::Tokenizer& tokenizer;  // non-owning; lifetime tied to InputProcessorContext

dkalinowski · 2026-06-25T12:44:23Z

+    // True when the GenAI built-in tokenizer.apply_chat_template() should be used
+    // even on Python-enabled builds (i.e. ChatTemplateMode::MINJA).
+    // False (default) uses PyJinjaTemplateProcessor when PYTHON_DISABLE==0.
+    bool useMinja = false;


Should I put auto-deduced capabilities/workarounds here?

dkalinowski · 2026-06-25T12:47:11Z

-            const auto& chatTemplateKwargs = chatTemplateKwargsParsingResult.value();
-            if (llm_calculator_logger->should_log(spdlog::level::trace)) {
-                SPDLOG_LOGGER_TRACE(llm_calculator_logger, "VLM chatHistory messages: {}", chatHistory.get_messages().to_json_string());
-                SPDLOG_LOGGER_TRACE(llm_calculator_logger, "VLM chatHistory.get_tools(): {}", chatHistory.get_tools().to_json_string());


Do we still have such logs somewhere? I think they were useful.

dkalinowski · 2026-06-25T12:48:50Z

+    if (!getProperties()->inputProcessorContext.config.isVLM &&
+        std::holds_alternative<ov::genai::ChatHistory>(executionContext->inputRequest.input)) {
+        const auto& ch = std::get<ov::genai::ChatHistory>(executionContext->inputRequest.input);
+        for (size_t i = 0; i < ch.size(); i++) {


This could be a separate utility function.

dkalinowski · 2026-06-25T12:50:10Z

                        "/wd6240", 
                        "/wd6326",
                        "/wd6385",
+                        "/wd6386",


What's this warning?

dkalinowski · 2026-06-25T14:19:17Z

-TEST_P(HttpOpenAIHandlerChatAndResponsesParsingTest, ParsingImageStringWithNoMimePrefixFails) {
-    // Without a "data:..." prefix the URL falls through to the local-filesystem loader,
-    // which is disabled by default.
+TEST_P(HttpOpenAIHandlerChatAndResponsesParsingTest, ParsingImageStringWithNoMimePrefixPreservedInChatHistory) {


If we change the assertion here, do we have a separate test checking that these requests are rejected during another phase?

dkalinowski · 2026-06-25T14:22:53Z

+TEST_F(HttpOpenAIHandlerParsingTest, ParsingMessagesWithNoContentFieldAddsEmptyStringToChatHistory) {
+    // Assistant turns that carry only tool_calls legitimately omit the "content" field.
+    // parseMessages injects an empty-string content entry so chat templates always see
+    // the field.  Verifies the JsonContainer proxy write-through: lastMessage["content"] = ""


This is one of the "workarounds" that llama.cpp and minja apply. I see we already have it, for all models and without autodetection. Was this behavior already there, or did you add it in this PR? If you added it, which processor handles it?
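
The workaround discussed here (making sure every message carries a `content` field, even when an assistant turn has only `tool_calls`) can be sketched as below. This is an illustration only: `Message` as a string map and `ensureContentField` are hypothetical, whereas the test comment above describes the real code writing through a `JsonContainer` proxy.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, flattened representation of a chat message.
using Message = std::map<std::string, std::string>;

// Adds an empty "content" entry to messages that lack one.
// Existing content is left untouched (try_emplace never overwrites).
inline void ensureContentField(std::vector<Message>& messages) {
    for (auto& message : messages) {
        message.try_emplace("content", "");
    }
}

// Small usage example.
inline void demoEnsureContent() {
    std::vector<Message> messages{{{"role", "assistant"}, {"tool_calls", "[...]"}}};
    ensureContentField(messages);
    assert(messages[0].count("content") == 1);
    assert(messages[0].at("content").empty());
}
```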

mzegla force-pushed the servables_input_flow_refactor_1 branch 4 times, most recently from f8f5106 to 6fe4461 Compare June 23, 2026 14:53

mzegla requested a review from Copilot June 23, 2026 14:54

Copilot started reviewing on behalf of mzegla June 23, 2026 14:55

Copilot AI reviewed Jun 23, 2026

init

f5b6037

mzegla force-pushed the servables_input_flow_refactor_1 branch from 6fe4461 to f5b6037 Compare June 24, 2026 08:44

mzegla requested a review from Copilot June 24, 2026 08:48

Copilot AI reviewed Jun 24, 2026

Copilot started reviewing on behalf of mzegla June 24, 2026 09:18

mzegla requested review from dkalinowski and dtrawins June 24, 2026 09:53

mzegla added 3 commits June 24, 2026 13:41

copilot comments

8ae413b

drop modelsPath

a811b0e

bring back non const tokenizer

5b2f535

mzegla requested a review from Copilot June 24, 2026 12:35

Copilot started reviewing on behalf of mzegla June 24, 2026 12:36

Copilot AI reviewed Jun 24, 2026

Comment thread src/llm/servable.hpp

Comment thread src/llm/servable.hpp

Comment thread src/llm/io_processing/input_processor_context.hpp

Comment thread src/llm/servable.cpp

mzegla added 2 commits June 24, 2026 15:36

minor suggestions

b8ac775

openai handler tests cleanup

8e56150

mzegla requested a review from Copilot June 25, 2026 10:56

mzegla marked this pull request as ready for review June 25, 2026 10:56

Copilot started reviewing on behalf of mzegla June 25, 2026 10:57

Copilot AI reviewed Jun 25, 2026

Comment thread src/llm/apis/openai_completions.cpp Outdated

copilot

46b790f

mzegla requested a review from pgladkows June 25, 2026 11:54

dkalinowski reviewed Jun 25, 2026

Comment thread common_settings.bzl


3 participants
