
fix(models): preserve media blocks in _flatten_ollama_content #5296

Open
saivedant169 wants to merge 1 commit into google:main from saivedant169:fix/ollama-inline-image-parts

Conversation

@saivedant169

Fixes #4975

Problem

When sending multimodal messages (text + image) through the /run endpoint with an ollama_chat model, the image data is silently dropped. The model responds as if no image was attached, and LiteLLM's debug output shows images: [].

The root cause is in _flatten_ollama_content() (src/google/adk/models/lite_llm.py). It iterates multipart content blocks and keeps only type == "text" entries, joining them into a plain string. Any image_url, video_url, or audio_url blocks are discarded before LiteLLM ever sees them.

LiteLLM's Ollama handler already knows how to convert multipart arrays containing image_url blocks into Ollama's native images field. But it needs the list to do that — once ADK flattens everything to a string, the conversion path is unreachable.
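For illustration, the multipart shape involved looks roughly like this (the URL and text values are placeholders, not taken from the actual request):

```python
# A hypothetical multipart content list in OpenAI chat format. LiteLLM's
# Ollama handler can map the image_url block to Ollama's native `images`
# field, but only if it receives the list rather than a flattened string.
multipart_content = [
    {"type": "text", "text": "What is in this picture?"},
    {
        "type": "image_url",
        "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."},
    },
]

# The pre-fix flattening keeps only the text blocks; the base64 image
# data is silently dropped before LiteLLM ever sees it.
flattened = " ".join(
    block["text"] for block in multipart_content if block["type"] == "text"
)
```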

Fix

Before flattening to a string, check whether any block has a media type (image_url, video_url, audio_url). If so, return the original list instead of flattening. Text-only content is still flattened to a plain string for compatibility.

The return type of _flatten_ollama_content changes from str | None to OpenAIMessageContent | str | None to reflect that it may now return the list unchanged.

Testing plan

Unit tests (all pass):

  • test_flatten_ollama_content_preserves_image_url_blocks — image-only content returns as list
  • test_flatten_ollama_content_preserves_mixed_text_and_image — text + image returns full list
  • test_flatten_ollama_content_preserves_video_url_blocks — video_url also preserved
  • test_flatten_ollama_content_serializes_non_media_non_text_blocks_to_json — unknown types still serialize to JSON
  • Updated test_generate_content_async_ollama_chat_preserves_multimodal_content — integration test confirms multimodal content reaches LiteLLM as a list
  • Updated test_generate_content_async_custom_provider_preserves_multimodal — same for custom_llm_provider path
  • All existing text-only flatten tests still pass unchanged
pytest tests/unittests/models/test_litellm.py
======================== 247 passed, 1 warning in 3.03s ========================

_flatten_ollama_content was stripping image_url, video_url, and
audio_url blocks when flattening multipart content for ollama_chat.
This meant LiteLLM never received the image data, so Ollama's native
images field was always empty.

The fix checks for media block types before flattening. When any
media block is present, the full multipart list is returned so
LiteLLM can convert it to Ollama's format. Text-only content is
still flattened to a plain string as before.

Fixes google#4975