
fix(qwen): fix ghosting artifacts Qwen Image Edit#9155

Merged

lstein merged 7 commits into main from lstein/bugfix/qwen-image-edit-ghosting on May 13, 2026

Conversation


@lstein lstein commented May 11, 2026

Summary

  • Qwen Image Edit was producing a ghost/doubling artifact across the whole frame outside the masked edit region. Root cause: qwen_image_denoise.py bilinear-resized the reference latents to the noisy latent's dimensions and used identical img_shapes tuples for both segments. QwenEmbedRope varies its spatial RoPE frequencies by each segment's H/W, so identical dims gave both segments the same spatial positions and cross-attention couldn't disentangle them.
  • The denoise now keeps reference latents at their own (H, W), packs them at those dims, and places (1, ref_h // 2, ref_w // 2) in the reference segment of img_shapes — matching diffusers' QwenImageEditPipeline (pipeline_qwenimage_edit.py:755-760) and QwenImageEditPlusPipeline (pipeline_qwenimage_edit_plus.py:743-751).
  • The reference image fed to qwen_image_i2l is now resized to ~1024² area, preserving aspect ratio (matching diffusers' VAE_IMAGE_SIZE), so the reference token sequence stays in the distribution the model was trained on.
  • Defensive: qwen_image_image_to_latents.py bumps multiple_of from 8 to 16 so VAE-encoded latents always have even spatial dims (required by the 2×2 patch packing); a ValueError is raised if a directly-wired reference latent has spatial dims that align to less than 2.
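The distinct-dims fix can be sketched as below. This is an illustrative outline, not the actual _build_img_shapes signature from qwen_image_denoise.py; passing dims as plain (H, W) tuples is an assumption made to keep the sketch self-contained:

```python
def build_img_shapes(
    noisy_hw: tuple[int, int], ref_hw: tuple[int, int]
) -> list[tuple[int, int, int]]:
    """One (frames, H//2, W//2) entry per latent segment, at that segment's
    own dims, so QwenEmbedRope derives distinct spatial RoPE positions for
    the noisy and reference tokens."""
    h, w = noisy_hw
    ref_h, ref_w = ref_hw
    # 2x2 patch packing requires even spatial dims on every segment.
    if ref_h % 2 or ref_w % 2:
        raise ValueError(f"Reference latent dims must be even, got {ref_h}x{ref_w}")
    return [(1, h // 2, w // 2), (1, ref_h // 2, ref_w // 2)]
```

The pre-fix bug was equivalent to emitting the noisy segment's (h // 2, w // 2) for both entries, which gives both segments identical spatial positions.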

Test plan

  • Backend: uv run pytest tests/app/invocations/test_qwen_image_denoise.py — 15 pass (8 new: _align_ref_latent_dims validation incl. zero-dim guard, _build_img_shapes distinct-dims regression)
  • Frontend: pnpm test:no-watch buildQwenImageGraph — 30 pass (6 new: calculateQwenImageEditRefDimensions matches diffusers calculate_dimensions for square/landscape/portrait/extreme-ratio inputs; verifies computed dims land on the ref i2l node)
  • Frontend lint: pnpm lint:eslint, pnpm lint:prettier, pnpm lint:knip, pnpm lint:tsc — clean
  • Backend lint: uv run ruff check / uv run ruff format --check on changed files — clean
  • Broader Qwen tests pass: text encoder, model loader, LoRA conversion utils, main config, GGUF variant detection, double variant regression
  • Manual: using the full Qwen Image Edit diffusers model, upload a reference image and prompt for a targeted change, e.g. "change the subject's tee-shirt from blue to red."
  • Manual: same with one of the quantized (GGUF) transformers, with and without applying a Lightning LoRA — confirm artifact is gone in both.
  • Manual: regular Qwen Image txt2img / img2img / inpaint / outpaint flows still produce expected output (the multiple_of=16 i2l change is a no-op for canvas-composited paths but worth a smoke test).
  • Manual: try with a reference image whose aspect ratio differs significantly from the output dimensions. It should render without distortion.

🤖 Generated with Claude Code

…e Edit

Qwen Image Edit was applying identical RoPE positions to the noisy
and reference latent segments (both packed at the noisy latent's
dimensions), so cross-attention couldn't disentangle them — reference
content bled into the generation as a faintly offset ghost across the
whole frame, outside the masked edit region.

The denoise now keeps reference latents at their own (H, W) and uses
those dims in the reference segment of img_shapes, matching diffusers'
QwenImageEditPipeline / QwenImageEditPlusPipeline.

The reference image fed to qwen_image_i2l is resized to ~1024² area,
preserving aspect ratio (matching diffusers' VAE_IMAGE_SIZE), so the
reference token sequence stays in the distribution the model was
trained on.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
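The area-preserving resize described in this commit can be approximated as follows. A hedged sketch: the function name is hypothetical, and the exact rounding in diffusers' calculate_dimensions may differ; the multiple-of-32 snap is taken from the clamp discussion elsewhere in this PR:

```python
import math


def calculate_ref_dimensions(
    aspect_ratio: float, target_area: int = 1024 * 1024, multiple_of: int = 32
) -> tuple[int, int]:
    """Pick (width, height) covering ~target_area pixels at the given
    aspect ratio, each snapped to a multiple of `multiple_of`."""
    width = math.sqrt(target_area * aspect_ratio)
    height = width / aspect_ratio
    width = round(width / multiple_of) * multiple_of
    height = round(height / multiple_of) * multiple_of
    return int(width), int(height)
```

With these rounding choices the sketch reproduces the dims quoted in the test matrix later in the thread: 4096×4096 → 1024×1024, 1600×1200 → 1184×896, 1920×1080 → 1376×768.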
@github-actions github-actions Bot added python PRs that change python files invocations PRs that change invocations frontend PRs that change frontend files python-tests PRs that change python tests labels May 11, 2026
@lstein lstein changed the title fix(qwen): use distinct img_shapes for reference latents in Qwen Image Edit fix(qwen): fix ghosting artifacts Qwen Image Edit May 11, 2026
@lstein lstein added the v6.13.x label May 11, 2026
@lstein lstein moved this to 6.13.x Theme: MODELS in Invoke - Community Roadmap May 11, 2026

JPPhoto commented May 11, 2026

One code issue to be resolved prior to functional testing:

  • invokeai/app/invocations/qwen_image_denoise.py:372-395, invokeai/app/invocations/qwen_image_image_to_latents.py:36-43,83-89, invokeai/frontend/web/src/features/nodes/util/graph/generation/buildQwenImageGraph.ts:198-210: the fix only clamps the reference image to the diffusers-style ~1024² size when the graph is built by the updated frontend. The backend qwen_image_denoise path now preserves and packs whatever reference latent size it is given, but qwen_image_i2l still defaults to encoding at the image's original size unless width and height are explicitly set. Direct API/manual graph callers, and any existing graph JSON that does not populate those fields, will still send native-resolution reference latents into the transformer. In those cases the model still receives out-of-distribution reference sequence lengths, so the original artifact can persist and VRAM usage can spike on large reference images. To expose this issue, add a backend integration test that wires qwen_image_i2l to qwen_image_denoise without frontend-provided width/height and asserts the ref segment is resized/clamped to the diffusers-derived dimensions instead of staying at native size.

JPPhoto and others added 2 commits May 11, 2026 09:05
The frontend resizes the reference image to ~1024² area before VAE
encoding, but direct API callers and older graph JSON can wire
qwen_image_i2l → qwen_image_denoise without explicit width/height,
sending a native-resolution reference latent into the transformer.
Without the clamp the model receives an out-of-distribution sequence
length (artifact returns, VRAM spikes).

Mirror diffusers' QwenImageEdit(Plus) VAE_IMAGE_SIZE behavior in
latent space: bilinear-downscale the reference latent to
calculate_dimensions(1024², aspect_ratio) snapped to multiples of 32
in pixel space (= multiples of 4 in latent space, so always packable).
In-budget latents pass through untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
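In latent space, the clamp described in this commit might look like the following. An illustrative sketch, not the actual _maybe_clamp_ref_latent_size implementation; the VAE downsample factor of 8 and the rounding details are assumptions:

```python
import math

import torch
import torch.nn.functional as F

VAE_SCALE = 8  # assumed VAE downsample factor: pixel dims = latent dims * 8


def maybe_clamp_ref_latent_size(
    ref: torch.Tensor, target_area: int = 1024 * 1024
) -> torch.Tensor:
    """Bilinear-downscale an over-budget reference latent to ~target_area
    pixels at the input's aspect ratio, snapped to multiples of 32 in pixel
    space (= multiples of 4 in latent space, so always 2x2-packable)."""
    _, _, lat_h, lat_w = ref.shape
    px_h, px_w = lat_h * VAE_SCALE, lat_w * VAE_SCALE
    if px_h * px_w <= target_area:
        return ref  # in-budget latents pass through untouched
    ratio = px_w / px_h
    new_px_w = round(math.sqrt(target_area * ratio) / 32) * 32
    new_px_h = round(math.sqrt(target_area / ratio) / 32) * 32
    return F.interpolate(
        ref, size=(new_px_h // VAE_SCALE, new_px_w // VAE_SCALE), mode="bilinear"
    )
```

On a 1600×1200 reference (a 150×200 latent under the assumed scale factor) this sketch produces a 112×148 latent, i.e. 1184×896 pixels, matching the numbers reported in the follow-up comment.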

lstein commented May 11, 2026

Good catch, addressed in eaf116e — added _maybe_clamp_ref_latent_size in qwen_image_denoise.py that bilinear-downscales the reference latent to diffusers' calculate_dimensions(1024², aspect_ratio) (snapped to multiples of 32 in pixel space) whenever the input exceeds the budget. It runs before _align_ref_latent_dims, so direct API/manual graph callers, older graph JSON without width/height on qwen_image_i2l, or any other producer of native-resolution reference latents all end up at the same dims the frontend was already producing — and the same dims diffusers feeds the transformer. In-budget latents (≤ 128×128 = 1024² pixels) pass through untouched.

7 new unit tests in TestMaybeClampRefLatentSize cover the matrix you asked about: in-budget (unchanged), native landscape (1600×1200 → clamped to 1184×896 → 148×112 latent), native portrait, huge (4096² → 1024²), extreme aspect ratio (1920×1080 → 1376×768), and a parametric check that every clamp output is packable (multiple of 4 in latent space, so always even after _align_ref_latent_dims).

I didn't add a full I2L→denoise integration test because the i2l step requires loading the Qwen VAE model — heavy for CI. The unit tests assert the exact same invariant the integration test would: given a too-large reference latent, the dims fed to packing/img_shapes are the diffusers-derived ones, not native.


@JPPhoto JPPhoto left a comment


I was able to test the quantized edit model with and without the Lightning LoRA, and it worked with a source image whose aspect ratio differed from the output's. I think this change is good for the next RC and public testing!

@lstein lstein merged commit 8f46d8b into main May 13, 2026
16 checks passed
@lstein lstein deleted the lstein/bugfix/qwen-image-edit-ghosting branch May 13, 2026 04:53
