feat(ltx2): implement centralized, configuration-driven logical sharding strategy for LTX-2 and LTX-2.3 by Perseus14 · Pull Request #414 · AI-Hypercomputer/maxdiffusion

Perseus14 · 2026-05-21T07:53:34Z

Summary

This PR introduces a centralized, configuration-driven logical sharding strategy registry for LTX-2 and LTX-2.3 in MaxDiffusion. It eliminates ad-hoc hardware checks and hardcoded sharding constraints in model layers by moving sharding specifications to a centralized, hardware-aware registry.

Key Changes

Centralized Specs Registry: Created logical_sharding_ltx2.py to define sharding spec profiles for Ironwood (TPU v7x, 1D heads-wise sharding) and Trillium (TPU v6e, 2D heads + embed sharding).
Model Weight Parameterization: Replaced hardcoded partitioning in the LTX-2 transformer, attention, VAE timestep embeddings, connectors, and the new LTX-2.3 gated attention projection layers with specs read from the registry.
Decoupled Shared Layers: Parameterized shared FFN and text projection layers in attention_flax.py and embeddings_flax.py using generic duck-typing interfaces (getattr fallback logic) to prevent code coupling.
Config-Driven Pipeline Choices: Made the VAE replication (force_replication) and text-encoder batching (use_batched_text_encoder) pipeline decisions configuration-driven under the central spec registry.
Robust Configuration & CLI support: Added sharding, text_encoder_dtype, compile_text_encoder, and base_output_directory parameters to LTX-2/2.3 configs, enabling dynamic text-encoder compilation and clean overrides via the CLI.
Verification: Added non-brittle unit tests (test_logical_sharding_ltx2.py) to verify routing and hardware auto-detection logic.
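
To make the registry idea in the list above concrete, here is a minimal sketch in plain Python. It is not the code from logical_sharding_ltx2.py: the class name `ShardingSpecs`, the field names `attn_qkv_kernel` and `attn_out_kernel`, the function `get_sharding_specs`, and every value in the profiles are assumptions chosen for illustration. Only the profile names (ironwood, trillium), the 1D versus 2D distinction, and the two pipeline flag names come from the PR description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# One logical axis name (or None for "not sharded") per tensor dimension.
AxisSpec = Tuple[Optional[str], ...]


@dataclass(frozen=True)
class ShardingSpecs:
  """Hypothetical bundle of per-layer logical axes and pipeline flags."""

  attn_qkv_kernel: AxisSpec  # kernel shape assumed to be (embed, heads)
  attn_out_kernel: AxisSpec  # kernel shape assumed to be (heads, embed)
  force_replication: bool  # pipeline decision: replicate the VAE
  use_batched_text_encoder: bool  # pipeline decision: batch text encoding


# Placeholder values for illustration only; they are not copied from the PR.
PROFILES = {
    # 1D strategy: shard along the heads axis only.
    "ironwood": ShardingSpecs(
        attn_qkv_kernel=(None, "heads"),
        attn_out_kernel=("heads", None),
        force_replication=False,
        use_batched_text_encoder=True,
    ),
    # 2D strategy: shard along both the embed and heads axes.
    "trillium": ShardingSpecs(
        attn_qkv_kernel=("embed", "heads"),
        attn_out_kernel=("heads", "embed"),
        force_replication=False,
        use_batched_text_encoder=False,
    ),
}


def get_sharding_specs(profile: str) -> ShardingSpecs:
  """Return the specs for a profile; layers read fields instead of probing hardware."""
  return PROFILES[profile]


specs = get_sharding_specs("trillium")
print(specs.attn_qkv_kernel)  # ('embed', 'heads')
```

With this shape, a model layer receives a `ShardingSpecs` object and reads the field it needs, so supporting a new hardware profile means adding one entry to the table rather than touching each layer.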

Performance

| Model | Baseline (Denoising) | This PR (Denoising) | Change |
| --- | --- | --- | --- |
| LTX-2 | 10.5 s | 10.5 s | No change |
| LTX-2.3 | 19.6 s | 19.6 s | No change |

Conclusion: This change is purely structural and does not affect denoising performance.

github-actions · 2026-05-21T07:53:46Z

e2e testgrid: https://8bcf50593faf4ea38060e236169827e5-dot-us-central1.composer.googleusercontent.com/dags/maxdiffusion_tpu_e2e/grid

github-actions · 2026-05-21T17:03:20Z

🤖 Hi @Perseus14, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions

## 📋 Review Summary

The Pull Request successfully centralizes the sharding strategy for LTX-2 and LTX-2.3, which is a great architectural improvement. It eliminates hardware-specific logic from individual model layers and moves it to a configuration-driven registry. This significantly improves maintainability and makes it easier to support new hardware in the future.

🔍 General Feedback

Correctness: Identified a potential AttributeError in core model files (attention_flax.py, embeddings_flax.py) when sharding_specs is None. This needs to be addressed as it will cause crashes when these components are used with default arguments.
Efficiency: The sharding specs are resolved repeatedly during inference in the pipeline. Storing these specs as pipeline attributes during initialization would be a minor but worthwhile optimization.
Robustness: The strategy lookup logic silently defaults to a specific hardware profile on unknown input, which could hide configuration typos.
Tests: The inclusion of test_logical_sharding_ltx2.py and updates to existing tests provide good coverage for the new sharding logic.
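
The Efficiency and Robustness points above could be addressed along the following lines. This is a hedged sketch, not the PR's implementation: `resolve_sharding_specs`, `PipelineSketch`, the `attn_kernel` key, and the profile contents are hypothetical names and values. It shows a lookup that raises on an unknown profile name instead of silently falling back, and a pipeline that resolves its specs once at construction time.

```python
from types import MappingProxyType

# Read-only placeholder profiles; the contents are illustrative only.
_PROFILES = {
    "ironwood": MappingProxyType({"attn_kernel": (None, "heads")}),
    "trillium": MappingProxyType({"attn_kernel": ("embed", "heads")}),
}


def resolve_sharding_specs(name):
  """Strict lookup: a typo raises ValueError instead of picking a default."""
  key = name.strip().lower()
  if key not in _PROFILES:
    known = ", ".join(sorted(_PROFILES))
    raise ValueError(f"Unknown sharding profile {name!r}; expected one of: {known}")
  return _PROFILES[key]


class PipelineSketch:
  """Resolves specs once in __init__ so inference calls reuse the attribute."""

  def __init__(self, sharding):
    self.sharding_specs = resolve_sharding_specs(sharding)

  def attention_axes(self):
    # No registry lookup here; the specs were stored at construction time.
    return self.sharding_specs["attn_kernel"]


pipeline = PipelineSketch("Trillium")
print(pipeline.attention_axes())  # ('embed', 'heads')
```

Failing fast on a misspelled profile name surfaces the configuration error at startup, which is usually cheaper to debug than a run that quietly used the wrong sharding.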

github-actions · 2026-05-21T17:41:36Z

🤖 Hi @Perseus14, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions

## 📋 Review Summary

This PR successfully implements a centralized, configuration-driven logical sharding strategy registry for LTX-2 and LTX-2.3. The refactoring significantly improves the modularity and maintainability of the sharding logic by decoupling it from individual model layers and hardware-specific checks.

🔍 General Feedback

Architecture: The introduction of logical_sharding_ltx2.py is a great architectural improvement, making sharding strategies explicit and easily extensible.
Robustness: The use of safe_getattr and fallback logic ensures that the model remains functional even with incomplete sharding specifications.
Performance: Moving pipeline decisions like VAE replication and text-encoder batching to the registry allows for better hardware-specific tuning.
Minor Issues: I've noted a likely discrepancy in the text encoder batching logic for Ironwood and a change in the default VAE replication behavior that should be confirmed.
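
One plausible reading of the safe_getattr fallback mentioned above is sketched here. The real helper's signature is not shown in this conversation, so the three-argument form, the `ffn_kernel` field, and the default axes are assumptions. The point is that a layer constructed with sharding_specs=None, or with a specs object that lacks a field, falls back to a default instead of raising AttributeError.

```python
from types import SimpleNamespace

# Hypothetical default used when no spec is supplied for the FFN kernel.
DEFAULT_FFN_KERNEL_AXES = ("embed", "mlp")


def safe_getattr(specs, name, default):
  """Return specs.<name>, or default if specs is None or the field is unset."""
  if specs is None:
    return default
  value = getattr(specs, name, None)
  return default if value is None else value


def ffn_kernel_axes(sharding_specs=None):
  """Stand-in for a shared layer that accepts any object with optional fields."""
  return safe_getattr(sharding_specs, "ffn_kernel", DEFAULT_FFN_KERNEL_AXES)


print(ffn_kernel_axes())  # ('embed', 'mlp')
print(ffn_kernel_axes(SimpleNamespace(ffn_kernel=(None, "mlp"))))  # (None, 'mlp')
```

Because the layer only asks for an attribute by name, it does not need to import the LTX-2 registry, which matches the duck-typing goal stated in the PR description.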

github-actions · 2026-05-21T18:01:24Z

🤖 Hi @Perseus14, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions · 2026-05-22T12:38:36Z

🤖 I'm sorry @Perseus14, but I was unable to process your request. Please see the logs for more details.

github-actions · 2026-05-22T18:11:41Z

🤖 Hi @Perseus14, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions

## 📋 Review Summary

This PR successfully introduces a centralized, configuration-driven sharding strategy for LTX-2 and LTX-2.3 models, which is a significant improvement over hardcoded hardware checks. The implementation uses a registry-based approach that correctly maps hardware profiles to logical sharding specifications.

🔍 General Feedback

Regression on Ironwood: The use_batched_text_encoder flag for the ironwood profile is currently set to False, which contradicts the previous logic that enabled it for TPU v7x. This should be corrected to avoid a performance/correctness regression.
Incomplete Parameterization: While the major components (Transformer, Attention) are well-parameterized, some internal layers in the VAE ResNet blocks and the Text Encoder Feature Extractor were missed and still use hardcoded partitioning.
Improved Maintainability: Moving hardware-specific logic to logical_sharding_ltx2.py greatly simplifies the model code and makes it easier to support future hardware.
Unit Testing: The addition of test_logical_sharding_ltx2.py provides good coverage for the new factory logic.
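
A regression like the Ironwood one described above can be caught by pinning the expected pipeline flags per profile in a test. The sketch below assumes a dictionary-shaped registry and a hypothetical helper `find_flag_mismatches`; the expected value for ironwood follows the reviewer's statement that the previous logic enabled batching on TPU v7x, and the registry shown deliberately reproduces the flagged state.

```python
def find_flag_mismatches(registry, expected):
  """Return (profile, flag, actual, wanted) for every flag that differs."""
  mismatches = []
  for profile, flags in expected.items():
    for flag, wanted in flags.items():
      actual = registry[profile][flag]
      if actual != wanted:
        mismatches.append((profile, flag, actual, wanted))
  return mismatches


# Registry state as flagged in the review (illustrative, not copied from the PR).
registry_under_review = {
    "ironwood": {"use_batched_text_encoder": False},
}

# Behavior the reviewer says the previous hardware check produced.
expected_flags = {
    "ironwood": {"use_batched_text_encoder": True},
}

print(find_flag_mismatches(registry_under_review, expected_flags))
# [('ironwood', 'use_batched_text_encoder', False, True)]
```

In a unit test the returned list would be asserted empty, so any future change to a profile's pipeline flags has to be made deliberately alongside an update to the expected table.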

prishajain1

Just one comment about batched text encoder for ironwood, rest looks good to me!

mbohlool

there are merge conflicts. generally looks good. just two comments.

Perseus14 · 2026-05-27T11:20:31Z

PTAL @mbohlool, I have addressed your comments

Perseus14 requested a review from entrpn as a code owner May 21, 2026 07:53

Perseus14 force-pushed the ltx2_sharding branch 4 times, most recently from f6789b2 to a37a799 Compare May 21, 2026 10:12

Perseus14 requested review from mbohlool and prishajain1 May 21, 2026 10:19

Perseus14 force-pushed the ltx2_sharding branch 4 times, most recently from c35d3da to b45ac10 Compare May 21, 2026 16:04

Perseus14 self-assigned this May 21, 2026

Perseus14 added the gemini-review label May 21, 2026

github-actions Bot reviewed May 21, 2026

Perseus14 force-pushed the ltx2_sharding branch 4 times, most recently from 5bcf2e8 to 48c2d3d Compare May 21, 2026 17:24

Perseus14 added gemini-review and removed gemini-review labels May 21, 2026

github-actions Bot reviewed May 21, 2026

Comment thread src/maxdiffusion/models/ltx2/logical_sharding_ltx2.py

Comment thread src/maxdiffusion/models/ltx2/logical_sharding_ltx2.py

Comment thread src/maxdiffusion/models/ltx2/transformer_ltx2.py

Perseus14 force-pushed the ltx2_sharding branch 3 times, most recently from 0abfd7e to 794703f Compare May 21, 2026 17:55

Perseus14 added gemini-review and removed gemini-review labels May 21, 2026

Perseus14 added gemini-review and removed gemini-review labels May 22, 2026

github-actions Bot reviewed May 22, 2026

Perseus14 force-pushed the ltx2_sharding branch 2 times, most recently from 26a35e7 to 441a307 Compare May 22, 2026 21:47

prishajain1 reviewed May 23, 2026

Comment thread src/maxdiffusion/models/ltx2/logical_sharding_ltx2.py Outdated

prishajain1 reviewed May 23, 2026

Perseus14 force-pushed the ltx2_sharding branch from 441a307 to 8f4b108 Compare May 25, 2026 08:50

prishajain1 self-requested a review May 25, 2026 14:35

prishajain1 previously approved these changes May 25, 2026

github-actions Bot added the pull ready label May 25, 2026

mbohlool requested changes May 26, 2026

Comment thread src/maxdiffusion/pipelines/ltx2/ltx2_pipeline.py Outdated

Comment thread src/maxdiffusion/pyconfig.py Outdated

Perseus14 dismissed prishajain1’s stale review via a5e0104 May 27, 2026 10:08

Perseus14 force-pushed the ltx2_sharding branch 6 times, most recently from 83a74c5 to 634b642 Compare May 27, 2026 11:16

Perseus14 requested review from mbohlool and prishajain1 May 27, 2026 17:34

Perseus14 force-pushed the ltx2_sharding branch 3 times, most recently from eb9598e to 3136b13 Compare May 30, 2026 06:37

feat(ltx2): implement logical sharding

2e0b568

Perseus14 force-pushed the ltx2_sharding branch from 3136b13 to 2e0b568 Compare May 30, 2026 08:27

mbohlool approved these changes May 30, 2026

3 participants
