Skip to content

Create adding_new_model_tutorial.md#1784

Open
danielkorzekwa wants to merge 28 commits into
mainfrom
dkorzekwa/claude_qwen35
Open

Create adding_new_model_tutorial.md#1784
danielkorzekwa wants to merge 28 commits into
mainfrom
dkorzekwa/claude_qwen35

Conversation

@danielkorzekwa

@danielkorzekwa danielkorzekwa commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

BEFORE REVIEWING IT:

Type of change: new feature.
#1780

Usage

see: .agents/skills/puzzletron/adding_new_model_tutorial.md

Testing

Tested manually. Go through this tutorial before accepting this MR.

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅
  • Did you write any new necessary tests?: tested manually
  • Did you update Changelog?: ✅
  • Did you get Claude approval on this PR?: ❌

Summary by CodeRabbit

Release Notes

  • New Features

    • Experimental /puzzletron command for AI code agents enabling model pruning and compression pipelines with support for Qwen3.5 (0.8B, 2B) and Llama models.
    • Real-time progress tracking and loss monitoring for optimization runs.
  • Documentation

    • Added comprehensive tutorial for integrating new models into the pruning framework.
    • Expanded examples documentation for AI agent integration.

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…re always defined

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…l and mip commands

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ection in mip_progress.py

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…t masked by grep exit code

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
@danielkorzekwa danielkorzekwa requested review from a team as code owners June 22, 2026 07:49
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds Qwen3.5 (0.8B and 2B) as supported Puzzletron models via a new converter, model descriptor, layer descriptor dataclasses, and complete Hydra config trees. Simultaneously introduces an experimental /puzzletron agent skill with a machine-readable routing spec, four Python progress/results scripts, a new-model tutorial, and minor fixes to existing LLaMA configs.

Changes

Qwen3.5 model support

Layer / File(s) Summary
Qwen3.5 converter, descriptor, and factory registration
modelopt/torch/puzzletron/anymodel/models/__init__.py, modelopt/torch/puzzletron/anymodel/models/qwen3_5/__init__.py, modelopt/torch/puzzletron/anymodel/models/qwen3_5/qwen3_5_converter.py, modelopt/torch/puzzletron/anymodel/models/qwen3_5/qwen3_5_model_descriptor.py
Qwen3_5Converter handles per-layer block config generation, directory-level config conversion, and model.language_model.model. weight name remapping. Qwen3_5ModelDescriptor provides no-op post-init stubs branching on full_attention vs linear_attention, weight grouping that handles both original VLM and already-converted checkpoints, and regex-based layer name predicates. Qwen3_5FFNIntermediateLayerDescriptor and Qwen3_5KVHeadsLayerDescriptor supply Qwen3.5-specific pruning paths. Both classes are registered and gated behind transformers >= 4.57.0.
Qwen3.5 0.8B config tree
examples/puzzletron/configs/qwen3_5-0.8b_pruneffn_memory/...
Adds qwen3_5-0.8b_pruneffn_memory.yaml (10,000 MiB target memory, intermediate_size_list: [768, 1536, 2048, 3072]), shared qwen3_5.yaml (replacement library, subblock stats, MIP objective/constraints, realization), pruning defaults, and FFN/attention/hidden-dim pruning overrides plus validate model/solutions defaults.
Qwen3.5 2B config tree
examples/puzzletron/configs/qwen3_5-2B_pruneffn_memory/...
Parallel config tree for the 2B variant with 20,000 MiB target memory and intermediate_size_list: [1280, 2560, 3584, 5120], otherwise structurally identical to the 0.8B tree.
LLaMA config corrections
examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/llama-3_1-8B_pruneffn_memory.yaml, examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/validate_model_defaults.yaml
puzzle_dir updated to a model-specific path, compression rate 1.0 added to the MIP sweep list, and val_dataset_name corrected from valid to validation.

Puzzletron agent skill

Layer / File(s) Summary
Agent skill spec and user-facing README
.agents/skills/puzzletron/SKILL.md, .agents/skills/puzzletron/README.md
SKILL.md defines the /puzzletron <command> [args] routing rules with early-STOP validation, argument parsing for all and mip, and a five-step add-model procedure covering factory checks, descriptor/converter implementation, version-gated registration, YAML setup, and verification. README.md documents each command with example outputs.
Progress and results scripts
.agents/skills/puzzletron/all_progress.py, .agents/skills/puzzletron/mip_progress.py, .agents/skills/puzzletron/mip_losses.py, .agents/skills/puzzletron/mip_sweep.py
all_progress.py parses log.txt to print an 8-step pipeline status table with per-step elapsed time, current-step detail (solution count, batch percent, or MIP node count), and remaining-time estimates. mip_progress.py supports both single-run and sweep modes with per-compression-rate DONE/RUNNING/pending rows. mip_losses.py reads teacher.json and solution_0.json to compare teacher vs compressed metrics. mip_sweep.py formats mip_sweep_results.csv as a results table.
Tutorial, symlink, and doc updates
.agents/skills/puzzletron/adding_new_model_tutorial.md, .claude/skills/puzzletron, examples/puzzletron/README.md, CHANGELOG.rst
Full Qwen3.5-0.8B integration tutorial covering the end-to-end conversation from initial pipeline failure through sweep analysis. .claude/skills/puzzletron symlink added pointing to .agents/skills/puzzletron. "Using with AI agents" subsection added to the examples README, and a release note added to CHANGELOG.rst.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

  • NVIDIA/Model-Optimizer#1765: Changes the same validate_model_defaults.yaml field (val_dataset_name: validvalidation) for the LLaMA-3.1-8B config, which is also modified in this PR.

Suggested reviewers

  • kevalmorabia97
  • ChenhanYu
  • cjluo-nv

Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

  • Ignore

❌ Failed checks (1 error, 1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Security Anti-Patterns ❌ Error SKILL.md contains hardcoded trust_remote_code=True at lines 130 and 148, violating SECURITY.md which explicitly forbids this RCE vector for untrusted models. Remove hardcoded trust_remote_code=True from both AutoConfig.from_pretrained() calls in SKILL.md. Default to trust_remote_code=False and prompt users to explicitly approve if needed.
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The PR title 'Create adding_new_model_tutorial.md' is partially related to the changeset. While this file is included in the PR, it represents only one of many substantial additions. The changeset introduces a complete new skill system for puzzletron (documentation, scripts, configurations, and model support) that far exceeds the scope implied by the title. Consider a more comprehensive title such as 'Add puzzletron agent skill with comprehensive documentation and Qwen 3.5 support' to better reflect the full scope of changes including the skill implementation, model support, configurations, and tutorials.
✅ Passed checks (3 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dkorzekwa/claude_qwen35

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

👉 Steps to fix this

Actionable comments posted: 2

🧹 Nitpick comments (1)
modelopt/torch/puzzletron/anymodel/models/qwen3_5/__init__.py (1)

16-17: 🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Add an explicit __all__ in this package __init__.py.

This new __init__.py re-exports symbols but does not define its public surface explicitly.

Suggested patch
+from .qwen3_5_converter import __all__ as _qwen3_5_converter_all
 from .qwen3_5_converter import *
+from .qwen3_5_model_descriptor import __all__ as _qwen3_5_model_descriptor_all
 from .qwen3_5_model_descriptor import *
+
+__all__ = [*_qwen3_5_converter_all, *_qwen3_5_model_descriptor_all]

As per coding guidelines: “Define the public API with __all__ at the top of each module and re-export via from .module import * in package __init__.py files.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/puzzletron/anymodel/models/qwen3_5/__init__.py` around lines
16 - 17, The package __init__.py file uses wildcard imports from
qwen3_5_converter and qwen3_5_model_descriptor modules but does not explicitly
define the public API surface. Add an explicit __all__ list to this __init__.py
file after the import statements that specifies which symbols from those modules
should be publicly exported by the package. This __all__ list should contain the
names of the symbols that are intended to be part of the public API for users
importing from this package.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.agents/skills/puzzletron/SKILL.md:
- Line 130: The AutoConfig.from_pretrained calls have hardcoded
trust_remote_code=True which is a security vulnerability when handling untrusted
model sources. Change both occurrences of AutoConfig.from_pretrained (at the
initial config loading and the retry logic) to default to
trust_remote_code=False. Wrap the AutoConfig.from_pretrained calls in try-except
blocks to catch custom modeling code errors, and when such an error occurs,
prompt the user to confirm they trust the model source before retrying with
trust_remote_code=True if they explicitly approve.

In
`@examples/puzzletron/configs/qwen3_5-0.8b_pruneffn_memory/pruning/ffn_pruning.yaml`:
- Around line 18-20: The intermediate_size_list in the ffn_pruning.yaml config
contains values (5120) that exceed the 0.8B teacher model's intermediate size of
3584, which causes the pruning strategy to allow expansion instead of
compression. Update the intermediate_size_list to use only values that are
smaller than the teacher size of 3584 to ensure proper FFN pruning behavior for
the 0.8B model, and remove or replace any values that are equal to or larger
than the teacher intermediate size.

---

Nitpick comments:
In `@modelopt/torch/puzzletron/anymodel/models/qwen3_5/__init__.py`:
- Around line 16-17: The package __init__.py file uses wildcard imports from
qwen3_5_converter and qwen3_5_model_descriptor modules but does not explicitly
define the public API surface. Add an explicit __all__ list to this __init__.py
file after the import statements that specifies which symbols from those modules
should be publicly exported by the package. This __all__ list should contain the
names of the symbols that are intended to be part of the public API for users
importing from this package.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b1c0e3bb-f37c-4a69-b9a2-40102d8d5134

📥 Commits

Reviewing files that changed from the base of the PR and between d8eb5e5 and c1f2b9c.

📒 Files selected for processing (32)
  • .agents/skills/puzzletron/README.md
  • .agents/skills/puzzletron/SKILL.md
  • .agents/skills/puzzletron/adding_new_model_tutorial.md
  • .agents/skills/puzzletron/all_progress.py
  • .agents/skills/puzzletron/mip_losses.py
  • .agents/skills/puzzletron/mip_progress.py
  • .agents/skills/puzzletron/mip_sweep.py
  • .claude/skills/puzzletron
  • CHANGELOG.rst
  • examples/puzzletron/README.md
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/llama-3_1-8B_pruneffn_memory.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/qwen3_5-0.8b_pruneffn_memory/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/qwen3_5-0.8b_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/qwen3_5-0.8b_pruneffn_memory/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/qwen3_5-0.8b_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/qwen3_5-0.8b_pruneffn_memory/qwen3_5-0.8b_pruneffn_memory.yaml
  • examples/puzzletron/configs/qwen3_5-0.8b_pruneffn_memory/qwen3_5.yaml
  • examples/puzzletron/configs/qwen3_5-0.8b_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/qwen3_5-0.8b_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/qwen3_5-2B_pruneffn_memory/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/qwen3_5-2B_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/qwen3_5-2B_pruneffn_memory/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/qwen3_5-2B_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/qwen3_5-2B_pruneffn_memory/qwen3_5-2B_pruneffn_memory.yaml
  • examples/puzzletron/configs/qwen3_5-2B_pruneffn_memory/qwen3_5.yaml
  • examples/puzzletron/configs/qwen3_5-2B_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/qwen3_5-2B_pruneffn_memory/validate_solutions_defaults.yaml
  • modelopt/torch/puzzletron/anymodel/models/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_5/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_5/qwen3_5_converter.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_5/qwen3_5_model_descriptor.py

import sys; sys.path.insert(0, '.')
from modelopt.torch.puzzletron.anymodel.model_descriptor import ModelDescriptorFactory
from transformers import AutoConfig
cfg = AutoConfig.from_pretrained('<hf_model_path>', trust_remote_code=True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

CRITICAL: Remove hardcoded trust_remote_code=True from AutoConfig loading.

Lines 130 and 148 hardcode trust_remote_code=True when loading the model config. Per SECURITY.md, this is an RCE vector if the model source is untrusted or user-supplied. The flag should default to False and let the user explicitly opt in if needed.

Recommended fix: Default to trust_remote_code=False. If the user encounters a "custom modeling code" error, explain the situation and ask if they trust the model source before retrying with trust_remote_code=True.

# Step 1 (line 126-134): First attempt with default safe setting
cfg = AutoConfig.from_pretrained('<hf_model_path>', trust_remote_code=False)
# If that fails with a custom modeling error, ask the user to confirm trust

# Step 2 (line 146-154): Same approach

Also applies to: 148-148

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.agents/skills/puzzletron/SKILL.md at line 130, The
AutoConfig.from_pretrained calls have hardcoded trust_remote_code=True which is
a security vulnerability when handling untrusted model sources. Change both
occurrences of AutoConfig.from_pretrained (at the initial config loading and the
retry logic) to default to trust_remote_code=False. Wrap the
AutoConfig.from_pretrained calls in try-except blocks to catch custom modeling
code errors, and when such an error occurs, prompt the user to confirm they
trust the model source before retrying with trust_remote_code=True if they
explicitly approve.

Source: Coding guidelines

Comment on lines +18 to +20
# teacher_intermediate_size is 6144
intermediate_size_list: [1280, 2560, 3584, 5120]
mlp_init_mode: "PruneByActivationsLog"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align 0.8B FFN candidates with the same tree’s teacher size.

Line 18/Line 19 use 2B-scale defaults (6144, 5120) that conflict with the 0.8B teacher size documented in qwen3_5-0.8b_pruneffn_memory.yaml Line 25 (3584). This preset becomes unsafe when used directly because it allows expansion instead of pruning.

Suggested fix
-# teacher_intermediate_size is 6144
-intermediate_size_list: [1280, 2560, 3584, 5120]
+# teacher_intermediate_size is 3584
+intermediate_size_list: [768, 1536, 2048, 3072]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# teacher_intermediate_size is 6144
intermediate_size_list: [1280, 2560, 3584, 5120]
mlp_init_mode: "PruneByActivationsLog"
# teacher_intermediate_size is 3584
intermediate_size_list: [768, 1536, 2048, 3072]
mlp_init_mode: "PruneByActivationsLog"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@examples/puzzletron/configs/qwen3_5-0.8b_pruneffn_memory/pruning/ffn_pruning.yaml`
around lines 18 - 20, The intermediate_size_list in the ffn_pruning.yaml config
contains values (5120) that exceed the 0.8B teacher model's intermediate size of
3584, which causes the pruning strategy to allow expansion instead of
compression. Update the intermediate_size_list to use only values that are
smaller than the teacher size of 3584 to ensure proper FFN pruning behavior for
the 0.8B model, and remove or replace any values that are equal to or larger
than the teacher intermediate size.

@codecov

codecov Bot commented Jun 22, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 58.82353% with 49 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.08%. Comparing base (bcd8dd4) to head (c1f2b9c).
⚠️ Report is 20 commits behind head on main.

Files with missing lines Patch % Lines
...nymodel/models/qwen3_5/qwen3_5_model_descriptor.py 57.95% 37 Missing ⚠️
...etron/anymodel/models/qwen3_5/qwen3_5_converter.py 57.14% 12 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1784       +/-   ##
===========================================
+ Coverage   58.45%   76.08%   +17.62%     
===========================================
  Files         510      514        +4     
  Lines       56271    56911      +640     
===========================================
+ Hits        32896    43300    +10404     
+ Misses      23375    13611     -9764     
Flag Coverage Δ
examples 41.84% <58.82%> (+19.40%) ⬆️
gpu 57.77% <58.82%> (+37.17%) ⬆️
regression 14.67% <0.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's different between 0.8B and 2B yamls? Mostly looks same. Can we have just one and mention a readme in this directory that mentions what to change if users want to try pruning a larger qwen3.5 variant? We dont want to maintain pruning yamls for all qwen3.5 variants if they only different in size and everything else is same

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants