Add: support input_shape_profile for trt-rtx ep #1782
Signed-off-by: haoxiz <haoxiz@nvidia.com>
📝 Walkthrough
Changes: Input Shape Profile Pipeline
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Caution: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error)
✅ Passed checks (5 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modelopt/onnx/quantization/ort_utils.py`:
- Around line 595-603: The issue is that after _prepare_ep_list filters the
calibration_eps list to remove unavailable providers, the enumeration of
execution_providers uses indices from the filtered list instead of the original
list, causing the input_shapes_profile indices to misalign. To fix this,
enumerate over the original calibration_eps list instead of the filtered
execution_providers list when building the tuple pairs, using the index to
access input_shapes_profile correctly, and mapping each original ep to either
the profile (if available) or the filtered execution_providers equivalent.
In `@modelopt/onnx/quantization/quantize.py`:
- Around line 557-559: The input_shapes_profile is being created from
calibration_eps before it has been finalized by the update_trt_ep_support
function, causing potential sync issues downstream. Move the conditional block
that checks if input_shapes_profile is None and calls
create_input_shapes_profile with model_id and calibration_eps to execute after
update_trt_ep_support has been called, ensuring calibration_eps reflects the
final list of execution providers before generating the profile.
📒 Files selected for processing (6)
- modelopt/onnx/quantization/__main__.py
- modelopt/onnx/quantization/fp8.py
- modelopt/onnx/quantization/graph_utils.py
- modelopt/onnx/quantization/int8.py
- modelopt/onnx/quantization/ort_utils.py
- modelopt/onnx/quantization/quantize.py
    execution_providers = _prepare_ep_list(calibration_eps)
    if input_shapes_profile is not None:
        assert len(calibration_eps) == len(input_shapes_profile), (
            "Number of calibration EPs and number of input-shapes-profile don't match"
        )
        execution_providers = [
            (ep, input_shapes_profile[idx]) if input_shapes_profile[idx] else ep
            for idx, ep in enumerate(execution_providers)
        ]
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
# Demonstrates why enumerate(filtered_execution_providers) can misalign profiles.
calibration_eps = ["trt", "cuda:0"]  # requested order
input_shapes_profile = [{"trt_profile_min_shapes": "x"}, {}]
# Simulate `_prepare_ep_list(calibration_eps)` when TRT is unavailable:
execution_providers = [("CUDAExecutionProvider", {"device_id": 0})]
mapped = [
    (ep, input_shapes_profile[idx]) if input_shapes_profile[idx] else ep
    for idx, ep in enumerate(execution_providers)
]
print("Mapped providers:", mapped)
print("Expected: CUDA should not receive TRT profile.")
PY
Repository: NVIDIA/Model-Optimizer
Length of output: 212
🏁 Script executed:
# First, let's examine the actual code at the specified file and lines
head -610 modelopt/onnx/quantization/ort_utils.py | tail -30
Repository: NVIDIA/Model-Optimizer
Length of output: 1389
🏁 Script executed:
# Check the _prepare_ep_list function to understand what it does
rg -A 15 "def _prepare_ep_list" modelopt/onnx/quantization/ort_utils.py
Repository: NVIDIA/Model-Optimizer
Length of output: 937
🏁 Script executed:
# Look for context around the function that contains lines 595-603
rg -B 20 "execution_providers = _prepare_ep_list" modelopt/onnx/quantization/ort_utils.py | head -50
Repository: NVIDIA/Model-Optimizer
Length of output: 798
🏁 Script executed:
# Search for where input_shapes_profile is used to understand the expected alignment
rg -n "input_shapes_profile" modelopt/onnx/quantization/ort_utils.py
Repository: NVIDIA/Model-Optimizer
Length of output: 895
🏁 Script executed:
# Get the full function containing lines 595-603
rg -B 50 "execution_providers = _prepare_ep_list" modelopt/onnx/quantization/ort_utils.py | grep -A 50 "^[0-9]*-def "
Repository: NVIDIA/Model-Optimizer
Length of output: 48
🏁 Script executed:
# Check line numbers more precisely
sed -n '555,620p' modelopt/onnx/quantization/ort_utils.py
Repository: NVIDIA/Model-Optimizer
Length of output: 2616
🏁 Script executed:
# Search for tests related to input_shapes_profile and calibrate_ort
fd -e py test | xargs rg -l "input_shapes_profile|calibrate_ort" 2>/dev/null
Repository: NVIDIA/Model-Optimizer
Length of output: 50
🏁 Script executed:
# Check _prepare_ep_list complete implementation
sed -n '/^def _prepare_ep_list/,/^def /p' modelopt/onnx/quantization/ort_utils.py | head -60
Repository: NVIDIA/Model-Optimizer
Length of output: 2115
🏁 Script executed:
# Search for tests in tests directory
find tests -name "*.py" -type f | xargs grep -l "input_shapes_profile\|calibrate_ort" 2>/dev/null || echo "No tests found matching pattern"
Repository: NVIDIA/Model-Optimizer
Length of output: 98
🏁 Script executed:
# Check if there are any tests for ort_utils specifically
find tests -name "*ort*" -type f
Repository: NVIDIA/Model-Optimizer
Length of output: 1267
🏁 Script executed:
# Look for any other calls to calibrate_ort to understand the impact
rg "calibrate_ort" --type py -A 3 -B 1
Repository: NVIDIA/Model-Optimizer
Length of output: 48
🏁 Script executed:
# Check test_ort_patching.py for relevant tests
cat tests/gpu/onnx/test_ort_patching.py
Repository: NVIDIA/Model-Optimizer
Length of output: 28863
🏁 Script executed:
# Check test_onnx_export_cpu.py for input_shapes_profile usage
rg -B 5 -A 10 "input_shapes_profile" tests/unit/torch/quantization/test_onnx_export_cpu.py
Repository: NVIDIA/Model-Optimizer
Length of output: 48
🏁 Script executed:
# Look at the function signature and documentation for calibrate_ort
rg -B 10 "def calibrate_ort" modelopt/onnx/quantization/ort_utils.py
Repository: NVIDIA/Model-Optimizer
Length of output: 48
🏁 Script executed:
# Check if there are any docstrings or comments explaining the expected behavior
sed -n '535,560p' modelopt/onnx/quantization/ort_utils.py
Repository: NVIDIA/Model-Optimizer
Length of output: 1254
Profile indexing must use original EP list, not filtered providers.
Lines 597–601 use enumerate(execution_providers) after _prepare_ep_list(...) filters the input. When an early EP is dropped (e.g., TensorRT unavailable), the remaining providers are shifted, causing profile indices to misalign. For example, if calibration_eps=["trt", "cuda"] but TensorRT is unavailable, CUDA receives the TensorRT profile instead of an empty one, which can break ORT setup at runtime.
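The misalignment can be reproduced in isolation. The sketch below is illustrative only: `AVAILABLE` and `filter_available` are hypothetical stand-ins for the availability filtering that `_prepare_ep_list` performs, and the real function may normalize names or return tuples. It contrasts indexing after filtering with pairing each requested EP and its profile before filtering.

```python
# Hypothetical stand-in for the set of EPs usable in this environment.
AVAILABLE = {"cuda"}


def filter_available(eps):
    """Hypothetical stand-in for the filtering step of _prepare_ep_list."""
    return [ep for ep in eps if ep in AVAILABLE]


def pair_after_filtering(eps, profiles):
    """Buggy pattern: indices come from the filtered list, so profiles shift."""
    return [(ep, profiles[idx]) for idx, ep in enumerate(filter_available(eps))]


def pair_before_filtering(eps, profiles):
    """Safer pattern: pair first, then drop an unavailable EP with its profile."""
    return [(ep, profile) for ep, profile in zip(eps, profiles) if ep in AVAILABLE]


eps = ["trt", "cuda"]
profiles = [{"trt_profile_min_shapes": "x"}, {}]

misaligned = pair_after_filtering(eps, profiles)  # CUDA picks up the TRT profile
aligned = pair_before_filtering(eps, profiles)    # CUDA keeps its own empty profile
```

Pairing before filtering keeps each profile attached to the EP it was written for, regardless of which EPs are later dropped.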
Suggested fix
-    execution_providers = _prepare_ep_list(calibration_eps)
-    if input_shapes_profile is not None:
-        assert len(calibration_eps) == len(input_shapes_profile), (
-            "Number of calibration EPs and number of input-shapes-profile don't match"
-        )
-        execution_providers = [
-            (ep, input_shapes_profile[idx]) if input_shapes_profile[idx] else ep
-            for idx, ep in enumerate(execution_providers)
-        ]
+    execution_providers = []
+    if input_shapes_profile is not None:
+        assert len(calibration_eps) == len(input_shapes_profile), (
+            "Number of calibration EPs and number of input-shapes-profile don't match"
+        )
+
+    for idx, requested_ep in enumerate(calibration_eps):
+        prepared = _prepare_ep_list([requested_ep])
+        if not prepared:
+            continue
+        ep = prepared[0]
+
+        profile = {} if input_shapes_profile is None else input_shapes_profile[idx]
+        if not profile:
+            execution_providers.append(ep)
+            continue
+
+        if isinstance(ep, tuple):
+            ep_name, ep_options = ep
+            execution_providers.append((ep_name, {**ep_options, **profile}))
+        else:
+            execution_providers.append((ep, profile))
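The suggested fix also merges the profile into the provider's existing options instead of wrapping an already-tupled provider in a second tuple. A minimal sketch of that merge step follows, assuming provider entries are either a name or a `(name, options)` tuple, as the simulated output in the analysis script suggests. `attach_profile` is a hypothetical helper, not a function in `ort_utils.py`, and the shape string is only a placeholder value.

```python
def attach_profile(ep, profile):
    """Return a provider entry with `profile` merged into its options.

    `ep` is either a provider name or a (name, options) tuple.
    """
    if not profile:
        # Nothing to attach: leave the entry exactly as it was.
        return ep
    if isinstance(ep, tuple):
        name, options = ep
        # Profile keys override existing options that share a name.
        return (name, {**options, **profile})
    return (ep, dict(profile))


cuda = attach_profile(("CUDAExecutionProvider", {"device_id": 0}), {})
trt = attach_profile(
    ("TensorrtExecutionProvider", {"device_id": 0}),
    {"trt_profile_min_shapes": "input_ids:1x1"},  # placeholder shape string
)
```

Merging keeps every entry a flat `(name, options)` pair, whereas `(ep, profile)` applied to an entry that is already a tuple would produce a nested `((name, options), profile)` value.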
    if input_shapes_profile is None and model_id:
        input_shapes_profile = create_input_shapes_profile(model_id, calibration_eps)
Build inferred profiles after calibration_eps is finalized.
At Line 557, profiles are inferred before update_trt_ep_support(...) updates calibration_eps later in this function. That can leave input_shapes_profile out-of-sync (length/order) with final EPs and trigger downstream failures.
💡 Suggested fix
-    if input_shapes_profile is None and model_id:
-        input_shapes_profile = create_input_shapes_profile(model_id, calibration_eps)
@@
     trt_plugins = update_trt_ep_support(calibration_eps, has_dds_op, has_custom_op, trt_plugins)  # type: ignore[arg-type]
+
+    if input_shapes_profile is None and model_id:
+        input_shapes_profile = create_input_shapes_profile(model_id, calibration_eps)
+    elif input_shapes_profile is not None and len(input_shapes_profile) != len(calibration_eps):
+        raise ValueError(
+            "Number of calibration EPs and number of input-shapes-profile don't match"
+        )
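The ordering concern can be sketched with stand-ins. Everything below is hypothetical: `finalize_eps` only imitates `update_trt_ep_support` under the assumption that it changes the EP list in place, and `make_profiles` only imitates `create_input_shapes_profile` by returning one profile per EP. The point is the sequence: settle the EP list, then derive or validate profiles against it.

```python
def finalize_eps(eps, needs_trt):
    """Hypothetical stand-in for update_trt_ep_support: may edit `eps` in place."""
    if needs_trt and "trt" not in eps:
        eps.insert(0, "trt")


def make_profiles(eps):
    """Hypothetical stand-in for create_input_shapes_profile: one profile per EP."""
    return [{"trt_profile_min_shapes": "x"} if ep == "trt" else {} for ep in eps]


def resolve_profiles(eps, profiles=None, needs_trt=False):
    # 1. Settle the EP list first.
    finalize_eps(eps, needs_trt)
    # 2. Only then derive profiles from it, or validate user-supplied ones.
    if profiles is None:
        profiles = make_profiles(eps)
    elif len(profiles) != len(eps):
        raise ValueError(
            "Number of calibration EPs and number of input-shapes-profile don't match"
        )
    return eps, profiles


final_eps, final_profiles = resolve_profiles(["cuda"], needs_trt=True)
```

Had the profiles been generated before `finalize_eps` ran, the list would have held one entry for two EPs, which is the length mismatch the review comment warns about.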
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1782      +/-   ##
==========================================
- Coverage   77.09%   75.69%   -1.41%
==========================================
  Files         511      511
  Lines       56168    58272    +2104
==========================================
+ Hits        43302    44107     +805
- Misses      12866    14165    +1299
What does this PR do?
Adds support for ONNX quantization and accepts model_id as an input, which fixes the missing input_shape_profile problem on some versions of TRT-RTX.
Usage
Testing
Tested on 4 popular LLM models with all popular quantization methods (int4, fp8, int8).
Before your PR is "Ready for review"
CONTRIBUTING.md: ✅
Summary by CodeRabbit
Added a model_id parameter to the ONNX quantization CLI and core quantization functions, allowing automatic generation of input shape profiles when not explicitly provided.