docs: add inference_guide with validated 7B+ models (Ascend NPU)#268
EdisonSu768 wants to merge 5 commits into
New `inference_guide/` section documenting already-validated open-weight LLM inference, mirroring the structure of `training_guides/training-runtimes`:

- Two validated models above the 7B/8B class:
  - Qwen3-14B (dense, BF16): recommended engine vLLM-Ascend
  - Qwen3-30B-A3B (MoE, BF16): recommended engine MindIE
- Runtime images catalog (NPU vLLM-Ascend + MindIE; NVIDIA GPU note)
- Per-model namespace-scoped `ServingRuntime` + `InferenceService` assets (not ClusterServingRuntime), one per engine/TP combination
- Measured open-loop per-replica benchmark tables (guidellm, 4 workloads, rate 1-9) with the dense→vLLM / MoE→MindIE engine-selection finding

Lint and build pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Walkthrough

Adds a new inference guide section for Qwen3-30B-A3B on Huawei Ascend 910B3: three KServe YAML asset files (MindIE TP=2, MindIE TP=4, vLLM-Ascend TP=4) and two MDX pages (top-level inference guide index, Qwen3-30B-A3B model page). Twelve custom dictionary terms are also appended to the cspell wordlist.

Changes

Inference Guide: Qwen3-30B-A3B on Ascend 910B3
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml`:
- Around line 60-107: The bash script in the diff lacks strict error handling
mode, which means failed commands like `source` will be silently ignored and the
script will continue executing with a potentially incomplete configuration. Add
strict mode directives (set -e and optionally set -u and set -o pipefail)
immediately after the shebang and before the help() function definition to
ensure the script fails immediately if any command fails, preventing
mindieservice_daemon from starting with partial or broken configuration.
In `@docs/en/inference_guide/index.mdx`:
- Around line 78-85: The documentation instructs users to edit the manifest file
but then applies the remote URL directly using kubectl apply, which bypasses any
local edits. This means the user's changes to metadata.namespace, image tags,
and storageUri are ignored, leaving the deployment with unintended defaults. To
fix this, modify the instructions to first download the remote YAML file to a
local location using curl or wget (storing it in a variable or file), then edit
that local file, and finally apply the local file path instead of the remote URL
in the kubectl apply command.
In `@docs/en/inference_guide/qwen3-14b.mdx`:
- Around line 43-47: The bash code snippet includes a comment stating to "edit
namespace / image tag / storageUri first" but then immediately applies the
remote file directly without demonstrating any editing step, creating a mismatch
between the instructions and the actual command. Either modify the bash commands
to show how to download the file first (using curl or wget), edit it locally,
and then apply the local copy, or update the introductory comment to accurately
reflect that the remote file is being applied directly without local
modifications.
In `@docs/en/inference_guide/qwen3-30b-a3b.mdx`:
- Around line 49-53: The bash snippet instructs users to edit namespace, image
tag, and storageUri values before applying, but then immediately applies from a
remote URL without incorporating those edits. Restructure the snippet to
download the manifest file first using curl or wget into a local variable, then
apply the local file after editing. Alternatively, show how to apply the remote
URL with kubectl set or sed to inject the edited values, ensuring the documented
edit steps actually take effect when kubectl apply is executed.
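The three comments above share one fix: download the manifest, edit the local copy, then apply that copy. A minimal sketch of the edit step follows, assuming GNU `sed` and assuming that `namespace:`, `image:`, and `storageUri:` each sit on their own line with a plain scalar value in the asset YAML. `patch_manifest`, the asset URL variable, and every value passed in are hypothetical placeholders.

```bash
#!/bin/bash
set -euo pipefail

# patch_manifest FILE NAMESPACE IMAGE STORAGE_URI
# Hypothetical helper: rewrites every `namespace:`, `image:` and
# `storageUri:` value in a local manifest copy in place (GNU sed -i).
# `image:` is matched exactly, so `imagePullPolicy:` is left untouched.
patch_manifest() {
  local file=$1 ns=$2 image=$3 uri=$4
  sed -i -E \
    -e "s|^([[:space:]]*namespace:).*|\1 ${ns}|" \
    -e "s|^([[:space:]]*image:).*|\1 ${image}|" \
    -e "s|^([[:space:]]*storageUri:).*|\1 ${uri}|" \
    "$file"
}

# Usage (ASSET_URL and all values are placeholders; substitute the raw
# GitHub URL of the asset you deploy and your own settings):
#   curl -fsSL -o manifest.yaml "$ASSET_URL"
#   patch_manifest manifest.yaml my-namespace my-registry/mindie:tag pvc://models/Qwen3-30B-A3B
#   kubectl apply -f manifest.yaml
```

Because the values are spliced into a `sed` replacement, they must not contain `|` or `&`; for anything beyond these three keys, editing the downloaded file by hand before `kubectl apply -f` is simpler.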
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ded490a7-b19e-4d10-bb06-8121804fb4c9
📒 Files selected for processing (7)
- docs/en/inference_guide/assets/qwen3-14b/qwen3-14b-vllm-ascend-tp1.yaml
- docs/en/inference_guide/assets/qwen3-14b/qwen3-14b-vllm-ascend-tp2.yaml
- docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml
- docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-vllm-ascend-tp4.yaml
- docs/en/inference_guide/index.mdx
- docs/en/inference_guide/qwen3-14b.mdx
- docs/en/inference_guide/qwen3-30b-a3b.mdx
```bash
#!/bin/bash
# run_mindie.sh — start MindIE Service for a given model.
# Required: --model-name, --model-path. Optional: --ip, --max-seq-len,
# --max-iter-times, --world-size, ... (run with --help for the full list).
help() { awk -F'### ' '/^###/ { print $2 }' "$0"; }
if [[ $# == 0 ]] || [[ "$1" == "--help" ]]; then help; exit 1; fi

total_count=$(npu-smi info -l | grep "Total Count" | awk -F ':' '{print $2}' | xargs)
if [[ -z "$total_count" ]]; then
  echo "Error: unable to read device info (npu-smi). Check permissions/devices."
  exit 1
fi
echo "$total_count device(s) detected!"

echo "Setting toolkit envs..."
source /usr/local/Ascend/ascend-toolkit/set_env.sh
echo "Setting MindIE envs..."
source /usr/local/Ascend/mindie/set_env.sh

MF_SCRIPTS_ROOT=$(realpath "$(dirname "$0")")
export PYTHONPATH=$MF_SCRIPTS_ROOT/../:$PYTHONPATH

export MIES_INSTALL_PATH=/usr/local/Ascend/mindie/latest/mindie-service
CONFIG_FILE=${MIES_INSTALL_PATH}/conf/config.json

# defaults
BACKEND_TYPE="atb"; MAX_SEQ_LEN=16384; MAX_PREFILL_TOKENS=16384
MAX_ITER_TIMES=1536; MAX_INPUT_TOKEN_LEN=12288; TRUNCATION=false
HTTPS_ENABLED=false; MULTI_NODES_INFER_ENABLED=false; NPU_MEM_SIZE=-1
MAX_PREFILL_BATCH_SIZE=50; TEMPLATE_TYPE="Standard"; MAX_PREEMPT_COUNT=0
SUPPORT_SELECT_BATCH=false; IP_ADDRESS="0.0.0.0"; PORT=8080
MANAGEMENT_IP_ADDRESS="127.0.0.2"; MANAGEMENT_PORT=1026; METRICS_PORT=1027

while [[ "$#" -gt 0 ]]; do
  case $1 in
    --model-path) MODEL_WEIGHT_PATH="$2"; shift ;;
    --model-name) MODEL_NAME="$2"; shift ;;
    --max-seq-len) MAX_SEQ_LEN="$2"; shift ;;
    --max-iter-times) MAX_ITER_TIMES="$2"; shift ;;
    --max-input-token-len) MAX_INPUT_TOKEN_LEN="$2"; shift ;;
    --max-prefill-tokens) MAX_PREFILL_TOKENS="$2"; shift ;;
    --world-size) WORLD_SIZE="$2"; shift ;;
    --ip) IP_ADDRESS="$2"; shift ;;
    --port) PORT="$2"; shift ;;
    *) echo "Unknown parameter: $1"; exit 1 ;;
  esac
  shift
done
```
Startup script should fail fast on command errors.
Without strict mode, failed source/chmod/sed steps can be ignored and mindieservice_daemon may start with partial config.
🔧 Suggested fix

```diff
 #!/bin/bash
+set -euo pipefail
 # run_mindie.sh — start MindIE Service for a given model.
```

📝 Committable suggestion
```bash
#!/bin/bash
set -euo pipefail
# run_mindie.sh — start MindIE Service for a given model.
# Required: --model-name, --model-path. Optional: --ip, --max-seq-len,
# --max-iter-times, --world-size, ... (run with --help for the full list).
help() { awk -F'### ' '/^###/ { print $2 }' "$0"; }
if [[ $# == 0 ]] || [[ "$1" == "--help" ]]; then help; exit 1; fi

total_count=$(npu-smi info -l | grep "Total Count" | awk -F ':' '{print $2}' | xargs)
if [[ -z "$total_count" ]]; then
  echo "Error: unable to read device info (npu-smi). Check permissions/devices."
  exit 1
fi
echo "$total_count device(s) detected!"

echo "Setting toolkit envs..."
source /usr/local/Ascend/ascend-toolkit/set_env.sh
echo "Setting MindIE envs..."
source /usr/local/Ascend/mindie/set_env.sh

MF_SCRIPTS_ROOT=$(realpath "$(dirname "$0")")
export PYTHONPATH=$MF_SCRIPTS_ROOT/../:$PYTHONPATH

export MIES_INSTALL_PATH=/usr/local/Ascend/mindie/latest/mindie-service
CONFIG_FILE=${MIES_INSTALL_PATH}/conf/config.json

# defaults
BACKEND_TYPE="atb"; MAX_SEQ_LEN=16384; MAX_PREFILL_TOKENS=16384
MAX_ITER_TIMES=1536; MAX_INPUT_TOKEN_LEN=12288; TRUNCATION=false
HTTPS_ENABLED=false; MULTI_NODES_INFER_ENABLED=false; NPU_MEM_SIZE=-1
MAX_PREFILL_BATCH_SIZE=50; TEMPLATE_TYPE="Standard"; MAX_PREEMPT_COUNT=0
SUPPORT_SELECT_BATCH=false; IP_ADDRESS="0.0.0.0"; PORT=8080
MANAGEMENT_IP_ADDRESS="127.0.0.2"; MANAGEMENT_PORT=1026; METRICS_PORT=1027

while [[ "$#" -gt 0 ]]; do
  case $1 in
    --model-path) MODEL_WEIGHT_PATH="$2"; shift ;;
    --model-name) MODEL_NAME="$2"; shift ;;
    --max-seq-len) MAX_SEQ_LEN="$2"; shift ;;
    --max-iter-times) MAX_ITER_TIMES="$2"; shift ;;
    --max-input-token-len) MAX_INPUT_TOKEN_LEN="$2"; shift ;;
    --max-prefill-tokens) MAX_PREFILL_TOKENS="$2"; shift ;;
    --world-size) WORLD_SIZE="$2"; shift ;;
    --ip) IP_ADDRESS="$2"; shift ;;
    --port) PORT="$2"; shift ;;
    *) echo "Unknown parameter: $1"; exit 1 ;;
  esac
  shift
done
```
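To see the failure mode this comment describes, the sketch below runs the same failing `source` line twice in a child `bash`, once without and once with `set -euo pipefail`. `run_with_mode` is a hypothetical demo helper, and `/nonexistent/set_env.sh` is a deliberately missing path standing in for a broken Ascend `set_env.sh`.

```bash
#!/bin/bash
# run_with_mode MODE_LINE
# Hypothetical demo: runs a tiny child script whose `source` target is
# missing, prefixed by MODE_LINE (empty, or a `set` command).
run_with_mode() {
  local mode_line=$1
  bash -c "
    ${mode_line}
    source /nonexistent/set_env.sh 2>/dev/null
    echo 'mindieservice_daemon would start here'
  "
}

# Without strict mode the failed `source` is ignored and the echo runs;
# with strict mode the child exits at the `source` line and prints nothing.
echo "without strict mode: [$(run_with_mode '')]"
echo "with strict mode:    [$(run_with_mode 'set -euo pipefail')]"
```

Two interactions are worth checking before committing the suggestion as-is. With `-e` and `pipefail` both on, `total_count=$(npu-smi info -l | grep "Total Count" | ...)` exits the script as soon as `grep` matches nothing (or `npu-smi` fails), so the script's own `Error: unable to read device info` branch never prints; appending `|| true` inside that command substitution is one way to keep that error path. Under `-u`, `export PYTHONPATH=$MF_SCRIPTS_ROOT/../:$PYTHONPATH` aborts if `PYTHONPATH` is still unset after the two `set_env.sh` files are sourced; `${PYTHONPATH:-}` avoids that.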
Deploying alauda-ai with

| Latest commit: | 46b1d6d |
|---|---|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://a16cec19.alauda-ai.pages.dev |
| Branch Preview URL: | https://docs-inference-guide-validat.alauda-ai.pages.dev |
… +3 models

- Host all YAML assets + HTML reports under docs/public/ so customers download from the docs site (site-absolute /inference_guide/... links), not GitHub.
- Show the complete benchmark data: full 22-column open-loop sweeps (rate 1-9 x 4 workloads x both engines x TP, TTFT/E2E/ITL/TPS at p90/p95/p99/mean) in collapsible `<details>`, plus the rendered HTML reports as downloadable artifacts. Tables are generated faithfully from the source reports (no hand-transcription).
- Add three more validated models (5 total), each with a namespace-scoped ServingRuntime + InferenceService asset:
  - DeepSeek-R1-Distill-Llama-8B (dense, mature Llama path anchor)
  - DeepSeek-R1-Distill-Llama-70B (dense, TP=8; accuracy openllm 6-task mean 0.722)
  - GLM-5.1-W4A8 (MoE, W4A8 quantized, TP=8; Partner-Guide chatbot sweep)
- Add domain terms to the cspell dictionary.

Lint and build pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move the YAML assets and HTML reports from docs/public/ back under
docs/en/inference_guide/{assets,reports}/ and link them via GitHub
(tree/raw URLs for YAML, blob URL for reports) — matching the existing
training_guides/training-runtimes convention. Reverts the docs-site
public-hosting approach.
Lint and build pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Remove the copied model-auto HTML benchmark reports (and their links); do not ship them in our docs.
- Keep all benchmark *results* (saturation-capacity tables, rate-1 snapshots, the full 22-column open-loop sweeps inline, accuracy table, GLM chatbot table) but remove the *analysis*: Tuning notes / Insights sections, the "Picking an engine" recommendations, and interpretive prose / "recommended" labels. Pages now present verified facts, configs, and data only.

Lint and build pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Qwen3-30B-A3B

Apply the rate=1 chatbot ITL P90 ≈ 30ms SLO. Only Qwen3-30B-A3B (MindIE TP=2, ITL P90 30.8ms / mean 29.0ms) meets it; remove the pages and assets of the models that do not:

- Qwen3-14B (44.6ms)
- DeepSeek-R1-Distill-Llama-8B (~38ms)
- DeepSeek-R1-Distill-Llama-70B (56ms)
- GLM-5.1-W4A8 (218ms)

Add the SLO-compliant MindIE TP=2 asset (the TP=4 asset is 39.8ms, over SLO) and lead the deploy section with it. Trim the index runtime catalog and analysis text left over from the removed models.

Lint and build pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
docs/en/inference_guide/qwen3-30b-a3b.mdx (1)
29-31: ⚡ Quick win: Clarify TP=2 availability for vLLM deployment assets.
The validation matrix states vLLM TP=2/TP=4, but the deploy table links only vLLM TP=4. Add a one-line note clarifying whether TP=2 is benchmark-only or provide the TP=2 asset link to avoid reader confusion.
Also applies to: 44-47
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/en/inference_guide/qwen3-30b-a3b.mdx` around lines 29 - 31, The validation matrix for vLLM-Ascend indicates support for both TP=2 and TP=4 configurations, but the corresponding deployment table link only references TP=4, creating ambiguity about TP=2 availability. Add a one-line clarifying note in or near the vLLM-Ascend row entries that explicitly states whether TP=2 is benchmark-only or provide the actual deployment asset link for TP=2 to resolve the discrepancy. Apply the same clarification to the other affected rows mentioned in the "Also applies to" section (lines 44-47).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp2.yaml`:
- Around line 65-69: The validation for the total_count variable only checks if
it is empty using the -z test, but does not verify that it is a positive
integer. If total_count is zero or contains non-numeric characters, the device
ID generation logic downstream will produce invalid topology configurations.
Enhance the validation condition to check not only that total_count is non-empty
but also that it contains only digits and is greater than zero, rejecting any
non-numeric or zero values with an appropriate error message before the value is
used in device ID generation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ad4c58bd-11d7-4bac-804b-ff593ac0fe27
📒 Files selected for processing (6)
- .cspell/terms.txt
- docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp2.yaml
- docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml
- docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-vllm-ascend-tp4.yaml
- docs/en/inference_guide/index.mdx
- docs/en/inference_guide/qwen3-30b-a3b.mdx
✅ Files skipped from review due to trivial changes (1)
- .cspell/terms.txt
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-mindie-tp4.yaml
- docs/en/inference_guide/assets/qwen3-30b-a3b/qwen3-30b-a3b-vllm-ascend-tp4.yaml
```bash
total_count=$(npu-smi info -l | grep "Total Count" | awk -F ':' '{print $2}' | xargs)
if [[ -z "$total_count" ]]; then
  echo "Error: unable to read device info (npu-smi). Check permissions/devices."
  exit 1
fi
```
Validate total_count as a positive integer before building device IDs.
Line 66 only rejects empty output. If total_count is 0 or non-numeric, Line 113 can generate invalid topology and fail later with less actionable errors.
Suggested patch

```diff
-if [[ -z "$total_count" ]]; then
-  echo "Error: unable to read device info (npu-smi). Check permissions/devices."
+if [[ -z "$total_count" ]] || ! [[ "$total_count" =~ ^[0-9]+$ ]] || [[ "$total_count" -lt 1 ]]; then
+  echo "Error: invalid device count from npu-smi: '$total_count'. Check permissions/devices."
   exit 1
 fi
 echo "$total_count device(s) detected!"
@@
 # TP follows the allocated device count.
 WORLD_SIZE=$total_count
 NPU_DEVICE_IDS=$(seq -s, 0 $(($WORLD_SIZE - 1)))
```

Also applies to: 112-114
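The suggested patch can also be written as a single anchored pattern. A minimal sketch, assuming GNU coreutils `seq`; `validate_device_count` and `device_ids_for` are hypothetical helper names, not part of the asset. The pattern `^[1-9][0-9]*$` rejects empty, non-numeric, and zero values in one test, which makes the separate `-z` check redundant, and it also rejects leading zeros, which bash arithmetic in `[[ ... -lt 1 ]]` would otherwise treat as octal.

```bash
#!/bin/bash
# validate_device_count COUNT
# Hypothetical helper: succeeds only for a positive decimal integer
# without leading zeros; otherwise prints an error and returns 1.
validate_device_count() {
  local count=$1
  if ! [[ "$count" =~ ^[1-9][0-9]*$ ]]; then
    echo "Error: invalid device count from npu-smi: '$count'. Check permissions/devices." >&2
    return 1
  fi
}

# device_ids_for COUNT
# Hypothetical helper: prints comma-separated IDs 0..COUNT-1, mirroring
# the asset's NPU_DEVICE_IDS=$(seq -s, 0 $(($WORLD_SIZE - 1))) line.
device_ids_for() {
  seq -s, 0 $(($1 - 1))
}

# Example: a 2-card allocation (MindIE TP=2).
if validate_device_count 2; then
  echo "NPU_DEVICE_IDS=$(device_ids_for 2)"   # prints NPU_DEVICE_IDS=0,1
fi
```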
What

New `docs/en/inference_guide/` section documenting already-validated open-weight LLM inference, mirroring the structure of `training_guides/training-runtimes`. All content is derived from verified deployments + benchmarks (no fabricated numbers).

Contents

Two validated models above the 7B/8B class:

Per the four asks:

- `ServingRuntime` + `InferenceService` (not ClusterServingRuntime), one per model/engine/TP combination ✅
- `guidellm` open-loop per-replica numbers (4 workloads × rate 1–9), with the dense→vLLM / MoE→MindIE engine-selection finding ✅

Files
Scope note
Every validated 7B+ model currently runs on Ascend 910B NPU; there is no verified GPU benchmark at this size yet (only Qwen3.5-0.8B on A30). The doc therefore leads with NPU and lists the NVIDIA GPU runtime as the platform default with that gap flagged, rather than inventing GPU numbers.
Verification
- `yarn lint` → 0 errors / 0 warnings
- `yarn build` → all 3 pages render

🤖 Generated with Claude Code