Commit 8d1ff1a
MoE prefill bf16 perf improvement for qwen-3.5-35B-A3B (#18829)
| | Baseline | Batched | Speedup |
|--------------------|-----------|------------|---------|
| Prefill (1341 tok) | 588 tok/s | 1807 tok/s | 3.07x |
| Decode (128 tok) | 90 tok/s | 86 tok/s | ~1.0x (noise?) |

1 parent b5cf3c3
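
The Speedup column in the benchmark table is the ratio of batched to baseline throughput. A minimal sketch that recomputes it from the reported figures and derives the implied wall-clock time per phase; the dictionary layout and the helper names `speedup` and `wall_time_s` are illustrative and not part of this commit:

```python
# Throughput figures (tokens/second) copied from the benchmark table above.
# The structure and function names are illustrative, not code from the commit.
RESULTS = {
    "prefill": {"tokens": 1341, "baseline_tps": 588.0, "batched_tps": 1807.0},
    "decode": {"tokens": 128, "baseline_tps": 90.0, "batched_tps": 86.0},
}


def speedup(baseline_tps: float, batched_tps: float) -> float:
    """Ratio of batched throughput to baseline throughput."""
    return batched_tps / baseline_tps


def wall_time_s(tokens: int, tps: float) -> float:
    """Seconds needed to process `tokens` at `tps` tokens per second."""
    return tokens / tps


for phase, r in RESULTS.items():
    ratio = speedup(r["baseline_tps"], r["batched_tps"])
    before = wall_time_s(r["tokens"], r["baseline_tps"])
    after = wall_time_s(r["tokens"], r["batched_tps"])
    print(f"{phase}: {ratio:.2f}x  ({before:.2f} s -> {after:.2f} s)")
```

On these numbers the prefill ratio rounds to 3.07x, and the decode ratio is about 0.96x, a roughly 4% drop that the table itself flags as possible noise.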
12 files changed
Lines changed: 1924 additions & 36 deletions
File tree
- backends/cuda
- benchmarks
- runtime
- shims
- tests
- triton/kernels
- examples/models/qwen3_5_moe