Commit 0826d68
Rajalakshmi Srinivasaraghavan
POWER10: Change the packing format for bfloat16
As the new MMA instructions need the inputs in 4x2 order for bfloat16,
changing the format in copy/packing code. This avoids permute instructions
in the gemm kernel inner loop.1 parent 602a0c7 commit 0826d68
6 files changed
Lines changed: 1923 additions & 285 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
9 | 9 | | |
10 | 10 | | |
11 | 11 | | |
12 | | - | |
13 | | - | |
14 | | - | |
15 | | - | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
| |||
0 commit comments