Commit 8a578a7
arm64: atomics: lse: improve constraints for simple ops
We have overly conservative assembly constraints for the basic FEAT_LSE
atomic instructions, and using more accurate and permissive constraints
will allow for better code generation.
The FEAT_LSE basic atomic instructions have come in two forms:
LD{op}{order}{size} <Rs>, <Rt>, [<Rn>]
ST{op}{order}{size} <Rs>, [<Rn>]
The ST* forms are aliases of the LD* forms where:
ST{op}{order}{size} <Rs>, [<Rn>]
Is:
LD{op}{order}{size} <Rs>, XZR, [<Rn>]
For either form, both <Rs> and <Rn> are read but not written back to,
and <Rt> is written with the original value of the memory location.
Where (<Rt> == <Rs>) or (<Rt> == <Rn>), <Rt> is written *after* the
other register value(s) are consumed. There are no UNPREDICTABLE or
CONSTRAINED UNPREDICTABLE behaviours when any pair of <Rs>, <Rt>, or
<Rn> are the same register.
Our current inline assembly always uses <Rs> == <Rt>, treating this
register as both an input and an output (using a '+r' constraint). This
forces the compiler to do some unnecessary register shuffling and/or
redundant value generation.
For example, the compiler cannot reuse the <Rs> value, and currently GCC
11.1.0 will compile:
__lse_atomic_add(1, a);
__lse_atomic_add(1, b);
__lse_atomic_add(1, c);
As:
mov w3, #0x1
mov w4, w3
stadd w4, [x0]
mov w0, w3
stadd w0, [x1]
stadd w3, [x2]
We can improve this with more accurate constraints, separating <Rs> and
<Rt>, where <Rs> is an input-only register ('r'), and <Rt> is an
output-only value ('=r'). As <Rt> is written back after <Rs> is
consumed, it does not need to be earlyclobber ('=&r'), leaving the
compiler free to use the same register for both <Rs> and <Rt> where this
is desirable.
At the same time, the redundant 'r' constraint for `v` is removed, as
the `+Q` constraint is sufficient.
With this change, the above example becomes:
mov w3, #0x1
stadd w3, [x0]
stadd w3, [x1]
stadd w3, [x2]
I've made this change for the non-value-returning and FETCH ops. The
RETURN ops have a multi-instruction sequence for which we cannot use the
same constraints, and a subsequent patch will rewrite hte RETURN ops in
terms of the FETCH ops, relying on the ability for the compiler to reuse
the <Rs> value.
This is intended as an optimization.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>1 parent 5e9e43c commit 8a578a7
1 file changed
Lines changed: 18 additions & 12 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
19 | | - | |
20 | | - | |
| 19 | + | |
| 20 | + | |
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
| |||
35 | 35 | | |
36 | 36 | | |
37 | 37 | | |
| 38 | + | |
| 39 | + | |
38 | 40 | | |
39 | 41 | | |
40 | | - | |
41 | | - | |
42 | | - | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
43 | 46 | | |
44 | 47 | | |
45 | | - | |
| 48 | + | |
46 | 49 | | |
47 | 50 | | |
48 | 51 | | |
| |||
124 | 127 | | |
125 | 128 | | |
126 | 129 | | |
127 | | - | |
128 | | - | |
| 130 | + | |
| 131 | + | |
129 | 132 | | |
130 | 133 | | |
131 | 134 | | |
| |||
143 | 146 | | |
144 | 147 | | |
145 | 148 | | |
| 149 | + | |
| 150 | + | |
146 | 151 | | |
147 | 152 | | |
148 | | - | |
149 | | - | |
150 | | - | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
151 | 157 | | |
152 | 158 | | |
153 | | - | |
| 159 | + | |
154 | 160 | | |
155 | 161 | | |
156 | 162 | | |
| |||
0 commit comments