
Commit 535fdfc

ctmarinas authored and willdeacon committed
arm64: Use load LSE atomics for the non-return per-CPU atomic operations
The non-return per-CPU this_cpu_*() atomic operations are implemented as STADD/STCLR/STSET when FEAT_LSE is available. On many microarchitecture implementations, these instructions tend to be executed "far", in the interconnect or memory subsystem (unless the data is already in the L1 cache). This is generally more efficient under contention as it avoids bouncing cache lines between CPUs. The load atomics (e.g. LDADD with a destination register other than XZR), OTOH, tend to be executed "near", with the data pulled into the L1 cache.

STADD instructions executed back to back, as in srcu_read_{lock,unlock}*(), incur an additional overhead due to the default posting behaviour on several CPU implementations.

Since the per-CPU atomics are unlikely to be used concurrently on the same memory location, encourage the hardware to execute them "near" by issuing load atomics - LDADD/LDCLR/LDSET - with the destination register unused (but not XZR).

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/e7d539ed-ced0-4b96-8ecd-048a5b803b85@paulmck-laptop
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Palmer Dabbelt <palmer@dabbelt.com>
[will: Add comment and link to the discussion thread]
Signed-off-by: Will Deacon <will@kernel.org>
1 parent b98c94e commit 535fdfc

1 file changed

Lines changed: 11 additions & 4 deletions

File tree

arch/arm64/include/asm/percpu.h

@@ -77,7 +77,7 @@ __percpu_##name##_case_##sz(void *ptr, unsigned long val)	\
 	"	stxr" #sfx "\t%w[loop], %" #w "[tmp], %[ptr]\n"	\
 	"	cbnz	%w[loop], 1b",					\
 	/* LSE atomics */						\
-		#op_lse "\t%" #w "[val], %[ptr]\n"			\
+		#op_lse "\t%" #w "[val], %" #w "[tmp], %[ptr]\n"	\
 		__nops(3))						\
 	: [loop] "=&r" (loop), [tmp] "=&r" (tmp),			\
 	  [ptr] "+Q"(*(u##sz *)ptr)					\
@@ -124,9 +124,16 @@ PERCPU_RW_OPS(8)
 PERCPU_RW_OPS(16)
 PERCPU_RW_OPS(32)
 PERCPU_RW_OPS(64)
-PERCPU_OP(add, add, stadd)
-PERCPU_OP(andnot, bic, stclr)
-PERCPU_OP(or, orr, stset)
+
+/*
+ * Use value-returning atomics for CPU-local ops as they are more likely
+ * to execute "near" to the CPU (e.g. in L1$).
+ *
+ * https://lore.kernel.org/r/e7d539ed-ced0-4b96-8ecd-048a5b803b85@paulmck-laptop
+ */
+PERCPU_OP(add, add, ldadd)
+PERCPU_OP(andnot, bic, ldclr)
+PERCPU_OP(or, orr, ldset)
 PERCPU_RET_OP(add, add, ldadd)
 
 #undef PERCPU_RW_OPS
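
For illustration only (not part of the commit): a minimal user-space sketch contrasting the two LSE forms the patch switches between, assuming an AArch64 toolchain with FEAT_LSE enabled (e.g. gcc -march=armv8.1-a). The function names here are made up for the example; the kernel generates the equivalent instructions through the PERCPU_OP() macro shown above.

#include <stdio.h>

/* Pre-patch style: STADD has no destination register, so the CPU may
 * execute the atomic "far" out in the interconnect/memory subsystem. */
static inline void stadd_add64(unsigned long *ptr, unsigned long val)
{
	asm volatile("stadd	%[val], %[ptr]"
		     : [ptr] "+Q" (*ptr)
		     : [val] "r" (val));
}

/* Post-patch style: LDADD with an unused (non-XZR) destination register,
 * encouraging the CPU to execute the atomic "near", pulling the data
 * into the L1 cache. */
static inline void ldadd_add64(unsigned long *ptr, unsigned long val)
{
	unsigned long tmp;

	asm volatile("ldadd	%[val], %[tmp], %[ptr]"
		     : [tmp] "=&r" (tmp), [ptr] "+Q" (*ptr)
		     : [val] "r" (val));
}

int main(void)
{
	unsigned long counter = 0;

	stadd_add64(&counter, 2);	/* non-return "far" form */
	ldadd_add64(&counter, 3);	/* non-return "near" form, result discarded */
	printf("counter = %lu\n", counter);	/* prints 5 */
	return 0;
}

Either form leaves the same value in memory; the difference is only a hint about where the hardware is likely to execute the atomic.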
