Commit ee1ee6d

KAGA-KOKO authored and Peter Zijlstra committed
atomics: Provide rcuref - scalable reference counting
atomic_t based reference counting, including refcount_t, uses
atomic_inc_not_zero() for acquiring a reference. atomic_inc_not_zero() is
implemented with an atomic_try_cmpxchg() loop. High contention on the
reference count leads to retry loops and scales badly. There is nothing to
improve in this implementation as the semantics have to be preserved.

Provide rcuref as a scalable alternative which is suitable for RCU managed
objects. Similar to refcount_t it comes with overflow and underflow
detection and mitigation.

rcuref treats the underlying atomic_t as an unsigned integer and partitions
this space into zones:

  0x00000000 - 0x7FFFFFFF   valid zone (1 .. (INT_MAX + 1) references)
  0x80000000 - 0xBFFFFFFF   saturation zone
  0xC0000000 - 0xFFFFFFFE   dead zone
  0xFFFFFFFF                no reference

rcuref_get() unconditionally increments the reference count with
atomic_add_negative_relaxed(). rcuref_put() unconditionally decrements the
reference count with atomic_add_negative_release().

This unconditional increment avoids the inc_not_zero() problem, but
requires a more complex implementation on the put() side when the count
drops from 0 to -1.

When this transition is detected, put() attempts to mark the reference
count dead by setting it to the midpoint of the dead zone with a single
atomic_cmpxchg_release() operation. This operation can fail due to a
concurrent rcuref_get() elevating the reference count from -1 to 0 again.

If the unconditional increment in rcuref_get() hits a reference count which
is marked dead (or saturated), it detects this after the fact and brings
the reference count back to the midpoint of the respective zone. The zones
provide enough tolerance to make it practically impossible to escape from a
zone.

The racy implementation of rcuref_put() requires that rcuref_put() is
protected against a grace period ending, in order to prevent a subtle use
after free. As RCU is the only mechanism which can protect against that, it
is not possible to fully replace the atomic_inc_not_zero() based
implementation of refcount_t with this scheme.

The final drop is slightly more expensive than the atomic_dec_return()
counterpart, but that is not the case this is optimized for. The
optimization targets high-frequency get()/put() pairs and their
scalability.

The performance of an uncontended rcuref_get()/put() pair where the put()
is not dropping the last reference is still on par with the plain atomic
operations, while at the same time providing overflow and underflow
detection and mitigation.

The performance of rcuref compared to plain atomic_inc_not_zero() and
atomic_dec_return() based reference counting under contention:

 - Micro benchmark: All CPUs running an increment/decrement loop on an
   elevated reference count, which means the 0 to -1 transition never
   happens.

   The performance gain depends on microarchitecture and the number of
   CPUs and has been observed in the range of 1.3X to 4.7X

 - Conversion of dst_entry::__refcnt to rcuref and testing with the
   localhost memtier/memcached benchmark. That benchmark shows the
   reference count contention prominently.

   The performance gain depends on microarchitecture and the number of
   CPUs and has been observed in the range of 1.1X to 2.6X over the
   previous fix for the false sharing issue vs. struct
   dst_entry::__refcnt.

   When memtier is run over a real 1Gb network connection, there is a
   small gain on top of the false sharing fix. The two changes combined
   result in a 2%-5% total gain for that networked test.

Reported-by: Wangyang Guo <wangyang.guo@intel.com>
Reported-by: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230323102800.158429195@linutronix.de
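
The zone layout in the commit message can be checked with a small userspace sketch. This is not kernel code: `model_zone`, `model_zone_of()` and `model_zone_selfcheck()` are hypothetical names introduced only for illustration, and the sketch assumes the raw counter is viewed as a 32-bit unsigned value, as the message describes.

```c
#include <assert.h>
#include <stdint.h>

/* Zone names are illustrative only; they are not kernel identifiers. */
enum model_zone {
	MODEL_ZONE_VALID,
	MODEL_ZONE_SATURATION,
	MODEL_ZONE_DEAD,
	MODEL_ZONE_NOREF,
};

/* Classify a raw counter value using the ranges from the commit message. */
static enum model_zone model_zone_of(uint32_t raw)
{
	if (raw == UINT32_C(0xFFFFFFFF))
		return MODEL_ZONE_NOREF;	/* no reference held */
	if (raw >= UINT32_C(0xC0000000))
		return MODEL_ZONE_DEAD;
	if (raw >= UINT32_C(0x80000000))
		return MODEL_ZONE_SATURATION;
	return MODEL_ZONE_VALID;		/* raw + 1 references held */
}

/* Check the first and last value of every zone. */
static void model_zone_selfcheck(void)
{
	assert(model_zone_of(UINT32_C(0x00000000)) == MODEL_ZONE_VALID);
	assert(model_zone_of(UINT32_C(0x7FFFFFFF)) == MODEL_ZONE_VALID);
	assert(model_zone_of(UINT32_C(0x80000000)) == MODEL_ZONE_SATURATION);
	assert(model_zone_of(UINT32_C(0xBFFFFFFF)) == MODEL_ZONE_SATURATION);
	assert(model_zone_of(UINT32_C(0xC0000000)) == MODEL_ZONE_DEAD);
	assert(model_zone_of(UINT32_C(0xFFFFFFFE)) == MODEL_ZONE_DEAD);
	assert(model_zone_of(UINT32_C(0xFFFFFFFF)) == MODEL_ZONE_NOREF);
}
```

Seen as a signed integer, everything outside the valid zone is negative, which is why a single sign test after the unconditional add is enough to separate the fast path from the slow path.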
1 parent e5ab9ef commit ee1ee6d

4 files changed

Lines changed: 443 additions & 1 deletion

include/linux/rcuref.h

Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _LINUX_RCUREF_H
+#define _LINUX_RCUREF_H
+
+#include <linux/atomic.h>
+#include <linux/bug.h>
+#include <linux/limits.h>
+#include <linux/lockdep.h>
+#include <linux/preempt.h>
+#include <linux/rcupdate.h>
+
+#define RCUREF_ONEREF		0x00000000U
+#define RCUREF_MAXREF		0x7FFFFFFFU
+#define RCUREF_SATURATED	0xA0000000U
+#define RCUREF_RELEASED		0xC0000000U
+#define RCUREF_DEAD		0xE0000000U
+#define RCUREF_NOREF		0xFFFFFFFFU
+
+/**
+ * rcuref_init - Initialize a rcuref reference count with the given reference count
+ * @ref:	Pointer to the reference count
+ * @cnt:	The initial reference count typically '1'
+ */
+static inline void rcuref_init(rcuref_t *ref, unsigned int cnt)
+{
+	atomic_set(&ref->refcnt, cnt - 1);
+}
+
+/**
+ * rcuref_read - Read the number of held reference counts of a rcuref
+ * @ref:	Pointer to the reference count
+ *
+ * Return: The number of held references (0 ... N)
+ */
+static inline unsigned int rcuref_read(rcuref_t *ref)
+{
+	unsigned int c = atomic_read(&ref->refcnt);
+
+	/* Return 0 if within the DEAD zone. */
+	return c >= RCUREF_RELEASED ? 0 : c + 1;
+}
+
+extern __must_check bool rcuref_get_slowpath(rcuref_t *ref);
+
+/**
+ * rcuref_get - Acquire one reference on a rcuref reference count
+ * @ref:	Pointer to the reference count
+ *
+ * Similar to atomic_inc_not_zero() but saturates at RCUREF_MAXREF.
+ *
+ * Provides no memory ordering, it is assumed the caller has guaranteed the
+ * object memory to be stable (RCU, etc.). It does provide a control dependency
+ * and thereby orders future stores. See documentation in lib/rcuref.c
+ *
+ * Return:
+ *	False if the attempt to acquire a reference failed. This happens
+ *	when the last reference has been put already
+ *
+ *	True if a reference was successfully acquired
+ */
+static inline __must_check bool rcuref_get(rcuref_t *ref)
+{
+	/*
+	 * Unconditionally increase the reference count. The saturation and
+	 * dead zones provide enough tolerance for this.
+	 */
+	if (likely(!atomic_add_negative_relaxed(1, &ref->refcnt)))
+		return true;
+
+	/* Handle the cases inside the saturation and dead zones */
+	return rcuref_get_slowpath(ref);
+}
+
+extern __must_check bool rcuref_put_slowpath(rcuref_t *ref);
+
+/*
+ * Internal helper. Do not invoke directly.
+ */
+static __always_inline __must_check bool __rcuref_put(rcuref_t *ref)
+{
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held() && preemptible(),
+			 "suspicious rcuref_put_rcusafe() usage");
+	/*
+	 * Unconditionally decrease the reference count. The saturation and
+	 * dead zones provide enough tolerance for this.
+	 */
+	if (likely(!atomic_add_negative_release(-1, &ref->refcnt)))
+		return false;
+
+	/*
+	 * Handle the last reference drop and cases inside the saturation
+	 * and dead zones.
+	 */
+	return rcuref_put_slowpath(ref);
+}
+
+/**
+ * rcuref_put_rcusafe -- Release one reference for a rcuref reference count RCU safe
+ * @ref:	Pointer to the reference count
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before, and provides an acquire ordering on success such that free()
+ * must come after.
+ *
+ * Can be invoked from contexts, which guarantee that no grace period can
+ * happen which would free the object concurrently if the decrement drops
+ * the last reference and the slowpath races against a concurrent get() and
+ * put() pair. rcu_read_lock()'ed and atomic contexts qualify.
+ *
+ * Return:
+ *	True if this was the last reference with no future references
+ *	possible. This signals the caller that it can safely release the
+ *	object which is protected by the reference counter.
+ *
+ *	False if there are still active references or the put() raced
+ *	with a concurrent get()/put() pair. Caller is not allowed to
+ *	release the protected object.
+ */
+static inline __must_check bool rcuref_put_rcusafe(rcuref_t *ref)
+{
+	return __rcuref_put(ref);
+}
+
+/**
+ * rcuref_put -- Release one reference for a rcuref reference count
+ * @ref:	Pointer to the reference count
+ *
+ * Can be invoked from any context.
+ *
+ * Provides release memory ordering, such that prior loads and stores are done
+ * before, and provides an acquire ordering on success such that free()
+ * must come after.
+ *
+ * Return:
+ *
+ *	True if this was the last reference with no future references
+ *	possible. This signals the caller that it can safely schedule the
+ *	object, which is protected by the reference counter, for
+ *	deconstruction.
+ *
+ *	False if there are still active references or the put() raced
+ *	with a concurrent get()/put() pair. Caller is not allowed to
+ *	deconstruct the protected object.
+ */
+static inline __must_check bool rcuref_put(rcuref_t *ref)
+{
+	bool released;
+
+	preempt_disable();
+	released = __rcuref_put(ref);
+	preempt_enable();
+	return released;
+}
+
+#endif
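
The fast paths of rcuref_get() and __rcuref_put() in the header above can be approximated in userspace with C11 atomics. This is a rough model under stated assumptions, not the kernel implementation: `struct model_ref`, `model_add_negative()`, `model_init()`, `model_get_fast()` and `model_put_needs_slowpath()` are hypothetical names, `unsigned int` is assumed to be 32 bits wide, and the slow paths from lib/rcuref.c are deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for rcuref_t; assumes a 32-bit unsigned int. */
struct model_ref {
	atomic_uint refcnt;
};

/* Stand-in for atomic_add_negative_*(): add, then report the sign bit. */
static bool model_add_negative(atomic_uint *v, unsigned int i, memory_order mo)
{
	unsigned int val = atomic_fetch_add_explicit(v, i, mo) + i;

	return (val & 0x80000000u) != 0;
}

/* Mirrors rcuref_init(): the stored value is the reference count minus one. */
static void model_init(struct model_ref *ref, unsigned int cnt)
{
	atomic_init(&ref->refcnt, cnt - 1);
}

/* Returns true if the unconditional increment stayed in the valid zone. */
static bool model_get_fast(struct model_ref *ref)
{
	return !model_add_negative(&ref->refcnt, 1u, memory_order_relaxed);
}

/*
 * Returns true if the unconditional decrement left the valid zone, which
 * is the point where the real code calls rcuref_put_slowpath().
 */
static bool model_put_needs_slowpath(struct model_ref *ref)
{
	/* Adding ~0u is a decrement by one modulo 2^32. */
	return model_add_negative(&ref->refcnt, ~0u, memory_order_release);
}
```

Starting from `model_init(&r, 1)`, a get followed by a put stays on the fast path, while the put that drops the last reference moves the counter from 0 to 0xFFFFFFFF and reports that the slow path is needed. In this model nothing marks the counter dead afterwards, so a later get would succeed again; that is the race the commit message describes and the reason the real put() side needs the cmpxchg in its slow path.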

include/linux/types.h

Lines changed: 6 additions & 0 deletions
@@ -175,6 +175,12 @@ typedef struct {
 } atomic64_t;
 #endif
 
+typedef struct {
+	atomic_t refcnt;
+} rcuref_t;
+
+#define RCUREF_INIT(i)	{ .refcnt = ATOMIC_INIT(i - 1) }
+
 struct list_head {
 	struct list_head *next, *prev;
 };
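
Both RCUREF_INIT() and rcuref_init() store the reference count minus one, and rcuref_read() adds the one back unless the value lies in the dead zone. A minimal userspace sketch of that bias follows, assuming a 32-bit `unsigned int`; `model_rcuref_t`, `MODEL_RCUREF_INIT()`, `MODEL_RCUREF_RELEASED` and `model_rcuref_read()` are hypothetical stand-ins, not kernel names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Start of the dead zone, same value as RCUREF_RELEASED in the header. */
#define MODEL_RCUREF_RELEASED	0xC0000000u

/* Userspace stand-in for rcuref_t. */
typedef struct {
	atomic_uint refcnt;
} model_rcuref_t;

/* Static initializer: one held reference is stored as 0. */
#define MODEL_RCUREF_INIT(i)	{ .refcnt = (unsigned int)(i) - 1u }

/* Mirrors rcuref_read(): undo the bias, report 0 inside the dead zone. */
static unsigned int model_rcuref_read(model_rcuref_t *ref)
{
	unsigned int c = atomic_load_explicit(&ref->refcnt,
					      memory_order_relaxed);

	return c >= MODEL_RCUREF_RELEASED ? 0 : c + 1;
}
```

With this bias, the transition that matters for the last put is from 0 to -1 rather than from 1 to 0, so the sign of the result alone tells the put() side whether it has left the valid zone.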

lib/Makefile

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ obj-y += bcd.o sort.o parser.o debug_locks.o random32.o \
 	 list_sort.o uuid.o iov_iter.o clz_ctz.o \
 	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o rhashtable.o base64.o \
-	 once.o refcount.o usercopy.o errseq.o bucket_locks.o \
+	 once.o refcount.o rcuref.o usercopy.o errseq.o bucket_locks.o \
 	 generic-radix-tree.o
 obj-$(CONFIG_STRING_SELFTEST) += test_string.o
 obj-y += string_helpers.o
