
Commit b3d5fd6

visitorckw authored and akpm00 committed
lib/math/gcd: use static key to select implementation at runtime
Patch series "Optimize GCD performance on RISC-V by selecting implementation at runtime", v3.

The current implementation of gcd() selects between the binary GCD and the odd-even GCD algorithm at compile time, depending on whether CONFIG_CPU_NO_EFFICIENT_FFS is set. On platforms like RISC-V, however, this compile-time decision can be misleading: even when the compiler emits ctz instructions based on the assumption that they are efficient (as is the case when CONFIG_RISCV_ISA_ZBB is enabled), the actual hardware may lack support for the Zbb extension. In such cases, ffs() falls back to a software implementation at runtime, making the binary GCD algorithm significantly slower than the odd-even variant.

To address this, we introduce a static key to allow runtime selection between the binary and odd-even GCD implementations. On RISC-V, the kernel now checks for Zbb support during boot. If Zbb is unavailable, the static key is disabled so that gcd() consistently uses the more efficient odd-even algorithm in that scenario.

Additionally, to further reduce code size, we select CONFIG_CPU_NO_EFFICIENT_FFS automatically when CONFIG_RISCV_ISA_ZBB is not enabled, avoiding compilation of the unused binary GCD implementation entirely on systems where it would never be executed.

This series ensures that the most efficient GCD algorithm is used in practice and avoids compiling unnecessary code based on hardware capabilities and kernel configuration.

This patch (of 3):

On platforms like RISC-V, the compiler may generate hardware FFS instructions even if the underlying CPU does not actually support them. Currently, the GCD implementation is chosen at compile time based on CONFIG_CPU_NO_EFFICIENT_FFS, which can result in suboptimal behavior on such systems.

Introduce a static key, efficient_ffs_key, to enable runtime selection between the binary GCD (using ffs) and the odd-even GCD implementation. This allows the kernel to default to the faster binary GCD when FFS is efficient, while retaining the ability to fall back when needed.

Link: https://lkml.kernel.org/r/20250606134758.1308400-1-visitorckw@gmail.com
Link: https://lkml.kernel.org/r/20250606134758.1308400-2-visitorckw@gmail.com
Co-developed-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Yu-Chun Lin <eleanor15x@gmail.com>
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
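The runtime selection described in the commit message can be approximated in userspace. The sketch below is illustrative only and is not the kernel's code: a plain `bool` named `efficient_ffs` (hypothetical) stands in for the static key, `__builtin_ctzl` (a GCC/Clang builtin) stands in for `__ffs()`, and `ctz_gcd`, `loop_gcd`, `gcd_select`, and `check_paths_agree` are made-up names. The loop-based variant is a textbook one-bit-at-a-time binary GCD, not the kernel's odd-even implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for efficient_ffs_key: a plain flag, not a static key. */
static bool efficient_ffs = true;

/* Binary GCD normalizing with count-trailing-zeros; a and b must be nonzero. */
static unsigned long ctz_gcd(unsigned long a, unsigned long b)
{
	unsigned int shift = __builtin_ctzl(a | b);

	a >>= __builtin_ctzl(a);
	for (;;) {
		b >>= __builtin_ctzl(b);
		if (a > b) {
			unsigned long t = a;
			a = b;
			b = t;
		}
		b -= a;
		if (!b)
			return a << shift;
	}
}

/* Same result, but normalizing with one-bit shifts; a and b must be nonzero. */
static unsigned long loop_gcd(unsigned long a, unsigned long b)
{
	unsigned int shift = 0;

	while (!((a | b) & 1)) {
		a >>= 1;
		b >>= 1;
		shift++;
	}
	while (!(a & 1))
		a >>= 1;
	for (;;) {
		while (!(b & 1))
			b >>= 1;
		if (a > b) {
			unsigned long t = a;
			a = b;
			b = t;
		}
		b -= a;
		if (!b)
			return a << shift;
	}
}

/* Handle zero inputs once, then dispatch on the runtime flag. */
static unsigned long gcd_select(unsigned long a, unsigned long b)
{
	if (!a || !b)
		return a | b;
	if (efficient_ffs)
		return ctz_gcd(a, b);
	return loop_gcd(a, b);
}

/* Run both paths on the same inputs and confirm they agree. */
static unsigned long check_paths_agree(unsigned long a, unsigned long b)
{
	unsigned long fast, slow;

	efficient_ffs = true;
	fast = gcd_select(a, b);
	efficient_ffs = false;
	slow = gcd_select(a, b);
	efficient_ffs = true;
	assert(fast == slow);
	return fast;
}
```

A real static key differs from this flag in that the branch is patched in the instruction stream, so the check costs no memory load on the hot path; the flag here only models the control flow.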
1 parent 08eabe4 commit b3d5fd6

2 files changed

Lines changed: 18 additions & 12 deletions


include/linux/gcd.h

Lines changed: 3 additions & 0 deletions
@@ -3,6 +3,9 @@
 #define _GCD_H
 
 #include <linux/compiler.h>
+#include <linux/jump_label.h>
+
+DECLARE_STATIC_KEY_TRUE(efficient_ffs_key);
 
 unsigned long gcd(unsigned long a, unsigned long b) __attribute_const__;

lib/math/gcd.c

Lines changed: 15 additions & 12 deletions
@@ -11,22 +11,16 @@
  * has decent hardware division.
  */
 
+DEFINE_STATIC_KEY_TRUE(efficient_ffs_key);
+
 #if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)
 
 /* If __ffs is available, the even/odd algorithm benchmarks slower. */
 
-/**
- * gcd - calculate and return the greatest common divisor of 2 unsigned longs
- * @a: first value
- * @b: second value
- */
-unsigned long gcd(unsigned long a, unsigned long b)
+static unsigned long binary_gcd(unsigned long a, unsigned long b)
 {
 	unsigned long r = a | b;
 
-	if (!a || !b)
-		return r;
-
 	b >>= __ffs(b);
 	if (b == 1)
 		return r & -r;
@@ -44,16 +38,27 @@ unsigned long gcd(unsigned long a, unsigned long b)
 	}
 }
 
-#else
+#endif
 
 /* If normalization is done by loops, the even/odd algorithm is a win. */
+
+/**
+ * gcd - calculate and return the greatest common divisor of 2 unsigned longs
+ * @a: first value
+ * @b: second value
+ */
 unsigned long gcd(unsigned long a, unsigned long b)
 {
 	unsigned long r = a | b;
 
 	if (!a || !b)
 		return r;
 
+#if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)
+	if (static_branch_likely(&efficient_ffs_key))
+		return binary_gcd(a, b);
+#endif
+
 	/* Isolate lsbit of r */
 	r &= -r;
 
@@ -80,6 +85,4 @@ unsigned long gcd(unsigned long a, unsigned long b)
 	}
 }
 
-#endif
-
 EXPORT_SYMBOL_GPL(gcd);
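Both code paths in the diff rely on the expression `r & -r` with `r = a | b`. A minimal userspace sketch of why that works, assuming unsigned arithmetic as in the kernel; `lowest_set_bit`, `common_pow2`, and `check_examples` are hypothetical helper names used only for illustration:

```c
#include <assert.h>

/* Hypothetical helper: isolate the lowest set bit of r (r must be nonzero). */
static unsigned long lowest_set_bit(unsigned long r)
{
	/* For unsigned r, -r is the two's complement, so only the lowest set bit survives. */
	return r & -r;
}

/* Largest power of two dividing both a and b (both nonzero). */
static unsigned long common_pow2(unsigned long a, unsigned long b)
{
	/* The lowest set bit of a | b is the lower of the two lowest set bits. */
	return lowest_set_bit(a | b);
}

/* Small worked cases. */
static void check_examples(void)
{
	/* 12 = 0b01100, 18 = 0b10010: the shared factor of two is 2. */
	assert(common_pow2(12UL, 18UL) == 2UL);
	/* 8 is a power of two, so gcd(12, 8) is exactly the shared power of two. */
	assert(common_pow2(12UL, 8UL) == 4UL);
}
```

The second case mirrors the `if (b == 1) return r & -r;` shortcut in `binary_gcd()`: once `b` shifts down to 1, it was a power of two, so the GCD is the largest power of two common to both inputs.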
