
Commit 19f3a40

Make rpcc() on arm64 get closer to what x86 returns
The Arm implementation of rpcc() uses the architected timer, whose frequency the SBSA defines to be between 10 and 400 MHz. These numbers are much smaller than the cycle counter frequency used by x86. Make the numbers closer by shifting the cycle counter up by the number of leading zeros in the cntfrq_el0 register, which gets us closer to a normal CPU clock cycle range.
1 parent 430ee31 commit 19f3a40
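
To make the scaling described in the commit message concrete, the following C sketch reproduces the arithmetic without touching any system register. `clz32` and `scaled_rate_hz` are hypothetical helper names that are not part of the commit, and the portable loop only stands in for the `clz %w0, %w0` instruction.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit value, mirroring `clz %w0, %w0`.
   Like the AArch64 instruction, this returns 32 for a zero input. */
static unsigned clz32(uint32_t x) {
    unsigned n = 0;
    for (uint32_t mask = UINT32_C(1) << 31; mask != 0 && (x & mask) == 0; mask >>= 1)
        n++;
    return n;
}

/* Hypothetical helper: effective tick rate of the shifted counter,
   i.e. the timer frequency multiplied by 2^clz32(frequency). */
static uint64_t scaled_rate_hz(uint32_t cntfrq_hz) {
    return (uint64_t)cntfrq_hz << clz32(cntfrq_hz);
}
```

Under these assumptions, a 10 MHz timer gets a shift of 8 and an effective rate of 2.56 GHz, while 100 MHz (shift 5) and 400 MHz (shift 3) both land on 3.2 GHz. For any nonzero frequency the scaled rate has bit 31 set, so it falls between 2^31 and 2^32 Hz, roughly 2.1 to 4.3 GHz.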

1 file changed

Lines changed: 3 additions & 1 deletion


common_arm64.h

@@ -81,10 +81,12 @@ static void __inline blas_lock(volatile BLASULONG *address){
 #if !defined(OS_DARWIN) && !defined (OS_ANDROID)
 static __inline BLASULONG rpcc(void){
   BLASULONG ret = 0;
+  blasint shift;
 
   __asm__ __volatile__ ("isb; mrs %0,cntvct_el0":"=r"(ret));
+  __asm__ __volatile__ ("mrs %0,cntfrq_el0; clz %w0, %w0":"=&r"(shift));
 
-  return ret;
+  return ret << shift;
 }
 
 #define RPCC_DEFINED
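
One consequence of returning `ret << shift` is that the low `shift` bits of every reading are zero, so the value looks like a faster clock while the resolution stays that of the underlying timer. The sketch below shows one way a caller could turn a difference of two shifted readings back into seconds. `shifted_ticks_to_seconds` is a hypothetical helper, not something the commit adds, and it assumes the caller already knows the `cntfrq_el0` value and the shift derived from it.

```c
#include <stdint.h>

/* Hypothetical helper: convert the difference between two shifted counter
   readings into seconds.
   delta     - later reading minus earlier reading (unsigned wraparound is fine)
   cntfrq_hz - timer frequency as read from cntfrq_el0 (assumed nonzero)
   shift     - leading-zero count of cntfrq_hz, as applied by rpcc() */
static double shifted_ticks_to_seconds(uint64_t delta, uint32_t cntfrq_hz,
                                       unsigned shift) {
    /* Both readings are multiples of 2^shift, so undoing the shift is exact
       and recovers the raw timer tick count. */
    uint64_t raw_ticks = delta >> shift;
    return (double)raw_ticks / (double)cntfrq_hz;
}
```

For example, with a 100 MHz timer (shift 5), a difference of 3,200,000,000 shifted counts corresponds to 100,000,000 raw ticks, which is one second.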
