
Commit 09909e0

James Morse authored and bp3tk0v committed
x86/resctrl: Queue mon_event_read() instead of sending an IPI
Intel is blessed with an abundance of monitors, one per RMID, that can be
read from any CPU in the domain. MPAM's monitors reside in the MMIO MSC;
the number implemented is up to the manufacturer. This means when there are
fewer monitors than needed, they need to be allocated and freed.

MPAM's CSU monitors are used to back the 'llc_occupancy' monitor file. The
CSU counter is allowed to return 'not ready' for a small number of
microseconds after programming. To allow one CSU hardware monitor to be
used for multiple control or monitor groups, the CPU accessing the monitor
needs to be able to block when configuring and reading the counter.

Worse, the domain may be broken up into slices, and the MMIO accesses for
each slice may need performing from different CPUs.

These two details mean MPAM's monitor code needs to be able to sleep, and
IPI another CPU in the domain to read from a resource that has been sliced.

mon_event_read() already invokes mon_event_count() via IPI, which means
this isn't possible. On systems using nohz-full, some CPUs need to be
interrupted to run kernel work as they otherwise stay in user-space running
realtime workloads. Interrupting these CPUs should be avoided, and
scheduling work on them may never complete.

Change mon_event_read() to pick a housekeeping CPU (one that is not using
nohz_full), schedule mon_event_count() on it, and wait. If all the CPUs in
a domain are using nohz-full, then an IPI is used as the fallback.

This function is only used in response to a user-space filesystem request
(not the timing-sensitive overflow code).

This allows MPAM to hide the slice behaviour from resctrl, and to keep the
monitor-allocation in monitor.c. When the IPI fallback is used on machines
where MPAM needs to make an access on multiple CPUs, the counter read will
always fail.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Reviewed-by: Peter Newman <peternewman@google.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Carl Worth <carl@os.amperecomputing.com> # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-14-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
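The CPU selection described in the commit message can be modelled in plain userspace C. This is only a sketch under stated assumptions: CPU sets are 64-bit bitmasks, and `cpu_set64`, `first_cpu()`, `pick_housekeeping_cpu()`, `choose_read_path()` and the `read_path` enum are hypothetical names for illustration, not kernel APIs. It models the stated behaviour (prefer a CPU that is not nohz_full, otherwise fall back to an IPI) and is not the kernel's implementation of cpumask_any_housekeeping().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model: bit N set means CPU N is in the set. */
typedef uint64_t cpu_set64;

enum read_path {
	READ_VIA_WORK,	/* queued on a housekeeping CPU; the callee may sleep */
	READ_VIA_IPI,	/* fallback: runs in IRQ context; the callee must not sleep */
};

/* Lowest-numbered CPU in a non-empty set. */
static int first_cpu(cpu_set64 set)
{
	int cpu = 0;

	assert(set != 0);
	while (!(set & ((cpu_set64)1 << cpu)))
		cpu++;
	return cpu;
}

/*
 * Prefer a CPU in the domain that is not nohz_full. If every CPU in
 * the domain is nohz_full, return a CPU from the domain anyway.
 */
static int pick_housekeeping_cpu(cpu_set64 domain, cpu_set64 nohz_full)
{
	cpu_set64 housekeeping = domain & ~nohz_full;

	return first_cpu(housekeeping ? housekeeping : domain);
}

/* Models the if/else at the end of mon_event_read(). */
static enum read_path choose_read_path(cpu_set64 domain, cpu_set64 nohz_full,
				       int *cpu)
{
	*cpu = pick_housekeeping_cpu(domain, nohz_full);

	if (nohz_full & ((cpu_set64)1 << *cpu))
		return READ_VIA_IPI;
	return READ_VIA_WORK;
}
```

With a domain of CPUs 0-3 where CPUs 0 and 1 are nohz_full, this model picks CPU 2 and takes the queued path. If all four CPUs are nohz_full, it takes the IPI path, which is the case where the commit message says a multi-CPU MPAM counter read will fail.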
1 parent a4846aa commit 09909e0

2 files changed

Lines changed: 25 additions & 3 deletions


arch/x86/kernel/cpu/resctrl/ctrlmondata.c

Lines changed: 24 additions & 2 deletions
@@ -19,6 +19,8 @@
 #include <linux/kernfs.h>
 #include <linux/seq_file.h>
 #include <linux/slab.h>
+#include <linux/tick.h>
+
 #include "internal.h"
 
 /*
@@ -522,12 +524,21 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
 	return ret;
 }
 
+static int smp_mon_event_count(void *arg)
+{
+	mon_event_count(arg);
+
+	return 0;
+}
+
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
 		    int evtid, int first)
 {
+	int cpu;
+
 	/*
-	 * setup the parameters to send to the IPI to read the data.
+	 * Setup the parameters to pass to mon_event_count() to read the data.
 	 */
 	rr->rgrp = rdtgrp;
 	rr->evtid = evtid;
@@ -536,7 +547,18 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	rr->val = 0;
 	rr->first = first;
 
-	smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
+	cpu = cpumask_any_housekeeping(&d->cpu_mask);
+
+	/*
+	 * cpumask_any_housekeeping() prefers housekeeping CPUs, but
+	 * are all the CPUs nohz_full? If yes, pick a CPU to IPI.
+	 * MPAM's resctrl_arch_rmid_read() is unable to read the
+	 * counters on some platforms if its called in IRQ context.
+	 */
+	if (tick_nohz_full_cpu(cpu))
+		smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
+	else
+		smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
 }
 
 int rdtgroup_mondata_show(struct seq_file *m, void *arg)
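The diff adds smp_mon_event_count() because the two dispatch calls expect different callback shapes: smp_call_function_any() takes a function returning void, while smp_call_on_cpu() takes one returning int. Below is a minimal userspace sketch of that adapter pattern. `struct read_request`, `count_event()`, `count_event_work()`, `run_ipi_style()` and `run_work_style()` are hypothetical stand-ins that call the function locally, and the value 42 is a placeholder rather than a real counter read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rmid_read: the result travels in the argument. */
struct read_request {
	int evtid;
	unsigned long val;
};

typedef void (*ipi_fn)(void *info);	/* callback shape used by smp_call_function_any() */
typedef int (*work_fn)(void *arg);	/* callback shape used by smp_call_on_cpu() */

/* Worker with the void-returning signature, in the role of mon_event_count(). */
static void count_event(void *info)
{
	struct read_request *req = info;

	assert(req != NULL);
	req->val += 42;	/* placeholder for reading a hardware counter */
}

/* Thin adapter in the role of smp_mon_event_count(): same work, int return. */
static int count_event_work(void *arg)
{
	count_event(arg);

	return 0;
}

/* Hypothetical dispatchers: both just invoke the callback on the caller's CPU. */
static void run_ipi_style(ipi_fn fn, void *info)
{
	fn(info);
}

static int run_work_style(work_fn fn, void *arg)
{
	return fn(arg);
}
```

In this sketch, as in the diff, the result comes back through the request structure on both paths, so the adapter's return value only reports that the function ran.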

arch/x86/kernel/cpu/resctrl/monitor.c

Lines changed: 1 addition & 1 deletion
@@ -585,7 +585,7 @@ static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
 }
 
 /*
- * This is called via IPI to read the CQM/MBM counters
+ * This is scheduled by mon_event_read() to read the CQM/MBM counters
  * on a domain.
  */
 void mon_event_count(void *info)
