Commit 2cb8eea

Merge tag 'x86_cache_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 resource control updates from Borislav Petkov:

 "Add support on AMD for assigning QoS bandwidth counters to resources
  (RMIDs) with the ability for those resources to be tracked by the
  counters as long as they're assigned to them. Previously, due to hw
  limitations, bandwidth counts from untracked resources would get lost
  when those resources are not tracked. Refactor the code and user
  interfaces to be able to also support other, similar features on ARM,
  for example"

* tag 'x86_cache_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  fs/resctrl: Fix counter auto-assignment on mkdir with mbm_event enabled
  MAINTAINERS: resctrl: Add myself as reviewer
  x86/resctrl: Configure mbm_event mode if supported
  fs/resctrl: Introduce the interface to switch between monitor modes
  fs/resctrl: Disable BMEC event configuration when mbm_event mode is enabled
  fs/resctrl: Introduce the interface to modify assignments in a group
  fs/resctrl: Introduce mbm_L3_assignments to list assignments in a group
  fs/resctrl: Auto assign counters on mkdir and clean up on group removal
  fs/resctrl: Introduce mbm_assign_on_mkdir to enable assignments on mkdir
  fs/resctrl: Provide interface to update the event configurations
  fs/resctrl: Add event configuration directory under info/L3_MON/
  fs/resctrl: Support counter read/reset with mbm_event assignment mode
  x86/resctrl: Implement resctrl_arch_reset_cntr() and resctrl_arch_cntr_read()
  x86/resctrl: Refactor resctrl_arch_rmid_read()
  fs/resctrl: Introduce counter ID read, reset calls in mbm_event mode
  fs/resctrl: Pass struct rdtgroup instead of individual members
  fs/resctrl: Add the functionality to unassign MBM events
  fs/resctrl: Add the functionality to assign MBM events
  x86,fs/resctrl: Implement resctrl_arch_config_cntr() to assign a counter with ABMC
  fs/resctrl: Introduce event configuration field in struct mon_evt
  ...

2 parents a65879b + dd86b69 commit 2cb8eea

16 files changed

Lines changed: 2021 additions & 229 deletions

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -6163,7 +6163,7 @@
61636163
rdt= [HW,X86,RDT]
61646164
Turn on/off individual RDT features. List is:
61656165
cmt, mbmtotal, mbmlocal, l3cat, l3cdp, l2cat, l2cdp,
6166-
mba, smba, bmec.
6166+
mba, smba, bmec, abmc.
61676167
E.g. to turn on cmt and turn off mba use:
61686168
rdt=cmt,!mba
61696169
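
The `rdt=` value in this hunk is a comma-separated list in which a leading `!` turns a feature off. The following is a minimal Python sketch of that syntax only, not the kernel's own parser; `parse_rdt_option` and `KNOWN_FEATURES` are hypothetical names introduced for illustration.

```python
# Feature names as listed in the updated kernel-parameters.txt entry.
KNOWN_FEATURES = {
    "cmt", "mbmtotal", "mbmlocal", "l3cat", "l3cdp",
    "l2cat", "l2cdp", "mba", "smba", "bmec", "abmc",
}


def parse_rdt_option(value):
    """Split an rdt= value into (enabled, disabled) feature sets."""
    enabled, disabled = set(), set()
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        turn_off = token.startswith("!")
        name = token[1:] if turn_off else token
        if name not in KNOWN_FEATURES:
            raise ValueError(f"unknown RDT feature: {name!r}")
        (disabled if turn_off else enabled).add(name)
    return enabled, disabled


# The documented example: turn on cmt and turn off mba.
print(parse_rdt_option("cmt,!mba"))  # prints ({'cmt'}, {'mba'})
```

With this change, `abmc` is accepted as a feature name in the same way as the existing entries.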

Documentation/filesystems/resctrl.rst

Lines changed: 325 additions & 0 deletions
@@ -26,6 +26,7 @@ MBM (Memory Bandwidth Monitoring) "cqm_mbm_total", "cqm_mbm_local"
2626
MBA (Memory Bandwidth Allocation) "mba"
2727
SMBA (Slow Memory Bandwidth Allocation) ""
2828
BMEC (Bandwidth Monitoring Event Configuration) ""
29+
ABMC (Assignable Bandwidth Monitoring Counters) ""
2930
=============================================== ================================
3031

3132
Historically, new features were made visible by default in /proc/cpuinfo. This
@@ -256,6 +257,144 @@ with the following files:
256257
# cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
257258
0=0x30;1=0x30;3=0x15;4=0x15
258259

260+
"mbm_assign_mode":
261+
The supported counter assignment modes. The mode enclosed in brackets is the one that
262+
is enabled. The MBM events associated with counters may reset when "mbm_assign_mode"
263+
is changed.
264+
::
265+
266+
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
267+
[mbm_event]
268+
default
269+
270+
"mbm_event":
271+
272+
mbm_event mode allows users to assign a hardware counter to an RMID, event
273+
pair and monitor the bandwidth usage as long as it is assigned. The hardware
274+
continues to track the assigned counter until it is explicitly unassigned by
275+
the user. Each event within a resctrl group can be assigned independently.
276+
277+
In this mode, a monitoring event can only accumulate data while it is backed
278+
by a hardware counter. Use "mbm_L3_assignments" found in each CTRL_MON and MON
279+
group to specify which of the events should have a counter assigned. The number
280+
of counters available is described in the "num_mbm_cntrs" file. Changing the
281+
mode may cause all counters on the resource to reset.
282+
283+
Moving to mbm_event counter assignment mode requires users to assign the counters
284+
to the events. Otherwise, the MBM event counters will return 'Unassigned' when read.
285+
286+
The mode is beneficial for AMD platforms that support more CTRL_MON
287+
and MON groups than available hardware counters. By default, this
288+
feature is enabled on AMD platforms with the ABMC (Assignable Bandwidth
289+
Monitoring Counters) capability, ensuring counters remain assigned even
290+
when the corresponding RMID is not actively used by any processor.
291+
292+
"default":
293+
294+
In default mode, resctrl assumes there is a hardware counter for each
295+
event within every CTRL_MON and MON group. On AMD platforms, it is
296+
recommended to use the mbm_event mode, if supported, to prevent reset of MBM
297+
events between reads resulting from hardware re-allocating counters. Such resets can
298+
result in misleading values or display "Unavailable" if no counter is assigned
299+
to the event.
300+
301+
* To enable "mbm_event" counter assignment mode:
302+
::
303+
304+
# echo "mbm_event" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
305+
306+
* To enable "default" monitoring mode:
307+
::
308+
309+
# echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
310+
311+
"num_mbm_cntrs":
312+
The maximum number of counters (total of available and assigned counters) in
313+
each domain when the system supports mbm_event mode.
314+
315+
For example, on a system with a maximum of 32 memory bandwidth monitoring
316+
counters in each of its L3 domains:
317+
::
318+
319+
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
320+
0=32;1=32
321+
322+
"available_mbm_cntrs":
323+
The number of counters available for assignment in each domain when mbm_event
324+
mode is enabled on the system.
325+
326+
For example, on a system with 30 available [hardware] assignable counters
327+
in each of its L3 domains:
328+
::
329+
330+
# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
331+
0=30;1=30
332+
333+
"event_configs":
334+
Directory that exists when "mbm_event" counter assignment mode is supported.
335+
Contains a sub-directory for each MBM event that can be assigned to a counter.
336+
337+
Two MBM events are supported by default: mbm_local_bytes and mbm_total_bytes.
338+
Each MBM event's sub-directory contains a file named "event_filter" that is
339+
used to view and modify which memory transactions the MBM event is configured
340+
with. The file is accessible only when "mbm_event" counter assignment mode is
341+
enabled.
342+
343+
List of memory transaction types supported:
344+
345+
========================== ========================================================
346+
Name Description
347+
========================== ========================================================
348+
dirty_victim_writes_all Dirty Victims from the QOS domain to all types of memory
349+
remote_reads_slow_memory Reads to slow memory in the non-local NUMA domain
350+
local_reads_slow_memory Reads to slow memory in the local NUMA domain
351+
remote_non_temporal_writes Non-temporal writes to non-local NUMA domain
352+
local_non_temporal_writes Non-temporal writes to local NUMA domain
353+
remote_reads Reads to memory in the non-local NUMA domain
354+
local_reads Reads to memory in the local NUMA domain
355+
========================== ========================================================
356+
357+
For example::
358+
359+
# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
360+
local_reads,remote_reads,local_non_temporal_writes,remote_non_temporal_writes,
361+
local_reads_slow_memory,remote_reads_slow_memory,dirty_victim_writes_all
362+
363+
# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
364+
local_reads,local_non_temporal_writes,local_reads_slow_memory
365+
366+
Modify the event configuration by writing to the "event_filter" file within
367+
the "event_configs" directory. The read/write "event_filter" file contains the
368+
configuration of the event that reflects which memory transactions are counted by it.
369+
370+
For example::
371+
372+
# echo "local_reads, local_non_temporal_writes" > \
373+
/sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
374+
375+
# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
376+
local_reads,local_non_temporal_writes
377+
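
The "event_filter" format described above is a comma-separated list of the transaction type names from the table. A minimal Python sketch of reading and composing such a list follows; `parse_event_filter` and `build_event_filter` are hypothetical helpers, and the sketch only handles strings, leaving the actual file write to the caller.

```python
# Transaction type names copied from the table in the documentation.
TRANSACTION_TYPES = (
    "local_reads",
    "remote_reads",
    "local_non_temporal_writes",
    "remote_non_temporal_writes",
    "local_reads_slow_memory",
    "remote_reads_slow_memory",
    "dirty_victim_writes_all",
)


def parse_event_filter(text):
    """Return the list of transaction types found in event_filter contents."""
    # Splitting on commas and stripping also copes with wrapped output.
    return [name.strip() for name in text.split(",") if name.strip()]


def build_event_filter(names):
    """Validate names and join them into a string suitable for writing."""
    cleaned = []
    for name in names:
        name = name.strip()
        if name not in TRANSACTION_TYPES:
            raise ValueError(f"unsupported transaction type: {name!r}")
        if name not in cleaned:
            cleaned.append(name)
    return ",".join(cleaned)


print(build_event_filter(["local_reads", "local_non_temporal_writes"]))
```

The resulting string could then be written to the chosen event's "event_filter" file while "mbm_event" mode is enabled.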
378+
"mbm_assign_on_mkdir":
379+
Exists when "mbm_event" counter assignment mode is supported. Accessible
380+
only when "mbm_event" counter assignment mode is enabled.
381+
382+
Determines if a counter will automatically be assigned to an RMID, MBM event
383+
pair when its associated monitor group is created via mkdir. Enabled by default
384+
on boot and when switching from "default" mode to "mbm_event" counter assignment
385+
mode. Users can disable this capability by writing to the interface.
386+
387+
"0":
388+
Auto assignment is disabled.
389+
"1":
390+
Auto assignment is enabled.
391+
392+
Example::
393+
394+
# echo 0 > /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
395+
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_on_mkdir
396+
0
397+
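
The info files added in this hunk use two simple text formats: one mode per line with brackets around the enabled mode, and `domain=value` pairs separated by semicolons. A minimal Python sketch of parsing both is shown here; the function names are hypothetical and the sample strings are the ones from the documentation.

```python
def parse_assign_mode(text):
    """Return (enabled_mode, all_modes) from mbm_assign_mode contents."""
    enabled, modes = None, []
    for line in text.splitlines():
        name = line.strip()
        if not name:
            continue
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            enabled = name
        modes.append(name)
    return enabled, modes


def parse_domain_counts(text):
    """Turn a string such as '0=32;1=32' into {0: 32, 1: 32}."""
    counts = {}
    for item in text.strip().split(";"):
        if not item:
            continue
        domain, value = item.split("=", 1)
        counts[int(domain)] = int(value)
    return counts


enabled, modes = parse_assign_mode("[mbm_event]\ndefault\n")
total = parse_domain_counts("0=32;1=32")      # num_mbm_cntrs
available = parse_domain_counts("0=30;1=30")  # available_mbm_cntrs
# num_mbm_cntrs is the total of available and assigned counters.
assigned = {dom: total[dom] - available[dom] for dom in total}
print(enabled, assigned)  # prints mbm_event {0: 2, 1: 2}
```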
259398
"max_threshold_occupancy":
260399
Read/write file provides the largest value (in
261400
bytes) at which a previously used LLC_occupancy
@@ -380,10 +519,77 @@ When monitoring is enabled all MON groups will also contain:
380519
for the L3 cache they occupy). These are named "mon_sub_L3_YY"
381520
where "YY" is the node number.
382521

522+
When the 'mbm_event' counter assignment mode is enabled, reading
523+
an MBM event of a MON group returns 'Unassigned' if no hardware
524+
counter is assigned to it. For CTRL_MON groups, 'Unassigned' is
525+
returned if the MBM event does not have an assigned counter in the
526+
CTRL_MON group nor in any of its associated MON groups.
527+
383528
"mon_hw_id":
384529
Available only with debug option. The identifier used by hardware
385530
for the monitor group. On x86 this is the RMID.
386531

532+
When monitoring is enabled all MON groups may also contain:
533+
534+
"mbm_L3_assignments":
535+
Exists when "mbm_event" counter assignment mode is supported and lists the
536+
counter assignment states of the group.
537+
538+
The assignment list is displayed in the following format:
539+
540+
<Event>:<Domain ID>=<Assignment state>;<Domain ID>=<Assignment state>
541+
542+
Event: A valid MBM event in the
543+
/sys/fs/resctrl/info/L3_MON/event_configs directory.
544+
545+
Domain ID: A valid domain ID. When writing, '*' applies the changes
546+
to all the domains.
547+
548+
Assignment states:
549+
550+
_ : No counter assigned.
551+
552+
e : Counter assigned exclusively.
553+
554+
Example:
555+
556+
To display the counter assignment states for the default group.
557+
::
558+
559+
# cd /sys/fs/resctrl
560+
# cat /sys/fs/resctrl/mbm_L3_assignments
561+
mbm_total_bytes:0=e;1=e
562+
mbm_local_bytes:0=e;1=e
563+
564+
Assignments can be modified by writing to the interface.
565+
566+
Examples:
567+
568+
To unassign the counter associated with the mbm_total_bytes event on domain 0:
569+
::
570+
571+
# echo "mbm_total_bytes:0=_" > /sys/fs/resctrl/mbm_L3_assignments
572+
# cat /sys/fs/resctrl/mbm_L3_assignments
573+
mbm_total_bytes:0=_;1=e
574+
mbm_local_bytes:0=e;1=e
575+
576+
To unassign the counter associated with the mbm_total_bytes event on all the domains:
577+
::
578+
579+
# echo "mbm_total_bytes:*=_" > /sys/fs/resctrl/mbm_L3_assignments
580+
# cat /sys/fs/resctrl/mbm_L3_assignments
581+
mbm_total_bytes:0=_;1=_
582+
mbm_local_bytes:0=e;1=e
583+
584+
To assign a counter associated with the mbm_total_bytes event on all domains in
585+
exclusive mode:
586+
::
587+
588+
# echo "mbm_total_bytes:*=e" > /sys/fs/resctrl/mbm_L3_assignments
589+
# cat /sys/fs/resctrl/mbm_L3_assignments
590+
mbm_total_bytes:0=e;1=e
591+
mbm_local_bytes:0=e;1=e
592+
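
The "mbm_L3_assignments" format above (`<Event>:<Domain ID>=<Assignment state>;...`) can be handled with a few lines of string processing. The following Python sketch assumes only the two states documented here, `_` and `e`; `parse_assignments` and `format_assignment` are hypothetical helper names.

```python
VALID_STATES = ("_", "e")  # no counter assigned / counter assigned exclusively


def parse_assignments(text):
    """Parse mbm_L3_assignments contents into {event: {domain: state}}."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        event, _, domains = line.partition(":")
        states = {}
        for item in domains.split(";"):
            if not item:
                continue
            domain, _, state = item.partition("=")
            if state not in VALID_STATES:
                raise ValueError(f"unknown assignment state: {state!r}")
            states[int(domain)] = state
        result[event] = states
    return result


def format_assignment(event, state, domain="*"):
    """Build the string to write; '*' applies the change to all domains."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown assignment state: {state!r}")
    return f"{event}:{domain}={state}"


current = parse_assignments("mbm_total_bytes:0=_;1=e\nmbm_local_bytes:0=e;1=e\n")
print(current["mbm_total_bytes"])                # prints {0: '_', 1: 'e'}
print(format_assignment("mbm_total_bytes", "e"))  # prints mbm_total_bytes:*=e
```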
387593
When the "mba_MBps" mount option is used all CTRL_MON groups will also contain:
388594

389595
"mba_MBps_event":
@@ -1429,6 +1635,125 @@ View the llc occupancy snapshot::
14291635
# cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
14301636
11234000
14311637

1638+
1639+
Examples on working with mbm_assign_mode
1640+
========================================
1641+
1642+
a. Check if MBM counter assignment mode is supported.
1643+
::
1644+
1645+
# mount -t resctrl resctrl /sys/fs/resctrl/
1646+
1647+
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
1648+
[mbm_event]
1649+
default
1650+
1651+
The "mbm_event" mode is detected and enabled.
1652+
1653+
b. Check how many assignable counters are supported.
1654+
::
1655+
1656+
# cat /sys/fs/resctrl/info/L3_MON/num_mbm_cntrs
1657+
0=32;1=32
1658+
1659+
c. Check how many assignable counters are available for assignment in each domain.
1660+
::
1661+
1662+
# cat /sys/fs/resctrl/info/L3_MON/available_mbm_cntrs
1663+
0=30;1=30
1664+
1665+
d. To list the default group's assign states.
1666+
::
1667+
1668+
# cat /sys/fs/resctrl/mbm_L3_assignments
1669+
mbm_total_bytes:0=e;1=e
1670+
mbm_local_bytes:0=e;1=e
1671+
1672+
e. To unassign the counter associated with the mbm_total_bytes event on domain 0.
1673+
::
1674+
1675+
# echo "mbm_total_bytes:0=_" > /sys/fs/resctrl/mbm_L3_assignments
1676+
# cat /sys/fs/resctrl/mbm_L3_assignments
1677+
mbm_total_bytes:0=_;1=e
1678+
mbm_local_bytes:0=e;1=e
1679+
1680+
f. To unassign the counter associated with the mbm_total_bytes event on all domains.
1681+
::
1682+
1683+
# echo "mbm_total_bytes:*=_" > /sys/fs/resctrl/mbm_L3_assignments
1684+
# cat /sys/fs/resctrl/mbm_L3_assignments
1685+
mbm_total_bytes:0=_;1=_
1686+
mbm_local_bytes:0=e;1=e
1687+
1688+
g. To assign a counter associated with the mbm_total_bytes event on all domains in
1689+
exclusive mode.
1690+
::
1691+
1692+
# echo "mbm_total_bytes:*=e" > /sys/fs/resctrl/mbm_L3_assignments
1693+
# cat /sys/fs/resctrl/mbm_L3_assignments
1694+
mbm_total_bytes:0=e;1=e
1695+
mbm_local_bytes:0=e;1=e
1696+
1697+
h. Read the events mbm_total_bytes and mbm_local_bytes of the default group. There is
1698+
no change in reading the events with the assignment.
1699+
::
1700+
1701+
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
1702+
779247936
1703+
# cat /sys/fs/resctrl/mon_data/mon_L3_01/mbm_total_bytes
1704+
562324232
1705+
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
1706+
212122123
1707+
# cat /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
1708+
121212144
1709+
1710+
i. Check the event configurations.
1711+
::
1712+
1713+
# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_total_bytes/event_filter
1714+
local_reads,remote_reads,local_non_temporal_writes,remote_non_temporal_writes,
1715+
local_reads_slow_memory,remote_reads_slow_memory,dirty_victim_writes_all
1716+
1717+
# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
1718+
local_reads,local_non_temporal_writes,local_reads_slow_memory
1719+
1720+
j. Change the event configuration for mbm_local_bytes.
1721+
::
1722+
1723+
# echo "local_reads, local_non_temporal_writes, local_reads_slow_memory, remote_reads" > \
1724+
/sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
1725+
1726+
# cat /sys/fs/resctrl/info/L3_MON/event_configs/mbm_local_bytes/event_filter
1727+
local_reads,local_non_temporal_writes,local_reads_slow_memory,remote_reads
1728+
1729+
k. Now read the local events again. The first read may come back with "Unavailable"
1730+
status. The subsequent read of mbm_local_bytes will display the current value.
1731+
::
1732+
1733+
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
1734+
Unavailable
1735+
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
1736+
2252323
1737+
# cat /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
1738+
Unavailable
1739+
# cat /sys/fs/resctrl/mon_data/mon_L3_01/mbm_local_bytes
1740+
1566565
1741+
1742+
l. Users have the option to go back to 'default' mbm_assign_mode if required. This can be
1743+
done using the following command. Note that switching the mbm_assign_mode may reset all
1744+
the MBM counters (and thus all MBM events) of all the resctrl groups.
1745+
::
1746+
1747+
# echo "default" > /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
1748+
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
1749+
mbm_event
1750+
[default]
1751+
1752+
m. Unmount the resctrl filesystem.
1753+
::
1754+
1755+
# umount /sys/fs/resctrl/
1756+
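
Steps h and k show that an MBM event file may hold a byte count or a status word such as "Unavailable" or "Unassigned". A minimal Python sketch of handling both cases follows. The helper names are hypothetical, and the choice to retry only on "Unavailable" is an assumption based on step k, where the second read succeeds.

```python
STATUS_WORDS = ("Unassigned", "Unavailable")


def interpret_mbm_read(text):
    """Return (count, None) for a number or (None, status) for a status word."""
    value = text.strip()
    if value in STATUS_WORDS:
        return None, value
    return int(value), None


def read_with_retry(read_once, attempts=2):
    """Call read_once() until it yields a number or attempts run out."""
    for _ in range(attempts):
        count, status = interpret_mbm_read(read_once())
        if count is not None:
            return count
        if status == "Unassigned":
            break  # no counter is assigned, so reading again cannot help
    return None


# Simulated reads matching step k; a real caller would read the event file.
samples = iter(["Unavailable\n", "2252323\n"])
print(read_with_retry(lambda: next(samples)))  # prints 2252323
```

On a live system, `read_once` could be a function that opens and reads a file such as /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes.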
14321757
Intel RDT Errata
14331758
================
14341759

MAINTAINERS

Lines changed: 1 addition & 0 deletions
@@ -21186,6 +21186,7 @@ M: Tony Luck <tony.luck@intel.com>
2118621186
M: Reinette Chatre <reinette.chatre@intel.com>
2118721187
R: Dave Martin <Dave.Martin@arm.com>
2118821188
R: James Morse <james.morse@arm.com>
21189+
R: Babu Moger <babu.moger@amd.com>
2118921190
L: linux-kernel@vger.kernel.org
2119021191
S: Supported
2119121192
F: Documentation/filesystems/resctrl.rst

arch/x86/include/asm/cpufeatures.h

Lines changed: 1 addition & 0 deletions
@@ -496,6 +496,7 @@
496496
#define X86_FEATURE_TSA_L1_NO (21*32+12) /* AMD CPU not vulnerable to TSA-L1 */
497497
#define X86_FEATURE_CLEAR_CPU_BUF_VM (21*32+13) /* Clear CPU buffers using VERW before VMRUN */
498498
#define X86_FEATURE_IBPB_EXIT_TO_USER (21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
499+
#define X86_FEATURE_ABMC (21*32+15) /* Assignable Bandwidth Monitoring Counters */
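
The value `(21*32+15)` follows the convention used by the neighbouring definitions: a word index multiplied by 32 plus a bit position within that word. A short Python sketch of the arithmetic, with `decode_feature` as a hypothetical helper name:

```python
def decode_feature(value):
    """Split a cpufeatures-style value into (word, bit)."""
    return divmod(value, 32)


X86_FEATURE_ABMC = 21 * 32 + 15
print(X86_FEATURE_ABMC)                  # prints 687
print(decode_feature(X86_FEATURE_ABMC))  # prints (21, 15)
```

This places ABMC at bit 15 of word 21, directly after X86_FEATURE_IBPB_EXIT_TO_USER at bit 14.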
499500

500501
/*
501502
* BUG word(s)

arch/x86/include/asm/msr-index.h

Lines changed: 2 additions & 0 deletions
@@ -1230,6 +1230,8 @@
12301230
/* - AMD: */
12311231
#define MSR_IA32_MBA_BW_BASE 0xc0000200
12321232
#define MSR_IA32_SMBA_BW_BASE 0xc0000280
1233+
#define MSR_IA32_L3_QOS_ABMC_CFG 0xc00003fd
1234+
#define MSR_IA32_L3_QOS_EXT_CFG 0xc00003ff
12331235
#define MSR_IA32_EVT_CFG_BASE 0xc0000400
12341236

12351237
/* AMD-V MSRs */
