Commit c5dca38

fdefranc authored and davejiang committed
cxl: Documentation/driver-api/cxl: Describe the x86 Low Memory Hole solution
Add documentation on how to resolve conflicts between CXL Fixed Memory
Windows, Platform Low Memory Holes, intermediate Switch and Endpoint
Decoders.

[dj]: Fixed inconsistent spacing after '.'
[dj]: Fixed subject line from Alison.
[dj]: Removed '::' before table from Bagas.

Reviewed-by: Gregory Price <gourry@gourry.net>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
1 parent c427290 commit c5dca38

1 file changed

Lines changed: 135 additions & 0 deletions

Documentation/driver-api/cxl/conventions.rst

@@ -45,3 +45,138 @@ Detailed Description of the Change
4545
----------------------------------
4646

4747
<Propose spec language that corrects the conflict.>
48+
49+
50+
Resolve conflict between CFMWS, Platform Memory Holes, and Endpoint Decoders
51+
============================================================================
52+
53+
Document
54+
--------
55+
56+
CXL Revision 3.2, Version 1.0
57+
58+
License
59+
-------
60+
61+
SPDX-License-Identifier: CC-BY-4.0
62+
63+
Creator/Contributors
64+
--------------------
65+
66+
- Fabio M. De Francesco, Intel
67+
- Dan J. Williams, Intel
68+
- Mahesh Natu, Intel
69+
70+
Summary of the Change
71+
---------------------
72+
73+
According to the current Compute Express Link (CXL) Specifications (Revision
74+
3.2, Version 1.0), the CXL Fixed Memory Window Structure (CFMWS) describes zero
75+
or more Host Physical Address (HPA) windows associated with each CXL Host
76+
Bridge. Each window represents a contiguous HPA range that may be interleaved
77+
across one or more targets, including CXL Host Bridges. Each window has a set
78+
of restrictions that govern its usage. It is the responsibility of the
79+
Operating System-directed configuration and Power Management (OSPM) to utilize
80+
each window for the specified use.
81+
82+
Table 9-22 of the current CXL Specifications states that the Window Size field
83+
contains the total number of consecutive bytes of HPA this window describes.
84+
This value must be a multiple of the Number of Interleave Ways (NIW) * 256 MB.
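
The alignment rule is simple modular arithmetic. The following is a minimal
sketch, not kernel code: the helper name ``window_size_is_aligned`` and the
constant ``SZ_256M`` are illustrative assumptions, and 256 MB is taken to mean
256 * 2**20 bytes.

```python
SZ_256M = 256 * 2**20  # 256 MB in bytes (binary megabytes assumed)
GB = 2**30


def window_size_is_aligned(window_size: int, niw: int) -> bool:
    """Return True if window_size is a multiple of NIW * 256 MB."""
    if niw <= 0:
        raise ValueError("NIW must be a positive number of interleave ways")
    return window_size % (niw * SZ_256M) == 0


# With 4 interleave ways the granule is 4 * 256 MB = 1 GB.
print(window_size_is_aligned(2 * GB, 4))        # True
print(window_size_is_aligned(2 * GB + 1, 4))    # False
```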
85+
86+
Platform Firmware (BIOS) might reserve physical addresses below 4 GB where a
87+
memory gap such as the Low Memory Hole (LMH) for PCIe MMIO may exist. In such cases,
88+
the CFMWS Window Size may not adhere to the NIW * 256 MB rule.
89+
90+
The HPA represents the actual physical memory address space that the CXL devices
91+
can decode and respond to, while the System Physical Address (SPA), a related
92+
but distinct concept, represents the system-visible address space that users can
93+
direct transactions to, and so it excludes reserved regions.
94+
95+
BIOS publishes CFMWS to communicate the active SPA ranges that, on platforms
96+
with LMHs, map to a strict subset of the HPA. The SPA range trims out the hole,
97+
resulting in lost capacity in the Endpoints with no SPA to map to that part of
98+
the HPA range that intersects the hole.
99+
100+
For example, an x86 platform with two CFMWS and an LMH starting at 2 GB:
101+
102+
+--------+------------+-------------------+------------------+-------------------+------+
103+
| Window | CFMWS Base | CFMWS Size        | HDM Decoder Base | HDM Decoder Size  | Ways |
104+
+========+============+===================+==================+===================+======+
105+
|   0    | 0 GB       | 2 GB              | 0 GB             | 3 GB              | 12   |
106+
+--------+------------+-------------------+------------------+-------------------+------+
107+
|   1    | 4 GB       | NIW*256MB Aligned | 4 GB             | NIW*256MB Aligned | 12   |
108+
+--------+------------+-------------------+------------------+-------------------+------+
109+
110+
HDM Decoder Base and HDM Decoder Size describe all 12 Endpoint Decoders of a
111+
12-way region and all the intermediate Switch Decoders. The BIOS configures
112+
them according to the NIW * 256 MB rule, resulting in an HPA range size of
113+
3 GB. The CFMWS Base and CFMWS Size, by contrast, configure the Root Decoder
114+
HPA range, which ends up smaller (2 GB) than that of the Switch and Endpoint
115+
Decoders in the hierarchy (3 GB).
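
The numbers for Window 0 can be reproduced with a short calculation. This is
only a worked illustration of the example table; the variable names are
assumptions made for this sketch.

```python
GB = 2**30
SZ_256M = 256 * 2**20  # 256 MB in bytes

niw = 12                                 # interleave ways of the region
cfmws_base, cfmws_size = 0 * GB, 2 * GB  # Root Decoder range (CFMWS 0)
hdm_base, hdm_size = 0 * GB, 3 * GB      # Switch/Endpoint HDM Decoder range

alignment = niw * SZ_256M                # 12 * 256 MB
hdm_aligned = hdm_size % alignment == 0
cfmws_aligned = cfmws_size % alignment == 0

# HPA covered by the HDM decoders but not by the trimmed Root Decoder.
unreachable = (hdm_base + hdm_size) - (cfmws_base + cfmws_size)

print(alignment // GB)      # 3
print(hdm_aligned)          # True
print(cfmws_aligned)        # False
print(unreachable // GB)    # 1
```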
116+
117+
This creates two issues that lead to a failure to construct a region:
118+
119+
1) A mismatch in region size between root and any HDM decoder. The root decoders
120+
will always be smaller due to the trim.
121+
122+
2) The trim causes the root decoder to violate the (NIW * 256MB) rule.
123+
124+
This change allows a region with a base address of 0GB to bypass these checks to
125+
allow for region creation with the trimmed root decoder address range.
126+
127+
This change does not allow for any other arbitrary region to violate these
128+
checks - it is intended exclusively to enable x86 platforms which map CXL memory
129+
under 4GB.
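
A narrowly scoped exception of this kind might be expressed as a predicate
like the one below. This is a hedged sketch under stated assumptions, not the
Linux implementation: the ``Range`` type, the function name
``lmh_exception_applies`` and the exact set of conditions are hypothetical,
derived only from the constraints described in this document (window base of
0, window ending below 4 GB, HDM range larger than the window and still
NIW * 256 MB aligned).

```python
from dataclasses import dataclass

GB = 2**30
SZ_256M = 256 * 2**20  # 256 MB in bytes


@dataclass(frozen=True)
class Range:
    start: int  # first byte of the HPA range
    size: int   # length in bytes

    @property
    def end(self) -> int:
        return self.start + self.size  # exclusive end


def lmh_exception_applies(root: Range, hdm: Range, niw: int) -> bool:
    """Assumed conditions under which the size checks may be bypassed."""
    return (
        root.start == 0                          # region based at 0 GB
        and root.end < 4 * GB                    # trimmed window ends below 4 GB
        and hdm.start == root.start              # decoders share the base
        and hdm.end > root.end                   # HDM range extends into the hole
        and hdm.size % (niw * SZ_256M) == 0      # HDM size keeps the rule
    )


window0 = lmh_exception_applies(Range(0, 2 * GB), Range(0, 3 * GB), 12)
elsewhere = lmh_exception_applies(Range(4 * GB, 2 * GB), Range(4 * GB, 3 * GB), 12)
print(window0, elsewhere)  # True False
```

Keeping every condition in a single predicate makes it harder for an
arbitrary misconfigured region to slip through the relaxed path.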
130+
131+
Although the HDM decoders cover the PCIe hole HPA region, the platform is
132+
expected never to route accesses in that range to the CXL complex, because the
133+
root decoder only covers the trimmed region (which excludes the hole). Linux
134+
has no way to enforce this.
135+
136+
On the example platform, only the first 2GB will be potentially usable, but
137+
Linux, aiming to adhere to the current specifications, fails to construct
138+
Regions and attach Endpoint and intermediate Switch Decoders to them.
139+
140+
There are several points of failure, all due to the expectation that the Root
141+
Decoder HPA size, equal to that of the CFMWS from which it is configured, is
142+
greater than or equal to that of the matching Switch and Endpoint HDM Decoders.
143+
144+
In order to succeed with construction and attachment, Linux must construct a
145+
Region with the Root Decoder HPA range size, and then attach to it all the
146+
intermediate Switch Decoders and Endpoint Decoders that belong to the hierarchy
147+
regardless of their range sizes.
148+
149+
Benefits of the Change
150+
----------------------
151+
152+
Without the change, the OSPM wouldn't match intermediate Switch and Endpoint
153+
Decoders with Root Decoders configured with CFMWS HPA sizes that don't align
154+
with the NIW * 256MB constraint, which leads to lost memdev capacity.
155+
156+
This change allows the OSPM to construct Regions and attach intermediate Switch
157+
and Endpoint Decoders to them, so that the addressable part of the memory
158+
devices' total capacity is made available to the users.
159+
160+
References
161+
----------
162+
163+
Compute Express Link Specification Revision 3.2, Version 1.0
164+
<https://www.computeexpresslink.org/>
165+
166+
Detailed Description of the Change
167+
----------------------------------
168+
169+
The description of the Window Size field in table 9-22 needs to account for
170+
platforms with Low Memory Holes, where SPA ranges might be subsets of the
171+
endpoints' HPA. Therefore, it has to be changed to the following:
172+
173+
"The total number of consecutive bytes of HPA this window represents. This value
174+
shall be a multiple of NIW * 256 MB.
175+
176+
On platforms that reserve physical addresses below 4 GB, such as the Low Memory
177+
Hole for PCIe MMIO on x86, an instance of CFMWS whose Base HPA range is 0 might
178+
have a size that doesn't align with the NIW * 256 MB constraint.
179+
180+
Note that the matching intermediate Switch Decoders and the Endpoint Decoders
181+
HPA range sizes must still align to the above-mentioned rule, but the memory
182+
capacity that exceeds the CFMWS window size won't be accessible.".
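
Read as a validation rule, the proposed wording could be sketched as follows.
This is an illustration only: ``window_size_is_valid`` and the
``reserves_below_4g`` flag are hypothetical names, and how a platform would
signal that it reserves addresses below 4 GB is not defined by this text.

```python
GB = 2**30
SZ_256M = 256 * 2**20  # 256 MB in bytes


def window_size_is_valid(base_hpa: int, window_size: int, niw: int,
                         reserves_below_4g: bool) -> bool:
    """Check a CFMWS Window Size against the proposed wording (sketch)."""
    if window_size % (niw * SZ_256M) == 0:
        return True  # the general rule is satisfied
    # Assumed reading of the exception: only a window whose Base HPA is 0,
    # on a platform that reserves addresses below 4 GB, may be unaligned.
    return reserves_below_4g and base_hpa == 0


print(window_size_is_valid(0, 2 * GB, 12, True))        # True
print(window_size_is_valid(4 * GB, 2 * GB, 12, True))   # False
```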
