
Commit e6efbd2

Robert Richter authored and davejiang committed
cxl, doc: Moving conventions in separate files
Moving conventions in separate files.

Cc: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Link: https://patch.msgid.link/20260203173604.1440334-2-rrichter@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
1 parent 7297118 commit e6efbd2

3 files changed

Lines changed: 178 additions & 170 deletions


Lines changed: 6 additions & 170 deletions
Original file line number | Diff line number | Diff line change
@@ -1,181 +1,17 @@
11
.. SPDX-License-Identifier: GPL-2.0
22
3-
=======================================
43
Compute Express Link: Linux Conventions
5-
=======================================
4+
#######################################
65

76
There exist shipping platforms that bend or break CXL specification
87
expectations. Record the details and the rationale for those deviations.
98
Borrow the ACPI Code First template format to capture the assumptions
109
and tradeoffs such that multiple platform implementations can follow the
1110
same convention.
1211

13-
<(template) Title>
14-
==================
12+
.. toctree::
13+
:maxdepth: 1
14+
:caption: Contents
1515

16-
Document
17-
--------
18-
CXL Revision <rev>, Version <ver>
19-
20-
License
21-
-------
22-
SPDX-License Identifier: CC-BY-4.0
23-
24-
Creator/Contributors
25-
--------------------
26-
27-
Summary of the Change
28-
---------------------
29-
30-
<Detail the conflict with the specification and where available the
31-
assumptions and tradeoffs taken by the hardware platform.>
32-
33-
34-
Benefits of the Change
35-
----------------------
36-
37-
<Detail what happens if platforms and Linux do not adopt this
38-
convention.>
39-
40-
References
41-
----------
42-
43-
Detailed Description of the Change
44-
----------------------------------
45-
46-
<Propose spec language that corrects the conflict.>
47-
48-
49-
Resolve conflict between CFMWS, Platform Memory Holes, and Endpoint Decoders
50-
============================================================================
51-
52-
Document
53-
--------
54-
55-
CXL Revision 3.2, Version 1.0
56-
57-
License
58-
-------
59-
60-
SPDX-License Identifier: CC-BY-4.0
61-
62-
Creator/Contributors
63-
--------------------
64-
65-
- Fabio M. De Francesco, Intel
66-
- Dan J. Williams, Intel
67-
- Mahesh Natu, Intel
68-
69-
Summary of the Change
70-
---------------------
71-
72-
According to the current Compute Express Link (CXL) Specifications (Revision
73-
3.2, Version 1.0), the CXL Fixed Memory Window Structure (CFMWS) describes zero
74-
or more Host Physical Address (HPA) windows associated with each CXL Host
75-
Bridge. Each window represents a contiguous HPA range that may be interleaved
76-
across one or more targets, including CXL Host Bridges. Each window has a set
77-
of restrictions that govern its usage. It is the Operating System-directed
78-
configuration and Power Management (OSPM) responsibility to utilize each window
79-
for the specified use.
80-
81-
Table 9-22 of the current CXL Specifications states that the Window Size field
82-
contains the total number of consecutive bytes of HPA this window describes.
83-
This value must be a multiple of the Number of Interleave Ways (NIW) * 256 MB.
84-
85-
Platform Firmware (BIOS) might reserve physical addresses below 4 GB where a
86-
memory gap such as the Low Memory Hole for PCIe MMIO may exist. In such cases,
87-
the CFMWS Range Size may not adhere to the NIW * 256 MB rule.
88-
89-
The HPA represents the actual physical memory address space that the CXL devices
90-
can decode and respond to, while the System Physical Address (SPA), a related
91-
but distinct concept, represents the system-visible address space that users can
92-
direct transactions to, and so it excludes reserved regions.
93-
94-
BIOS publishes CFMWS to communicate the active SPA ranges that, on platforms
95-
with LMHs, map to a strict subset of the HPA. The SPA range trims out the hole,
96-
resulting in lost capacity in the Endpoints with no SPA to map to that part of
97-
the HPA range that intersects the hole.
98-
99-
E.g., an x86 platform with two CFMWS and an LMH starting at 2 GB:
100-
101-
+--------+------------+-------------------+------------------+-------------------+------+
102-
| Window | CFMWS Base | CFMWS Size        | HDM Decoder Base | HDM Decoder Size  | Ways |
103-
+========+============+===================+==================+===================+======+
104-
| 0      | 0 GB       | 2 GB              | 0 GB             | 3 GB              | 12   |
105-
+--------+------------+-------------------+------------------+-------------------+------+
106-
| 1      | 4 GB       | NIW*256MB Aligned | 4 GB             | NIW*256MB Aligned | 12   |
107-
+--------+------------+-------------------+------------------+-------------------+------+
108-
109-
HDM decoder base and HDM decoder size represent all the 12 Endpoint Decoders of
110-
a 12-way region and all the intermediate Switch Decoders. They are configured
111-
by the BIOS according to the NIW * 256MB rule, resulting in an HPA range size of
112-
3GB. In contrast, the CFMWS Base and CFMWS Size are used to configure the Root
113-
Decoder HPA range, which ends up smaller (2GB) than that of the Switch and
114-
Endpoint Decoders in the hierarchy (3GB).
115-
116-
This creates 2 issues which lead to a failure to construct a region:
117-
118-
1) A mismatch in region size between root and any HDM decoder. The root decoders
119-
will always be smaller due to the trim.
120-
121-
2) The trim causes the root decoder to violate the (NIW * 256MB) rule.
122-
123-
This change allows a region with a base address of 0GB to bypass these checks to
124-
allow for region creation with the trimmed root decoder address range.
125-
126-
This change does not allow for any other arbitrary region to violate these
127-
checks - it is intended exclusively to enable x86 platforms which map CXL memory
128-
under 4GB.
129-
130-
Despite the HDM decoders covering the PCIe hole HPA region, it is expected that
131-
the platform will never route address accesses to the CXL complex because the
132-
root decoder only covers the trimmed region (which excludes this). This is
133-
outside the ability of Linux to enforce.
134-
135-
On the example platform, only the first 2GB will be potentially usable, but
136-
Linux, aiming to adhere to the current specifications, fails to construct
137-
Regions and attach Endpoint and intermediate Switch Decoders to them.
138-
139-
There are several points of failure, all due to the expectation that the Root
140-
Decoder HPA size, which is equal to that of the CFMWS from which it is configured, has
141-
to be greater than or equal to that of the matching Switch and Endpoint HDM Decoders.
142-
143-
In order to succeed with construction and attachment, Linux must construct a
144-
Region with Root Decoder HPA range size, and then attach to that all the
145-
intermediate Switch Decoders and Endpoint Decoders that belong to the hierarchy
146-
regardless of their range sizes.
147-
148-
Benefits of the Change
149-
----------------------
150-
151-
Without the change, the OSPM wouldn't match intermediate Switch and Endpoint
152-
Decoders with Root Decoders configured with CFMWS HPA sizes that don't align
153-
with the NIW * 256MB constraint, and so it leads to lost memdev capacity.
154-
155-
This change allows the OSPM to construct Regions and attach intermediate Switch
156-
and Endpoint Decoders to them, so that the addressable part of the memory
157-
devices' total capacity is made available to the users.
158-
159-
References
160-
----------
161-
162-
Compute Express Link Specification Revision 3.2, Version 1.0
163-
<https://www.computeexpresslink.org/>
164-
165-
Detailed Description of the Change
166-
----------------------------------
167-
168-
The description of the Window Size field in table 9-22 needs to account for
169-
platforms with Low Memory Holes, where SPA ranges might be subsets of the
170-
endpoints' HPA. Therefore, it has to be changed to the following:
171-
172-
"The total number of consecutive bytes of HPA this window represents. This value
173-
shall be a multiple of NIW * 256 MB.
174-
175-
On platforms that reserve physical addresses below 4 GB, such as the Low Memory
176-
Hole for PCIe MMIO on x86, an instance of CFMWS whose Base HPA range is 0 might
177-
have a size that doesn't align with the NIW * 256 MB constraint.
178-
179-
Note that the matching intermediate Switch Decoders and the Endpoint Decoders
180-
HPA range sizes must still align to the above-mentioned rule, but the memory
181-
capacity that exceeds the CFMWS window size won't be accessible.".
16+
conventions/cxl-lmh.rst
17+
conventions/template.rst
Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
1+
.. SPDX-License-Identifier: GPL-2.0
2+
3+
Resolve conflict between CFMWS, Platform Memory Holes, and Endpoint Decoders
4+
============================================================================
5+
6+
Document
7+
--------
8+
9+
CXL Revision 3.2, Version 1.0
10+
11+
License
12+
-------
13+
14+
SPDX-License Identifier: CC-BY-4.0
15+
16+
Creator/Contributors
17+
--------------------
18+
19+
- Fabio M. De Francesco, Intel
20+
- Dan J. Williams, Intel
21+
- Mahesh Natu, Intel
22+
23+
Summary of the Change
24+
---------------------
25+
26+
According to the current Compute Express Link (CXL) Specifications (Revision
27+
3.2, Version 1.0), the CXL Fixed Memory Window Structure (CFMWS) describes zero
28+
or more Host Physical Address (HPA) windows associated with each CXL Host
29+
Bridge. Each window represents a contiguous HPA range that may be interleaved
30+
across one or more targets, including CXL Host Bridges. Each window has a set
31+
of restrictions that govern its usage. It is the Operating System-directed
32+
configuration and Power Management (OSPM) responsibility to utilize each window
33+
for the specified use.
34+
35+
Table 9-22 of the current CXL Specifications states that the Window Size field
36+
contains the total number of consecutive bytes of HPA this window describes.
37+
This value must be a multiple of the Number of Interleave Ways (NIW) * 256 MB.
38+
39+
Platform Firmware (BIOS) might reserve physical addresses below 4 GB where a
40+
memory gap such as the Low Memory Hole for PCIe MMIO may exist. In such cases,
41+
the CFMWS Range Size may not adhere to the NIW * 256 MB rule.
42+
43+
The HPA represents the actual physical memory address space that the CXL devices
44+
can decode and respond to, while the System Physical Address (SPA), a related
45+
but distinct concept, represents the system-visible address space that users can
46+
direct transactions to, and so it excludes reserved regions.
47+
48+
BIOS publishes CFMWS to communicate the active SPA ranges that, on platforms
49+
with LMHs, map to a strict subset of the HPA. The SPA range trims out the hole,
50+
resulting in lost capacity in the Endpoints with no SPA to map to that part of
51+
the HPA range that intersects the hole.
52+
53+
E.g., an x86 platform with two CFMWS and an LMH starting at 2 GB:
54+
55+
+--------+------------+-------------------+------------------+-------------------+------+
56+
| Window | CFMWS Base | CFMWS Size        | HDM Decoder Base | HDM Decoder Size  | Ways |
57+
+========+============+===================+==================+===================+======+
58+
| 0      | 0 GB       | 2 GB              | 0 GB             | 3 GB              | 12   |
59+
+--------+------------+-------------------+------------------+-------------------+------+
60+
| 1      | 4 GB       | NIW*256MB Aligned | 4 GB             | NIW*256MB Aligned | 12   |
61+
+--------+------------+-------------------+------------------+-------------------+------+
62+
63+
HDM decoder base and HDM decoder size represent all the 12 Endpoint Decoders of
64+
a 12-way region and all the intermediate Switch Decoders. They are configured
65+
by the BIOS according to the NIW * 256MB rule, resulting in an HPA range size of
66+
3GB. In contrast, the CFMWS Base and CFMWS Size are used to configure the Root
67+
Decoder HPA range, which ends up smaller (2GB) than that of the Switch and
68+
Endpoint Decoders in the hierarchy (3GB).
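
The arithmetic behind window 0 of the table above can be checked with a short sketch. It is illustrative only: the helper name ``is_niw_aligned`` and the constants are assumptions made for this example, not identifiers from the CXL specification or the Linux driver.

```python
MIB = 1 << 20
GIB = 1 << 30
GRANULE = 256 * MIB  # the 256 MB factor from the Window Size rule


def is_niw_aligned(size_bytes: int, ways: int) -> bool:
    """Return True when size_bytes is a multiple of NIW * 256 MB."""
    return size_bytes % (ways * GRANULE) == 0


ways = 12
hdm_size = 3 * GIB    # HDM Decoder Size of window 0
cfmws_size = 2 * GIB  # CFMWS Size of window 0, trimmed at the LMH

print(is_niw_aligned(hdm_size, ways))    # True: 12 * 256 MB = 3 GB
print(is_niw_aligned(cfmws_size, ways))  # False: 2 GB is not a multiple of 3 GB
print(cfmws_size < hdm_size)             # True: the root range is the smaller one
```

Both conditions that block region construction show up here: the root range is smaller than the HDM range, and it is not NIW * 256 MB aligned.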
69+
70+
This creates 2 issues which lead to a failure to construct a region:
71+
72+
1) A mismatch in region size between root and any HDM decoder. The root decoders
73+
will always be smaller due to the trim.
74+
75+
2) The trim causes the root decoder to violate the (NIW * 256MB) rule.
76+
77+
This change allows a region with a base address of 0GB to bypass these checks to
78+
allow for region creation with the trimmed root decoder address range.
79+
80+
This change does not allow for any other arbitrary region to violate these
81+
checks - it is intended exclusively to enable x86 platforms which map CXL memory
82+
under 4GB.
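
One way to picture the scope of the exception is the sketch below. It is a hypothetical model, not the kernel's implementation: the function name ``root_window_ok`` is invented for illustration, and the requirement that the trimmed window ends at or below 4 GB is an assumption read from the "under 4GB" wording above.

```python
GIB = 1 << 30
GRANULE = 256 * (1 << 20)  # 256 MB per interleave way


def root_window_ok(base: int, size: int, ways: int, hdm_size: int) -> bool:
    """Hypothetical acceptance check for a root decoder window."""
    aligned = size % (ways * GRANULE) == 0
    covers_hdm = size >= hdm_size
    if aligned and covers_hdm:
        return True  # the normal, spec-conforming case
    # Exception from the convention: only a window based at 0 may be
    # trimmed. Requiring it to end at or below 4 GB is an assumption.
    return base == 0 and base + size <= 4 * GIB


print(root_window_ok(0, 2 * GIB, 12, 3 * GIB))        # True: window 0 of the example
print(root_window_ok(4 * GIB, 2 * GIB, 12, 3 * GIB))  # False: same trim at another base
```

The second call shows that a window with the same mismatch at any other base address is still rejected.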
83+
84+
Despite the HDM decoders covering the PCIe hole HPA region, it is expected that
85+
the platform will never route address accesses to the CXL complex because the
86+
root decoder only covers the trimmed region (which excludes this). This is
87+
outside the ability of Linux to enforce.
88+
89+
On the example platform, only the first 2GB will be potentially usable, but
90+
Linux, aiming to adhere to the current specifications, fails to construct
91+
Regions and attach Endpoint and intermediate Switch Decoders to them.
92+
93+
There are several points of failure, all due to the expectation that the Root
94+
Decoder HPA size, which is equal to that of the CFMWS from which it is configured, has
95+
to be greater than or equal to that of the matching Switch and Endpoint HDM Decoders.
96+
97+
In order to succeed with construction and attachment, Linux must construct a
98+
Region with Root Decoder HPA range size, and then attach to that all the
99+
intermediate Switch Decoders and Endpoint Decoders that belong to the hierarchy
100+
regardless of their range sizes.
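
A minimal sketch of that construction rule, using the example platform's numbers, follows. The ``Range`` type and ``build_region`` helper are hypothetical names chosen for this illustration and do not correspond to Linux driver structures.

```python
from dataclasses import dataclass

GIB = 1 << 30


@dataclass
class Range:
    base: int
    size: int


def build_region(root: Range, hdm: Range) -> tuple[Range, int]:
    """Return the region range and the bytes of HDM range left unmapped."""
    region = Range(root.base, root.size)     # the region follows the root decoder
    unmapped = max(0, hdm.size - root.size)  # HDM capacity beyond the window
    return region, unmapped


region, unmapped = build_region(Range(0, 2 * GIB), Range(0, 3 * GIB))
print(region.size // GIB, unmapped // GIB)  # 2 1
```

The region takes the root decoder's 2 GB range, and the remaining 1 GB of the HDM range has no SPA mapped to it.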
101+
102+
Benefits of the Change
103+
----------------------
104+
105+
Without the change, the OSPM wouldn't match intermediate Switch and Endpoint
106+
Decoders with Root Decoders configured with CFMWS HPA sizes that don't align
107+
with the NIW * 256MB constraint, and so it leads to lost memdev capacity.
108+
109+
This change allows the OSPM to construct Regions and attach intermediate Switch
110+
and Endpoint Decoders to them, so that the addressable part of the memory
111+
devices' total capacity is made available to the users.
112+
113+
References
114+
----------
115+
116+
Compute Express Link Specification Revision 3.2, Version 1.0
117+
<https://www.computeexpresslink.org/>
118+
119+
Detailed Description of the Change
120+
----------------------------------
121+
122+
The description of the Window Size field in table 9-22 needs to account for
123+
platforms with Low Memory Holes, where SPA ranges might be subsets of the
124+
endpoints' HPA. Therefore, it has to be changed to the following:
125+
126+
"The total number of consecutive bytes of HPA this window represents. This value
127+
shall be a multiple of NIW * 256 MB.
128+
129+
On platforms that reserve physical addresses below 4 GB, such as the Low Memory
130+
Hole for PCIe MMIO on x86, an instance of CFMWS whose Base HPA range is 0 might
131+
have a size that doesn't align with the NIW * 256 MB constraint.
132+
133+
Note that the matching intermediate Switch Decoders and the Endpoint Decoders
134+
HPA range sizes must still align to the above-mentioned rule, but the memory
135+
capacity that exceeds the CFMWS window size won't be accessible.".
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
1+
.. SPDX-License-Identifier: GPL-2.0
2+
3+
.. :: Template Title here:
4+
5+
Template File
6+
=============
7+
8+
Document
9+
--------
10+
CXL Revision <rev>, Version <ver>
11+
12+
License
13+
-------
14+
SPDX-License Identifier: CC-BY-4.0
15+
16+
Creator/Contributors
17+
--------------------
18+
19+
Summary of the Change
20+
---------------------
21+
22+
<Detail the conflict with the specification and where available the
23+
assumptions and tradeoffs taken by the hardware platform.>
24+
25+
Benefits of the Change
26+
----------------------
27+
28+
<Detail what happens if platforms and Linux do not adopt this
29+
convention.>
30+
31+
References
32+
----------
33+
34+
Detailed Description of the Change
35+
----------------------------------
36+
37+
<Propose spec language that corrects the conflict.>
