
Commit a3ebb59

Merge tag 'vfio-v6.19-rc1' of https://github.com/awilliam/linux-vfio
Pull VFIO updates from Alex Williamson:

 - Move libvfio selftest artifacts in preparation for more tightly
   coupled integration with KVM selftests (David Matlack)

 - Fix comment typo in mtty driver (Chu Guangqing)

 - Support a new hardware revision in the hisi_acc vfio-pci variant
   driver where the migration registers can now be accessed via the PF.
   When enabled for this support, the full BAR can be exposed to the
   user (Longfang Liu)

 - Fix vfio cdev support for VF token passing, using the correct size
   for the kernel structure, thereby actually allowing userspace to
   provide a non-zero UUID token. Also set the match token callback for
   hisi_acc, fixing VF token support for this vfio-pci variant driver
   (Raghavendra Rao Ananta)

 - Introduce internal callbacks on vfio devices to simplify and
   consolidate duplicate code for generating VFIO_DEVICE_GET_REGION_INFO
   data, replacing various ioctl intercepts with a more structured
   solution (Jason Gunthorpe)

 - Introduce dma-buf support for vfio-pci devices, allowing MMIO regions
   to be exposed through dma-buf objects with lifecycle managed through
   move operations. This enables low-level interactions such as a
   vfio-pci based SPDK driver interacting directly with dma-buf capable
   RDMA devices to enable peer-to-peer operations. IOMMUFD is also now
   able to build upon this support to fill a long-standing feature gap
   versus the legacy vfio type1 IOMMU backend with an implementation of
   P2P support for VM use cases that better manages the lifecycle of the
   P2P mapping (Leon Romanovsky, Jason Gunthorpe, Vivek Kasireddy)

 - Convert eventfd triggering for error and request signals to use RCU
   mechanisms in order to avoid a 3-way lockdep-reported deadlock issue
   (Alex Williamson)

 - Fix a 32-bit overflow introduced via dma-buf support, manifesting
   with large DMA buffers (Alex Mastro)

 - Convert the nvgrace-gpu vfio-pci variant driver to insert mappings on
   fault rather than at mmap time. This conversion both makes use of
   huge PFNMAPs and avoids corrected RAS events during reset, by now
   being subject to vfio-pci-core's use of unmap_mapping_range(), and
   enables a device readiness test after reset (Ankit Agrawal)

 - Refactor vfio selftests to support multi-device tests and split code
   to provide better separation between IOMMU and device objects. This
   work also enables a new test suite addition to measure parallel
   device initialization latency (David Matlack)

* tag 'vfio-v6.19-rc1' of https://github.com/awilliam/linux-vfio: (65 commits)
  vfio: selftests: Add vfio_pci_device_init_perf_test
  vfio: selftests: Eliminate INVALID_IOVA
  vfio: selftests: Split libvfio.h into separate header files
  vfio: selftests: Move vfio_selftests_*() helpers into libvfio.c
  vfio: selftests: Rename vfio_util.h to libvfio.h
  vfio: selftests: Stop passing device for IOMMU operations
  vfio: selftests: Move IOVA allocator into iova_allocator.c
  vfio: selftests: Move IOMMU library code into iommu.c
  vfio: selftests: Rename struct vfio_dma_region to dma_region
  vfio: selftests: Upgrade driver logging to dev_err()
  vfio: selftests: Prefix logs with device BDF where relevant
  vfio: selftests: Eliminate overly chatty logging
  vfio: selftests: Support multiple devices in the same container/iommufd
  vfio: selftests: Introduce struct iommu
  vfio: selftests: Rename struct vfio_iommu_mode to iommu_mode
  vfio: selftests: Allow passing multiple BDFs on the command line
  vfio: selftests: Split run.sh into separate scripts
  vfio: selftests: Move run.sh into scripts directory
  vfio/nvgrace-gpu: wait for the GPU mem to be ready
  vfio/nvgrace-gpu: Inform devmem unmapped after reset
  ...
2 parents ce5cfb0 + d721f52 commit a3ebb59

72 files changed

Lines changed: 3426 additions & 1899 deletions


Documentation/driver-api/pci/p2pdma.rst

Lines changed: 74 additions & 23 deletions
@@ -9,22 +9,48 @@ between two devices on the bus. This type of transaction is henceforth
 called Peer-to-Peer (or P2P). However, there are a number of issues that
 make P2P transactions tricky to do in a perfectly safe way.
 
-One of the biggest issues is that PCI doesn't require forwarding
-transactions between hierarchy domains, and in PCIe, each Root Port
-defines a separate hierarchy domain. To make things worse, there is no
-simple way to determine if a given Root Complex supports this or not.
-(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
-only supports doing P2P when the endpoints involved are all behind the
-same PCI bridge, as such devices are all in the same PCI hierarchy
-domain, and the spec guarantees that all transactions within the
-hierarchy will be routable, but it does not require routing
-between hierarchies.
-
-The second issue is that to make use of existing interfaces in Linux,
-memory that is used for P2P transactions needs to be backed by struct
-pages. However, PCI BARs are not typically cache coherent so there are
-a few corner case gotchas with these pages so developers need to
-be careful about what they do with them.
+For PCIe the routing of Transaction Layer Packets (TLPs) is well-defined up
+until they reach a host bridge or root port. If the path includes PCIe switches
+then based on the ACS settings the transaction can route entirely within
+the PCIe hierarchy and never reach the root port. The kernel will evaluate
+the PCIe topology and always permit P2P in these well-defined cases.
+
+However, if the P2P transaction reaches the host bridge then it might have to
+hairpin back out the same root port, be routed inside the CPU SOC to another
+PCIe root port, or routed internally to the SOC.
+
+The PCIe specification doesn't define the forwarding of transactions between
+hierarchy domains and the kernel defaults to blocking such routing. There is an
+allow list to detect known-good HW, in which case P2P between any
+two PCIe devices will be permitted.
+
+Since P2P inherently is doing transactions between two devices it requires two
+drivers to be co-operating inside the kernel. The providing driver has to convey
+its MMIO to the consuming driver. To meet the driver model lifecycle rules the
+MMIO must have all DMA mapping removed, all CPU accesses prevented, and all page
+table mappings undone before the providing driver completes remove().
+
+This requires the providing and consuming driver to actively work together to
+guarantee that the consuming driver has stopped using the MMIO during a removal
+cycle. This is done by either a synchronous invalidation shutdown or waiting
+for all usage refcounts to reach zero.
+
+At the lowest level the P2P subsystem offers a naked struct p2p_provider that
+delegates lifecycle management to the providing driver. It is expected that
+drivers using this option will wrap their MMIO memory in DMABUF and use DMABUF
+to provide an invalidation shutdown. These MMIO addresses have no struct page, and
+if used with mmap() must create special PTEs. As such there are very few
+kernel uAPIs that can accept pointers to them; in particular they cannot be used
+with read()/write(), including O_DIRECT.
+
+Building on this, the subsystem offers a layer to wrap the MMIO in a ZONE_DEVICE
+pgmap of MEMORY_DEVICE_PCI_P2PDMA to create struct pages. The lifecycle of the
+pgmap ensures that when the pgmap is destroyed all other drivers have stopped
+using the MMIO. This option works with O_DIRECT flows, in some cases, if the
+underlying subsystem supports handling MEMORY_DEVICE_PCI_P2PDMA through
+FOLL_PCI_P2PDMA. The use of FOLL_LONGTERM is prevented. As this relies on pgmap
+it also relies on architecture support along with alignment and minimum size
+limitations.
 
 
 Driver Writer's Guide
@@ -114,14 +140,39 @@ allocating scatter-gather lists with P2P memory.
 Struct Page Caveats
 -------------------
 
-Driver writers should be very careful about not passing these special
-struct pages to code that isn't prepared for it. At this time, the kernel
-interfaces do not have any checks for ensuring this. This obviously
-precludes passing these pages to userspace.
+While the MEMORY_DEVICE_PCI_P2PDMA pages can be installed in VMAs,
+pin_user_pages() and related will not return them unless FOLL_PCI_P2PDMA is set.
 
-P2P memory is also technically IO memory but should never have any side
-effects behind it. Thus, the order of loads and stores should not be important
-and ioreadX(), iowriteX() and friends should not be necessary.
+The MEMORY_DEVICE_PCI_P2PDMA pages require care to support in the kernel. The
+KVA is still MMIO and must still be accessed through the normal
+readX()/writeX()/etc helpers. Direct CPU access (e.g. memcpy) is forbidden, just
+like any other MMIO mapping. While this will actually work on some
+architectures, others will experience corruption or just crash in the kernel.
+Supporting FOLL_PCI_P2PDMA in a subsystem requires scrubbing it to ensure no CPU
+access happens.
+
+
+Usage With DMABUF
+=================
+
+DMABUF provides an alternative to the above struct page-based
+client/provider/orchestrator system and should be used when struct page
+doesn't exist. In this mode the exporting driver will wrap
+some of its MMIO in a DMABUF and give the DMABUF FD to userspace.
+
+Userspace can then pass the FD to an importing driver which will ask the
+exporting driver to map it to the importer.
+
+In this case the initiator and target pci_devices are known and the P2P subsystem
+is used to determine the mapping type. The phys_addr_t-based DMA API is used to
+establish the dma_addr_t.
+
+Lifecycle is controlled by DMABUF move_notify(). When the exporting driver wants
+to remove() it must deliver an invalidation shutdown to all DMABUF importing
+drivers through move_notify() and synchronously DMA unmap all the MMIO.
+
+No importing driver can continue to have a DMA map to the MMIO after the
+exporting driver has destroyed its p2p_provider.
 
 
 P2P DMA Support Library

block/blk-mq-dma.c

Lines changed: 1 addition & 1 deletion
@@ -84,7 +84,7 @@ static inline bool blk_can_dma_map_iova(struct request *req,
 
 static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
 {
-	iter->addr = pci_p2pdma_bus_addr_map(&iter->p2pdma, vec->paddr);
+	iter->addr = pci_p2pdma_bus_addr_map(iter->p2pdma.mem, vec->paddr);
 	iter->len = vec->len;
 	return true;
 }

drivers/crypto/hisilicon/qm.c

Lines changed: 27 additions & 0 deletions
@@ -3032,11 +3032,36 @@ static void qm_put_pci_res(struct hisi_qm *qm)
 	pci_release_mem_regions(pdev);
 }
 
+static void hisi_mig_region_clear(struct hisi_qm *qm)
+{
+	u32 val;
+
+	/* Clear migration region set of PF */
+	if (qm->fun_type == QM_HW_PF && qm->ver > QM_HW_V3) {
+		val = readl(qm->io_base + QM_MIG_REGION_SEL);
+		val &= ~QM_MIG_REGION_EN;
+		writel(val, qm->io_base + QM_MIG_REGION_SEL);
+	}
+}
+
+static void hisi_mig_region_enable(struct hisi_qm *qm)
+{
+	u32 val;
+
+	/* Select migration region of PF */
+	if (qm->fun_type == QM_HW_PF && qm->ver > QM_HW_V3) {
+		val = readl(qm->io_base + QM_MIG_REGION_SEL);
+		val |= QM_MIG_REGION_EN;
+		writel(val, qm->io_base + QM_MIG_REGION_SEL);
+	}
+}
+
 static void hisi_qm_pci_uninit(struct hisi_qm *qm)
 {
 	struct pci_dev *pdev = qm->pdev;
 
 	pci_free_irq_vectors(pdev);
+	hisi_mig_region_clear(qm);
 	qm_put_pci_res(qm);
 	pci_disable_device(pdev);
 }
@@ -5752,6 +5777,7 @@ int hisi_qm_init(struct hisi_qm *qm)
 		goto err_free_qm_memory;
 
 	qm_cmd_init(qm);
+	hisi_mig_region_enable(qm);
 
 	return 0;
 
@@ -5890,6 +5916,7 @@ static int qm_rebuild_for_resume(struct hisi_qm *qm)
 	}
 
 	qm_cmd_init(qm);
+	hisi_mig_region_enable(qm);
 	hisi_qm_dev_err_init(qm);
 	/* Set the doorbell timeout to QM_DB_TIMEOUT_CFG ns. */
 	writel(QM_DB_TIMEOUT_SET, qm->io_base + QM_DB_TIMEOUT_CFG);

drivers/dma-buf/Makefile

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# SPDX-License-Identifier: GPL-2.0-only
22
obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
3-
dma-fence-unwrap.o dma-resv.o
3+
dma-fence-unwrap.o dma-resv.o dma-buf-mapping.o
44
obj-$(CONFIG_DMABUF_HEAPS) += dma-heap.o
55
obj-$(CONFIG_DMABUF_HEAPS) += heaps/
66
obj-$(CONFIG_SYNC_FILE) += sync_file.o
