
Commit a353e72

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:

 - in-order support in virtio core

 - multiple address space support in vduse

 - fixes, cleanups all over the place, notably dma alignment fixes for
   non-cache-coherent systems

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (59 commits)
  vduse: avoid adding implicit padding
  vhost: fix caching attributes of MMIO regions by setting them explicitly
  vdpa/mlx5: update MAC address handling in mlx5_vdpa_set_attr()
  vdpa/mlx5: reuse common function for MAC address updates
  vdpa/mlx5: update mlx_features with driver state check
  crypto: virtio: Replace package id with numa node id
  crypto: virtio: Remove duplicated virtqueue_kick in virtio_crypto_skcipher_crypt_req
  crypto: virtio: Add spinlock protection with virtqueue notification
  Documentation: Add documentation for VDUSE Address Space IDs
  vduse: bump version number
  vduse: add vq group asid support
  vduse: merge tree search logic of IOTLB_GET_FD and IOTLB_GET_INFO ioctls
  vduse: take out allocations from vduse_dev_alloc_coherent
  vduse: remove unused vaddr parameter of vduse_domain_free_coherent
  vduse: refactor vdpa_dev_add for goto err handling
  vhost: forbid change vq groups ASID if DRIVER_OK is set
  vdpa: document set_group_asid thread safety
  vduse: return internal vq group struct as map token
  vduse: add vq group support
  vduse: add v1 API definition
  ...
2 parents cb55738 + ebcff9d commit a353e72

23 files changed

Lines changed: 1553 additions & 505 deletions


Documentation/core-api/dma-api-howto.rst

Lines changed: 52 additions & 0 deletions
@@ -146,6 +146,58 @@ What about block I/O and networking buffers? The block I/O and
 networking subsystems make sure that the buffers they use are valid
 for you to DMA from/to.

+__dma_from_device_group_begin/end annotations
+=============================================
+
+As explained previously, when a structure contains a DMA_FROM_DEVICE /
+DMA_BIDIRECTIONAL buffer (device writes to memory) alongside fields that the
+CPU writes to, cache line sharing between the DMA buffer and CPU-written fields
+can cause data corruption on CPUs with DMA-incoherent caches.
+
+The ``__dma_from_device_group_begin(GROUP)/__dma_from_device_group_end(GROUP)``
+macros ensure proper alignment to prevent this::
+
+	struct my_device {
+		spinlock_t lock1;
+		__dma_from_device_group_begin();
+		char dma_buffer1[16];
+		char dma_buffer2[16];
+		__dma_from_device_group_end();
+		spinlock_t lock2;
+	};
+
+To isolate a DMA buffer from adjacent fields, use
+``__dma_from_device_group_begin(GROUP)`` before the first DMA buffer
+field and ``__dma_from_device_group_end(GROUP)`` after the last DMA
+buffer field (with the same GROUP name). This protects both the head
+and tail of the buffer from cache line sharing.
+
+The GROUP parameter is an optional identifier that names the DMA buffer group
+(in case you have several in the same structure)::
+
+	struct my_device {
+		spinlock_t lock1;
+		__dma_from_device_group_begin(buffer1);
+		char dma_buffer1[16];
+		__dma_from_device_group_end(buffer1);
+		spinlock_t lock2;
+		__dma_from_device_group_begin(buffer2);
+		char dma_buffer2[16];
+		__dma_from_device_group_end(buffer2);
+	};
+
+On cache-coherent platforms these macros expand to zero-length array markers.
+On non-coherent platforms, they also ensure the minimal DMA alignment, which
+can be as large as 128 bytes.
+
+.. note::
+
+   It is allowed (though somewhat fragile) to include extra fields, not
+   intended for DMA from the device, within the group (in order to pack the
+   structure tightly) - but only as long as the CPU does not write these
+   fields while any fields in the group are mapped for DMA_FROM_DEVICE or
+   DMA_BIDIRECTIONAL.
+
 DMA addressing capabilities
 ===========================

Documentation/core-api/dma-attributes.rst

Lines changed: 9 additions & 0 deletions
@@ -148,3 +148,12 @@ DMA_ATTR_MMIO is appropriate.
 For architectures that require cache flushing for DMA coherence
 DMA_ATTR_MMIO will not perform any cache flushing. The address
 provided must never be mapped cacheable into the CPU.
+
+DMA_ATTR_CPU_CACHE_CLEAN
+------------------------
+
+This attribute indicates the CPU will not dirty any cacheline overlapping this
+DMA_FROM_DEVICE/DMA_BIDIRECTIONAL buffer while it is mapped. This allows
+multiple small buffers to safely share a cacheline without risk of data
+corruption, suppressing DMA debug warnings about overlapping mappings.
+All mappings sharing a cacheline should have this attribute.

Documentation/userspace-api/vduse.rst

Lines changed: 53 additions & 0 deletions
@@ -230,4 +230,57 @@ able to start the dataplane processing as follows:
 5. Inject an interrupt for specific virtqueue with the VDUSE_INJECT_VQ_IRQ ioctl
    after the used ring is filled.

+Enabling ASID (API version 1)
+-----------------------------
+
+VDUSE supports per-address-space identifiers (ASIDs) starting with API
+version 1. Set it up with ioctl(VDUSE_SET_API_VERSION) on `/dev/vduse/control`
+and pass `VDUSE_API_VERSION_1` before creating a new VDUSE instance with
+ioctl(VDUSE_CREATE_DEV).
+
+Afterwards, you can use the asid member of the ioctl(VDUSE_VQ_SETUP) argument
+to select the address space of the IOTLB you are querying. The driver can
+change the address space of any virtqueue group with the
+VDUSE_SET_VQ_GROUP_ASID VDUSE message type, and the VDUSE instance must
+reply with VDUSE_REQ_RESULT_OK if the change was possible.
+
+Similarly, you can use ioctl(VDUSE_IOTLB_GET_FD2) to obtain a file descriptor
+describing an IOVA region of a specific ASID. Example usage:
+
+.. code-block:: c
+
+	static void *iova_to_va(int dev_fd, uint32_t asid, uint64_t iova,
+				uint64_t *len)
+	{
+		int fd;
+		void *addr;
+		size_t size;
+		struct vduse_iotlb_entry_v2 entry = { 0 };
+
+		entry.v1.start = iova;
+		entry.v1.last = iova;
+		entry.asid = asid;
+
+		fd = ioctl(dev_fd, VDUSE_IOTLB_GET_FD2, &entry);
+		if (fd < 0)
+			return NULL;
+
+		size = entry.v1.last - entry.v1.start + 1;
+		*len = entry.v1.last - iova + 1;
+		addr = mmap(0, size, perm_to_prot(entry.v1.perm), MAP_SHARED,
+			    fd, entry.v1.offset);
+		close(fd);
+		if (addr == MAP_FAILED)
+			return NULL;
+
+		/*
+		 * Use a data structure such as a linked list to cache the
+		 * IOTLB mapping. munmap(2) should be called on the cached
+		 * mapping when the corresponding VDUSE_UPDATE_IOTLB message
+		 * is received or the device is reset.
+		 */
+
+		return addr + iova - entry.v1.start;
+	}
+
 For more details on the uAPI, please see include/uapi/linux/vduse.h.

drivers/char/hw_random/virtio-rng.c

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,7 @@
 #include <linux/spinlock.h>
 #include <linux/virtio.h>
 #include <linux/virtio_rng.h>
+#include <linux/dma-mapping.h>
 #include <linux/module.h>
 #include <linux/slab.h>

@@ -28,11 +29,13 @@ struct virtrng_info {
 	unsigned int data_avail;
 	unsigned int data_idx;
 	/* minimal size returned by rng_buffer_size() */
+	__dma_from_device_group_begin();
 #if SMP_CACHE_BYTES < 32
 	u8 data[32];
 #else
 	u8 data[SMP_CACHE_BYTES];
 #endif
+	__dma_from_device_group_end();
 };

 static void random_recv_done(struct virtqueue *vq)

drivers/gpio/gpio-virtio.c

Lines changed: 11 additions & 4 deletions
@@ -10,6 +10,7 @@
  */

 #include <linux/completion.h>
+#include <linux/dma-mapping.h>
 #include <linux/err.h>
 #include <linux/gpio/driver.h>
 #include <linux/io.h>

@@ -24,9 +25,13 @@
 struct virtio_gpio_line {
 	struct mutex lock; /* Protects line operation */
 	struct completion completion;
-	struct virtio_gpio_request req ____cacheline_aligned;
-	struct virtio_gpio_response res ____cacheline_aligned;
+
 	unsigned int rxlen;
+
+	__dma_from_device_group_begin();
+	struct virtio_gpio_request req;
+	struct virtio_gpio_response res;
+	__dma_from_device_group_end();
 };

 struct vgpio_irq_line {

@@ -37,8 +42,10 @@ struct vgpio_irq_line {
 	bool update_pending;
 	bool queue_pending;

-	struct virtio_gpio_irq_request ireq ____cacheline_aligned;
-	struct virtio_gpio_irq_response ires ____cacheline_aligned;
+	__dma_from_device_group_begin();
+	struct virtio_gpio_irq_request ireq;
+	struct virtio_gpio_irq_response ires;
+	__dma_from_device_group_end();
 };

 struct virtio_gpio {

drivers/scsi/virtio_scsi.c

Lines changed: 12 additions & 5 deletions
@@ -29,6 +29,7 @@
 #include <scsi/scsi_tcq.h>
 #include <scsi/scsi_devinfo.h>
 #include <linux/seqlock.h>
+#include <linux/dma-mapping.h>

 #include "sd.h"

@@ -61,7 +62,7 @@ struct virtio_scsi_cmd {

 struct virtio_scsi_event_node {
 	struct virtio_scsi *vscsi;
-	struct virtio_scsi_event event;
+	struct virtio_scsi_event *event;
 	struct work_struct work;
 };

@@ -89,6 +90,11 @@ struct virtio_scsi {

 	struct virtio_scsi_vq ctrl_vq;
 	struct virtio_scsi_vq event_vq;
+
+	__dma_from_device_group_begin();
+	struct virtio_scsi_event events[VIRTIO_SCSI_EVENT_LEN];
+	__dma_from_device_group_end();
+
 	struct virtio_scsi_vq req_vqs[];
 };

@@ -237,12 +243,12 @@ static int virtscsi_kick_event(struct virtio_scsi *vscsi,
 	unsigned long flags;

 	INIT_WORK(&event_node->work, virtscsi_handle_event);
-	sg_init_one(&sg, &event_node->event, sizeof(struct virtio_scsi_event));
+	sg_init_one(&sg, event_node->event, sizeof(struct virtio_scsi_event));

 	spin_lock_irqsave(&vscsi->event_vq.vq_lock, flags);

-	err = virtqueue_add_inbuf(vscsi->event_vq.vq, &sg, 1, event_node,
-				  GFP_ATOMIC);
+	err = virtqueue_add_inbuf_cache_clean(vscsi->event_vq.vq, &sg, 1, event_node,
+					      GFP_ATOMIC);
 	if (!err)
 		virtqueue_kick(vscsi->event_vq.vq);

@@ -257,6 +263,7 @@ static int virtscsi_kick_event_all(struct virtio_scsi *vscsi)

 	for (i = 0; i < VIRTIO_SCSI_EVENT_LEN; i++) {
 		vscsi->event_list[i].vscsi = vscsi;
+		vscsi->event_list[i].event = &vscsi->events[i];
 		virtscsi_kick_event(vscsi, &vscsi->event_list[i]);
 	}

@@ -380,7 +387,7 @@ static void virtscsi_handle_event(struct work_struct *work)
 	struct virtio_scsi_event_node *event_node =
 		container_of(work, struct virtio_scsi_event_node, work);
 	struct virtio_scsi *vscsi = event_node->vscsi;
-	struct virtio_scsi_event *event = &event_node->event;
+	struct virtio_scsi_event *event = event_node->event;

 	if (event->event &
 	    cpu_to_virtio32(vscsi->vdev, VIRTIO_SCSI_T_EVENTS_MISSED)) {
