Commit 9ae440b

Merge branch 'splice-net-switch-over-users-of-sendpage-and-remove-it'
David Howells says:

====================
splice, net: Switch over users of sendpage() and remove it

Here's the final set of patches towards the removal of sendpage. All the
drivers that use sendpage() get switched over to using sendmsg() with
MSG_SPLICE_PAGES.

The following changes are made:

 (1) Make the protocol drivers behave according to MSG_MORE, not
     MSG_SENDPAGE_NOTLAST. The latter is restricted to turning on MSG_MORE
     in the sendpage() wrappers.

 (2) Fix ocfs2 to allocate its global protocol buffers with folio_alloc()
     rather than kzalloc() so as not to invoke the !sendpage_ok warning in
     skb_splice_from_iter().

 (3) Make ceph/rds, skb_send_sock, dlm, nvme, smc, ocfs2, drbd and iscsi
     use sendmsg(), not sendpage and make them specify MSG_MORE instead of
     MSG_SENDPAGE_NOTLAST.

 (4) Kill off sendpage and clean up MSG_SENDPAGE_NOTLAST.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6 # part 1
Link: https://lore.kernel.org/r/20230616161301.622169-1-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20230617121146.716077-1-dhowells@redhat.com/ # v2
Link: https://lore.kernel.org/r/20230620145338.1300897-1-dhowells@redhat.com/ # v3
====================

Link: https://lore.kernel.org/r/20230623225513.2732256-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2 parents b545a13 + b848b26 commit 9ae440b

88 files changed

Lines changed: 230 additions & 748 deletions

Documentation/bpf/map_sockmap.rst

Lines changed: 5 additions & 5 deletions
@@ -240,11 +240,11 @@ offsets into ``msg``, respectively.
 If a program of type ``BPF_PROG_TYPE_SK_MSG`` is run on a ``msg`` it can only
 parse data that the (``data``, ``data_end``) pointers have already consumed.
 For ``sendmsg()`` hooks this is likely the first scatterlist element. But for
-calls relying on the ``sendpage`` handler (e.g., ``sendfile()``) this will be
-the range (**0**, **0**) because the data is shared with user space and by
-default the objective is to avoid allowing user space to modify data while (or
-after) BPF verdict is being decided. This helper can be used to pull in data
-and to set the start and end pointers to given values. Data will be copied if
+calls relying on MSG_SPLICE_PAGES (e.g., ``sendfile()``) this will be the
+range (**0**, **0**) because the data is shared with user space and by default
+the objective is to avoid allowing user space to modify data while (or after)
+BPF verdict is being decided. This helper can be used to pull in data and to
+set the start and end pointers to given values. Data will be copied if
 necessary (i.e., if data was not linear and if start and end pointers do not
 point to the same chunk).
 
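
For orientation, here is a minimal userspace sketch of the kind of call the paragraph above has in mind: a `sendfile()` from a regular file into a socket. It does not attach a BPF program or inspect `data`/`data_end`; it only shows the send path that hands file pages to the socket. The helper name `demo_sendfile_roundtrip`, the use of an `AF_UNIX` socketpair, and the temporary file under `/tmp` are assumptions made for the example, and whether the kernel services the call through MSG_SPLICE_PAGES depends on the kernel version.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `payload` to a temporary file, push it into one
 * end of a socketpair with sendfile(), and read it back from the other end.
 * Returns the number of bytes received, or -1 on failure.
 */
static ssize_t demo_sendfile_roundtrip(const char *payload, char *out,
				       size_t outlen)
{
	char path[] = "/tmp/sendfile-demo-XXXXXX";
	size_t len = strlen(payload);
	ssize_t got = -1;
	off_t off = 0;
	int sv[2];
	int fd;

	assert(len > 0 && len <= outlen);

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	fd = mkstemp(path);
	if (fd < 0)
		goto out_sock;
	unlink(path);	/* the open descriptor keeps the file alive */

	if (write(fd, payload, len) != (ssize_t)len)
		goto out_file;

	/* The file's pages go to the socket without a copy through userspace. */
	if (sendfile(sv[0], fd, &off, len) != (ssize_t)len)
		goto out_file;

	got = recv(sv[1], out, outlen, 0);

out_file:
	close(fd);
out_sock:
	close(sv[0]);
	close(sv[1]);
	return got;
}
```

A caller would pass a short string and a receive buffer at least as large; the payload is kept small so that a single `recv()` is enough to collect it.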

Documentation/filesystems/locking.rst

Lines changed: 0 additions & 2 deletions
@@ -521,8 +521,6 @@ prototypes::
 	int (*fsync) (struct file *, loff_t start, loff_t end, int datasync);
 	int (*fasync) (int, struct file *, int);
 	int (*lock) (struct file *, int, struct file_lock *);
-	ssize_t (*sendpage) (struct file *, struct page *, int, size_t,
-			loff_t *, int);
 	unsigned long (*get_unmapped_area)(struct file *, unsigned long,
 			unsigned long, unsigned long, unsigned long);
 	int (*check_flags)(int);

Documentation/filesystems/vfs.rst

Lines changed: 0 additions & 1 deletion
@@ -1086,7 +1086,6 @@ This describes how the VFS can manipulate an open file. As of kernel
 	int (*fsync) (struct file *, loff_t, loff_t, int datasync);
 	int (*fasync) (int, struct file *, int);
 	int (*lock) (struct file *, int, struct file_lock *);
-	ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
 	unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
 	int (*check_flags)(int);
 	int (*flock) (struct file *, int, struct file_lock *);

Documentation/networking/scaling.rst

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -269,8 +269,8 @@ a single application thread handles flows with many different flow hashes.
269269
rps_sock_flow_table is a global flow table that contains the *desired* CPU
270270
for flows: the CPU that is currently processing the flow in userspace.
271271
Each table value is a CPU index that is updated during calls to recvmsg
272-
and sendmsg (specifically, inet_recvmsg(), inet_sendmsg(), inet_sendpage()
273-
and tcp_splice_read()).
272+
and sendmsg (specifically, inet_recvmsg(), inet_sendmsg() and
273+
tcp_splice_read()).
274274

275275
When the scheduler moves a thread to a new CPU while it has outstanding
276276
receive packets on the old CPU, packets may arrive out of order. To

crypto/af_alg.c

Lines changed: 0 additions & 28 deletions
@@ -482,7 +482,6 @@ static const struct proto_ops alg_proto_ops = {
 	.listen = sock_no_listen,
 	.shutdown = sock_no_shutdown,
 	.mmap = sock_no_mmap,
-	.sendpage = sock_no_sendpage,
 	.sendmsg = sock_no_sendmsg,
 	.recvmsg = sock_no_recvmsg,
 
@@ -1106,33 +1105,6 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
 }
 EXPORT_SYMBOL_GPL(af_alg_sendmsg);
 
-/**
- * af_alg_sendpage - sendpage system call handler
- * @sock: socket of connection to user space to write to
- * @page: data to send
- * @offset: offset into page to begin sending
- * @size: length of data
- * @flags: message send/receive flags
- *
- * This is a generic implementation of sendpage to fill ctx->tsgl_list.
- */
-ssize_t af_alg_sendpage(struct socket *sock, struct page *page,
-			int offset, size_t size, int flags)
-{
-	struct bio_vec bvec;
-	struct msghdr msg = {
-		.msg_flags = flags | MSG_SPLICE_PAGES,
-	};
-
-	if (flags & MSG_SENDPAGE_NOTLAST)
-		msg.msg_flags |= MSG_MORE;
-
-	bvec_set_page(&bvec, page, size, offset);
-	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
-	return sock_sendmsg(sock, &msg);
-}
-EXPORT_SYMBOL_GPL(af_alg_sendpage);
-
 /**
  * af_alg_free_resources - release resources required for crypto request
  * @areq: Request holding the TX and RX SGL
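
The removed af_alg_sendpage() wrapper is a compact instance of item (1) in the commit message: the sendpage-only MSG_SENDPAGE_NOTLAST hint is turned into MSG_MORE, and MSG_SPLICE_PAGES is always requested. The flag handling alone can be modelled in userspace roughly as below. The `DEMO_` constants are stand-in values chosen for the example, not the kernel's definitions, and `demo_sendpage_to_sendmsg_flags` is a hypothetical name.

```c
#include <assert.h>

/* Stand-in flag bits for illustration; the kernel's values differ. */
#define DEMO_MSG_MORE             0x1u
#define DEMO_MSG_SENDPAGE_NOTLAST 0x2u
#define DEMO_MSG_SPLICE_PAGES     0x4u

/*
 * Model of the flag translation in the removed wrapper: always ask for page
 * splicing, and map the "not last" hint onto MSG_MORE, which is the flag the
 * sendmsg() implementations act on. The original hint bit is left in place,
 * as in the removed code.
 */
static unsigned int demo_sendpage_to_sendmsg_flags(unsigned int flags)
{
	unsigned int msg_flags = flags | DEMO_MSG_SPLICE_PAGES;

	if (flags & DEMO_MSG_SENDPAGE_NOTLAST)
		msg_flags |= DEMO_MSG_MORE;

	return msg_flags;
}
```

With the wrappers gone, callers are expected to pass MSG_MORE and MSG_SPLICE_PAGES to sendmsg() themselves, so no such translation step remains.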

crypto/algif_aead.c

Lines changed: 4 additions & 18 deletions
@@ -9,10 +9,10 @@
  * The following concept of the memory management is used:
  *
  * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
- * filled by user space with the data submitted via sendpage. Filling up
- * the TX SGL does not cause a crypto operation -- the data will only be
- * tracked by the kernel. Upon receipt of one recvmsg call, the caller must
- * provide a buffer which is tracked with the RX SGL.
+ * filled by user space with the data submitted via sendmsg (maybe with
+ * MSG_SPLICE_PAGES). Filling up the TX SGL does not cause a crypto operation
+ * -- the data will only be tracked by the kernel. Upon receipt of one recvmsg
+ * call, the caller must provide a buffer which is tracked with the RX SGL.
  *
  * During the processing of the recvmsg operation, the cipher request is
  * allocated and prepared. As part of the recvmsg operation, the processed
@@ -370,7 +370,6 @@ static struct proto_ops algif_aead_ops = {
 
 	.release = af_alg_release,
 	.sendmsg = aead_sendmsg,
-	.sendpage = af_alg_sendpage,
 	.recvmsg = aead_recvmsg,
 	.poll = af_alg_poll,
 };
@@ -422,18 +421,6 @@ static int aead_sendmsg_nokey(struct socket *sock, struct msghdr *msg,
 	return aead_sendmsg(sock, msg, size);
 }
 
-static ssize_t aead_sendpage_nokey(struct socket *sock, struct page *page,
-				   int offset, size_t size, int flags)
-{
-	int err;
-
-	err = aead_check_key(sock);
-	if (err)
-		return err;
-
-	return af_alg_sendpage(sock, page, offset, size, flags);
-}
-
 static int aead_recvmsg_nokey(struct socket *sock, struct msghdr *msg,
 			      size_t ignored, int flags)
 {
@@ -461,7 +448,6 @@ static struct proto_ops algif_aead_ops_nokey = {
 
 	.release = af_alg_release,
 	.sendmsg = aead_sendmsg_nokey,
-	.sendpage = aead_sendpage_nokey,
 	.recvmsg = aead_recvmsg_nokey,
 	.poll = af_alg_poll,
 };

crypto/algif_rng.c

Lines changed: 0 additions & 2 deletions
@@ -174,7 +174,6 @@ static struct proto_ops algif_rng_ops = {
 	.bind = sock_no_bind,
 	.accept = sock_no_accept,
 	.sendmsg = sock_no_sendmsg,
-	.sendpage = sock_no_sendpage,
 
 	.release = af_alg_release,
 	.recvmsg = rng_recvmsg,
@@ -192,7 +191,6 @@ static struct proto_ops __maybe_unused algif_rng_test_ops = {
 	.mmap = sock_no_mmap,
 	.bind = sock_no_bind,
 	.accept = sock_no_accept,
-	.sendpage = sock_no_sendpage,
 
 	.release = af_alg_release,
 	.recvmsg = rng_test_recvmsg,

crypto/algif_skcipher.c

Lines changed: 0 additions & 14 deletions
@@ -194,7 +194,6 @@ static struct proto_ops algif_skcipher_ops = {
 
 	.release = af_alg_release,
 	.sendmsg = skcipher_sendmsg,
-	.sendpage = af_alg_sendpage,
 	.recvmsg = skcipher_recvmsg,
 	.poll = af_alg_poll,
 };
@@ -246,18 +245,6 @@ static int skcipher_sendmsg_nokey(struct socket *sock, struct msghdr *msg,
 	return skcipher_sendmsg(sock, msg, size);
 }
 
-static ssize_t skcipher_sendpage_nokey(struct socket *sock, struct page *page,
-				       int offset, size_t size, int flags)
-{
-	int err;
-
-	err = skcipher_check_key(sock);
-	if (err)
-		return err;
-
-	return af_alg_sendpage(sock, page, offset, size, flags);
-}
-
 static int skcipher_recvmsg_nokey(struct socket *sock, struct msghdr *msg,
 				  size_t ignored, int flags)
 {
@@ -285,7 +272,6 @@ static struct proto_ops algif_skcipher_ops_nokey = {
 
 	.release = af_alg_release,
 	.sendmsg = skcipher_sendmsg_nokey,
-	.sendpage = skcipher_sendpage_nokey,
 	.recvmsg = skcipher_recvmsg_nokey,
 	.poll = af_alg_poll,
 };

drivers/block/drbd/drbd_main.c

Lines changed: 8 additions & 4 deletions
@@ -1540,6 +1540,8 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa
 			      int offset, size_t size, unsigned msg_flags)
 {
 	struct socket *socket = peer_device->connection->data.socket;
+	struct msghdr msg = { .msg_flags = msg_flags, };
+	struct bio_vec bvec;
 	int len = size;
 	int err = -EIO;
 
@@ -1549,15 +1551,17 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa
 	 * put_page(); and would cause either a VM_BUG directly, or
 	 * __page_cache_release a page that would actually still be referenced
 	 * by someone, leading to some obscure delayed Oops somewhere else. */
-	if (drbd_disable_sendpage || !sendpage_ok(page))
-		return _drbd_no_send_page(peer_device, page, offset, size, msg_flags);
+	if (!drbd_disable_sendpage && sendpage_ok(page))
+		msg.msg_flags |= MSG_NOSIGNAL | MSG_SPLICE_PAGES;
 
-	msg_flags |= MSG_NOSIGNAL;
 	drbd_update_congested(peer_device->connection);
 	do {
 		int sent;
 
-		sent = socket->ops->sendpage(socket, page, offset, len, msg_flags);
+		bvec_set_page(&bvec, page, offset, len);
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
+
+		sent = sock_sendmsg(socket, &msg);
 		if (sent <= 0) {
 			if (sent == -EAGAIN) {
 				if (we_should_drop_the_connection(peer_device->connection, socket))
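
The loop in this hunk rebuilds a one-element bio_vec on each pass so that only the unsent part of the page is offered to sock_sendmsg(). A userspace model of that bookkeeping might look like the sketch below. The `demo_` types and functions are stand-ins, not kernel definitions. The argument order used here, length before offset, is an assumption that matches the af_alg_sendpage() and siw calls elsewhere in this commit; the drbd hunk passes `offset, len` the other way round, so the order is worth checking against the helper's prototype.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's struct page and struct bio_vec. */
struct demo_page {
	unsigned char data[4096];
};

struct demo_bio_vec {
	struct demo_page *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

/* Assumed argument order: length first, then offset into the page. */
static void demo_bvec_set_page(struct demo_bio_vec *bv, struct demo_page *page,
			       unsigned int len, unsigned int offset)
{
	assert(offset + len <= sizeof(page->data));
	bv->bv_page = page;
	bv->bv_len = len;
	bv->bv_offset = offset;
}

/* After a partial send of `sent` bytes, describe what is still unsent. */
static void demo_advance(struct demo_bio_vec *bv, unsigned int sent)
{
	assert(sent <= bv->bv_len);
	bv->bv_offset += sent;
	bv->bv_len -= sent;
}
```

With a named-field struct like this, swapping the two trailing arguments compiles cleanly but yields a vector whose length and offset are exchanged, which is why the order deserves a second look at each call site.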

drivers/infiniband/sw/siw/siw_qp_tx.c

Lines changed: 2 additions & 3 deletions
@@ -325,8 +325,7 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
 {
 	struct bio_vec bvec;
 	struct msghdr msg = {
-		.msg_flags = (MSG_MORE | MSG_DONTWAIT | MSG_SENDPAGE_NOTLAST |
-			      MSG_SPLICE_PAGES),
+		.msg_flags = (MSG_MORE | MSG_DONTWAIT | MSG_SPLICE_PAGES),
 	};
 	struct sock *sk = s->sk;
 	int i = 0, rv = 0, sent = 0;
@@ -335,7 +334,7 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
 		size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
 
 		if (size + offset <= PAGE_SIZE)
-			msg.msg_flags &= ~MSG_SENDPAGE_NOTLAST;
+			msg.msg_flags &= ~MSG_MORE;
 
 		tcp_rate_check_app_limited(sk);
 		bvec_set_page(&bvec, page[i], bytes, offset);
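
The effect of this change is that MSG_MORE stays set for every page except the one that carries the end of the data. The following userspace sketch models just that decision, under the simplifying assumption that each chunk is sent in full (the real loop also has to cope with short sends). `DEMO_PAGE_SIZE`, `DEMO_MSG_MORE` and `demo_chunk_flags` are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_MSG_MORE  0x1u	/* stand-in bit, not the kernel's value */

/*
 * Walk a byte range that starts `offset` bytes into the first page and
 * record, for each per-page chunk, whether MSG_MORE would still be set.
 * Returns the number of chunks; more_set[i] is 1 if chunk i carries MSG_MORE.
 */
static size_t demo_chunk_flags(size_t offset, size_t size,
			       int *more_set, size_t max_chunks)
{
	unsigned int flags = DEMO_MSG_MORE;
	size_t n = 0;

	assert(offset < DEMO_PAGE_SIZE);

	while (size > 0 && n < max_chunks) {
		size_t room = DEMO_PAGE_SIZE - offset;
		size_t bytes = size < room ? size : room;

		/* The remainder fits in this page, so this is the last chunk. */
		if (size + offset <= DEMO_PAGE_SIZE)
			flags &= ~DEMO_MSG_MORE;

		more_set[n++] = (flags & DEMO_MSG_MORE) != 0;
		size -= bytes;
		offset = 0;	/* later pages are used from their start */
	}
	return n;
}
```

For example, 5000 bytes starting 100 bytes into a page split into a 3996-byte chunk with MSG_MORE set and a 1004-byte chunk without it.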
