
Commit ef9369e

dtatulea authored and Saeed Mahameed committed
net/mlx5e: RX, Fix page_pool allocation failure recovery for legacy rq
When a page allocation fails during refill in mlx5e_refill_rx_wqes, the page will be released again on the next refill call. This triggers the page_pool negative page fragment count warning below:

 [ 338.326070] WARNING: CPU: 4 PID: 0 at include/net/page_pool/helpers.h:130 mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
 ...
 [ 338.328993] RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
 [ 338.329094] Call Trace:
 [ 338.329097] <IRQ>
 [ 338.329100] ? __warn+0x7d/0x120
 [ 338.329105] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
 [ 338.329173] ? report_bug+0x155/0x180
 [ 338.329179] ? handle_bug+0x3c/0x60
 [ 338.329183] ? exc_invalid_op+0x13/0x60
 [ 338.329187] ? asm_exc_invalid_op+0x16/0x20
 [ 338.329192] ? mlx5e_page_release_fragmented.isra.0+0x42/0x50 [mlx5_core]
 [ 338.329259] mlx5e_post_rx_wqes+0x210/0x5a0 [mlx5_core]
 [ 338.329327] ? mlx5e_poll_rx_cq+0x88/0x6f0 [mlx5_core]
 [ 338.329394] mlx5e_napi_poll+0x127/0x6b0 [mlx5_core]
 [ 338.329461] __napi_poll+0x25/0x1a0
 [ 338.329465] net_rx_action+0x28a/0x300
 [ 338.329468] __do_softirq+0xcd/0x279
 [ 338.329473] irq_exit_rcu+0x6a/0x90
 [ 338.329477] common_interrupt+0x82/0xa0
 [ 338.329482] </IRQ>

This patch fixes the legacy rq case by releasing all allocated fragments and then setting the skip flag on all released fragments. It is important to note that the number of released fragments will be higher than the number of allocated fragments when an allocation error occurs.

Fixes: 3f93f82 ("net/mlx5e: RX, Defer page release in legacy rq for better recycling")
Tested-by: Chris Mason <clm@fb.com>
Reported-by: Chris Mason <clm@fb.com>
Closes: https://lore.kernel.org/netdev/117FF31A-7BE0-4050-B2BB-E41F224FF72F@meta.com
Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
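The double release can be reproduced in a small userspace model. This is a hypothetical sketch, not driver code: ring, slot, free_slots(), alloc_slots(), refill_old(), refill_new() and run_scenario() are illustrative names, and one page per WQE, a ring of 8, a refill unit of 4 and a single injected allocation failure at slot 5 are assumptions. refill_old() mirrors the pre-fix loop, refill_new() mirrors the patched one, and the model assumes the next refill pass starts at the ring head advanced by the count the previous pass returned.

```c
/* Hypothetical userspace model of the legacy-rq refill path; NOT driver
 * code. One page reference per ring slot; skip_release stands in for
 * MLX5E_WQE_FRAG_SKIP_RELEASE. All names and sizes are illustrative. */

#include <assert.h>
#include <stdbool.h>

#define RING_SIZE   8 /* assumed ring size */
#define REFILL_UNIT 4 /* stand-in for rq->wqe.info.refill_unit */

struct slot {
	int refcnt;        /* page references still held by this slot */
	bool skip_release; /* set => the next free pass must not release it */
};

static struct slot ring[RING_SIZE];
static int fail_at = -1;      /* slot whose allocation fails once */
static int negative_warnings; /* models the page_pool negative-count WARN */

/* Deferred release of the old pages, in the role of mlx5e_free_rx_wqes(). */
static void free_slots(int ix, int n)
{
	for (int i = 0; i < n; i++) {
		struct slot *s = &ring[(ix + i) % RING_SIZE];
		if (s->skip_release)
			continue;
		if (--s->refcnt < 0)
			negative_warnings++;
	}
}

/* New page per slot, in the role of mlx5e_alloc_rx_wqes(); returns count. */
static int alloc_slots(int ix, int n)
{
	for (int i = 0; i < n; i++) {
		int j = (ix + i) % RING_SIZE;
		if (j == fail_at) {
			fail_at = -1; /* inject a single failure */
			return i;
		}
		ring[j] = (struct slot){ .refcnt = 1, .skip_release = false };
	}
	return n;
}

/* Pre-fix loop: stops at the failure and reports a partial count. */
static int refill_old(int ix, int bulk)
{
	int remaining = bulk, i = 0;

	do {
		int refill = remaining < REFILL_UNIT ? remaining : REFILL_UNIT;
		int count;
		free_slots(ix + i, refill);
		count = alloc_slots(ix + i, refill);
		i += count;
		if (count != refill)
			break;
		remaining -= refill;
	} while (remaining);
	return i;
}

/* Patched loop: undoes the bulk and flags every released slot. */
static int refill_new(int ix, int bulk)
{
	int remaining = bulk, total_alloc = 0, refill_alloc = 0, refill = 0;

	do {
		refill = remaining < REFILL_UNIT ? remaining : REFILL_UNIT;
		free_slots(ix + total_alloc, refill);
		refill_alloc = alloc_slots(ix + total_alloc, refill);
		if (refill_alloc != refill)
			goto err_free;
		total_alloc += refill_alloc;
		remaining -= refill;
	} while (remaining);
	return total_alloc;

err_free:
	free_slots(ix, total_alloc + refill_alloc);
	for (int i = 0; i < total_alloc + refill; i++)
		ring[(ix + i) % RING_SIZE].skip_release = true;
	return 0;
}

/* Fill the ring, fail at slot 5, then run the next refill pass from the
 * advanced head. Returns the number of negative-refcount events. */
int run_scenario(bool with_fix)
{
	int (*refill_fn)(int, int) = with_fix ? refill_new : refill_old;
	int posted;

	for (int j = 0; j < RING_SIZE; j++)
		ring[j] = (struct slot){ .refcnt = 1, .skip_release = false };
	negative_warnings = 0;
	fail_at = 5;
	posted = refill_fn(0, RING_SIZE);
	assert(posted >= 0 && posted <= RING_SIZE);
	refill_fn(posted, RING_SIZE - posted);
	return negative_warnings;
}
```

Under these assumptions the pre-fix loop posts 5 WQEs and leaves slots 5 to 7 released but not reallocated, so the next pass releases them a second time and run_scenario(false) returns 3. The patched loop returns 0, flags all 8 released slots, and the retry from the same head skips them, so run_scenario(true) returns 0.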
1 parent be43b74 commit ef9369e

1 file changed

Lines changed: 24 additions & 9 deletions

drivers/net/ethernet/mellanox/mlx5/core/en_rx.c

@@ -457,26 +457,41 @@ static int mlx5e_alloc_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
 static int mlx5e_refill_rx_wqes(struct mlx5e_rq *rq, u16 ix, int wqe_bulk)
 {
 	int remaining = wqe_bulk;
-	int i = 0;
+	int total_alloc = 0;
+	int refill_alloc;
+	int refill;
 
 	/* The WQE bulk is split into smaller bulks that are sized
 	 * according to the page pool cache refill size to avoid overflowing
 	 * the page pool cache due to too many page releases at once.
 	 */
 	do {
-		int refill = min_t(u16, rq->wqe.info.refill_unit, remaining);
-		int alloc_count;
+		refill = min_t(u16, rq->wqe.info.refill_unit, remaining);
 
-		mlx5e_free_rx_wqes(rq, ix + i, refill);
-		alloc_count = mlx5e_alloc_rx_wqes(rq, ix + i, refill);
-		i += alloc_count;
-		if (unlikely(alloc_count != refill))
-			break;
+		mlx5e_free_rx_wqes(rq, ix + total_alloc, refill);
+		refill_alloc = mlx5e_alloc_rx_wqes(rq, ix + total_alloc, refill);
+		if (unlikely(refill_alloc != refill))
+			goto err_free;
 
+		total_alloc += refill_alloc;
 		remaining -= refill;
 	} while (remaining);
 
-	return i;
+	return total_alloc;
+
+err_free:
+	mlx5e_free_rx_wqes(rq, ix, total_alloc + refill_alloc);
+
+	for (int i = 0; i < total_alloc + refill; i++) {
+		int j = mlx5_wq_cyc_ctr2ix(&rq->wqe.wq, ix + i);
+		struct mlx5e_wqe_frag_info *frag;
+
+		frag = get_frag(rq, j);
+		for (int k = 0; k < rq->wqe.info.num_frags; k++, frag++)
+			frag->flags |= BIT(MLX5E_WQE_FRAG_SKIP_RELEASE);
+	}
+
+	return 0;
 }
 
 static void
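The err_free path walks total_alloc + refill WQEs starting at ix, converts each counter to a ring slot with mlx5_wq_cyc_ctr2ix() and sets MLX5E_WQE_FRAG_SKIP_RELEASE on each of that WQE's num_frags fragments. The sketch below models only that walk in userspace. It is hypothetical: ctr2ix() assumes a power-of-two ring indexed by masking the counter, wqe_frags() assumes NUM_FRAGS consecutive fragment entries per WQE, and frag_info, mark_skip_release(), wqe_marked() and count_marked() are illustrative names, not driver API.

```c
/* Hypothetical sketch of the err_free marking loop; NOT driver code.
 * ctr2ix() plays the role of mlx5_wq_cyc_ctr2ix() and wqe_frags() the
 * role of get_frag(). All names and sizes here are illustrative. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 16u       /* assumed power-of-two ring size */
#define NUM_FRAGS 2         /* stand-in for rq->wqe.info.num_frags */
#define FRAG_SKIP_RELEASE 0 /* bit number, like MLX5E_WQE_FRAG_SKIP_RELEASE */

struct frag_info {
	uint8_t flags;
};

static struct frag_info frags[RING_SIZE * NUM_FRAGS];

/* Mask a free-running counter into a ring slot; wraps past the ring end. */
static uint16_t ctr2ix(uint16_t ctr)
{
	return ctr & (RING_SIZE - 1);
}

/* First fragment of the WQE in slot ix. */
static struct frag_info *wqe_frags(uint16_t ix)
{
	return &frags[ix * NUM_FRAGS];
}

/* Set the skip bit on every fragment of `released` WQEs starting at ctr. */
void mark_skip_release(uint16_t ctr, int released)
{
	assert(released >= 0 && released <= (int)RING_SIZE);
	for (int i = 0; i < released; i++) {
		struct frag_info *frag = wqe_frags(ctr2ix((uint16_t)(ctr + i)));

		for (int k = 0; k < NUM_FRAGS; k++, frag++)
			frag->flags |= 1u << FRAG_SKIP_RELEASE;
	}
}

/* True if every fragment of the WQE in slot ix carries the skip bit. */
bool wqe_marked(uint16_t ix)
{
	const struct frag_info *frag = wqe_frags(ix);

	for (int k = 0; k < NUM_FRAGS; k++, frag++)
		if (!(frag->flags & (1u << FRAG_SKIP_RELEASE)))
			return false;
	return true;
}

/* Number of fragments carrying the skip bit across the whole ring. */
int count_marked(void)
{
	int n = 0;

	for (unsigned int f = 0; f < RING_SIZE * NUM_FRAGS; f++)
		n += !!(frags[f].flags & (1u << FRAG_SKIP_RELEASE));
	return n;
}
```

For example, with ix = 14, total_alloc = 4 and a failing chunk where refill = 4 but refill_alloc = 1, the patched code releases the 5 allocated WQEs (total_alloc + refill_alloc) but must flag 8 (total_alloc + refill). Calling mark_skip_release(14, 8) flags slots 14, 15 and 0 through 5 (16 fragments) and leaves slots 6 through 13 untouched, which is the "more released than allocated" case the commit message points out.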
