
Commit 4219196

sukadev authored and davem330 committed
ibmvnic: fix race between xmit and reset
There is a race between reset and the transmit paths that can lead to
ibmvnic_xmit() accessing an scrq after it has been freed in the reset
path. It can result in a crash like:

    Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
    BUG: Kernel NULL pointer dereference on read at 0x00000000
    Faulting instruction address: 0xc0080000016189f8
    Oops: Kernel access of bad area, sig: 11 [#1]
    ...
    NIP [c0080000016189f8] ibmvnic_xmit+0x60/0xb60 [ibmvnic]
    LR [c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
    Call Trace:
    [c008000001618f08] ibmvnic_xmit+0x570/0xb60 [ibmvnic] (unreliable)
    [c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
    [c000000000c9cfcc] sch_direct_xmit+0xec/0x330
    [c000000000bfe640] __dev_xmit_skb+0x3a0/0x9d0
    [c000000000c00ad4] __dev_queue_xmit+0x394/0x730
    [c008000002db813c] __bond_start_xmit+0x254/0x450 [bonding]
    [c008000002db8378] bond_start_xmit+0x40/0xc0 [bonding]
    [c000000000c0046c] dev_hard_start_xmit+0x11c/0x280
    [c000000000c00ca4] __dev_queue_xmit+0x564/0x730
    [c000000000cf97e0] neigh_hh_output+0xd0/0x180
    [c000000000cfa69c] ip_finish_output2+0x31c/0x5c0
    [c000000000cfd244] __ip_queue_xmit+0x194/0x4f0
    [c000000000d2a3c4] __tcp_transmit_skb+0x434/0x9b0
    [c000000000d2d1e0] __tcp_retransmit_skb+0x1d0/0x6a0
    [c000000000d2d984] tcp_retransmit_skb+0x34/0x130
    [c000000000d310e8] tcp_retransmit_timer+0x388/0x6d0
    [c000000000d315ec] tcp_write_timer_handler+0x1bc/0x330
    [c000000000d317bc] tcp_write_timer+0x5c/0x200
    [c000000000243270] call_timer_fn+0x50/0x1c0
    [c000000000243704] __run_timers.part.0+0x324/0x460
    [c000000000243894] run_timer_softirq+0x54/0xa0
    [c000000000ea713c] __do_softirq+0x15c/0x3e0
    [c000000000166258] __irq_exit_rcu+0x158/0x190
    [c000000000166420] irq_exit+0x20/0x40
    [c00000000002853c] timer_interrupt+0x14c/0x2b0
    [c000000000009a00] decrementer_common_virt+0x210/0x220
    --- interrupt: 900 at plpar_hcall_norets_notrace+0x18/0x2c

The immediate cause of the crash is the access of tx_scrq in the
following snippet during a reset, where the tx_scrq can be either NULL
or an address that will soon be invalid:

    ibmvnic_xmit()
    {
        ...
        tx_scrq = adapter->tx_scrq[queue_num];
        txq = netdev_get_tx_queue(netdev, queue_num);
        ind_bufp = &tx_scrq->ind_buf;
        if (test_bit(0, &adapter->resetting)) {
        ...
    }

But beyond that, the call to ibmvnic_xmit() itself is not safe during a
reset, and the reset path attempts to avoid this by stopping the queue
in ibmvnic_cleanup(). However, just after the queue was stopped, an
in-flight ibmvnic_complete_tx() could have restarted the queue even as
the reset was progressing. Since the queue was restarted, we could get a
call to ibmvnic_xmit() which can then access the bad tx_scrq (or other
fields).

We cannot, however, simply have ibmvnic_complete_tx() check the
->resetting bit and skip starting the queue. This can race at the
"back-end" of a good reset which just restarted the queue but has not
cleared the ->resetting bit yet. If we skip restarting the queue due to
->resetting being true, the queue would remain stopped indefinitely,
potentially leading to transmit timeouts. IOW, ->resetting is too broad
for this purpose.

Instead, use a new flag that indicates whether or not the queues are
active. Only the open/reset paths control when the queues are active.
ibmvnic_complete_tx() and others wake up the queue only if the queue is
marked active. So we will have:

    A. reset/open thread in ibmvnic_cleanup() and __ibmvnic_open()

        ->resetting = true
        ->tx_queues_active = false
        disable tx queues
        ...
        ->tx_queues_active = true
        start tx queues

    B. Tx interrupt in ibmvnic_complete_tx():

        if (->tx_queues_active)
            netif_wake_subqueue();

To ensure that ->tx_queues_active and the state of the queues stay
consistent, we need a lock which:

    - must also be taken in the interrupt path (ibmvnic_complete_tx())
    - is shared across the multiple queues in the adapter (so the
      queues don't become serialized)

Use rcu_read_lock() and have the reset thread synchronize_rcu() after
updating the ->tx_queues_active state.

While here, consolidate a few boolean fields in ibmvnic_adapter for
better alignment.

Based on discussions with Brian King and Dany Madden.

Fixes: 7ed5b31 ("net/ibmvnic: prevent more than one thread from running in reset")
Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1 parent 4fa331b commit 4219196

2 files changed: 55 additions & 15 deletions


drivers/net/ethernet/ibm/ibmvnic.c

Lines changed: 50 additions & 13 deletions
@@ -1429,6 +1429,15 @@ static int __ibmvnic_open(struct net_device *netdev)
 		return rc;
 	}
 
+	adapter->tx_queues_active = true;
+
+	/* Since queues were stopped until now, there shouldn't be any
+	 * one in ibmvnic_complete_tx() or ibmvnic_xmit() so maybe we
+	 * don't need the synchronize_rcu()? Leaving it for consistency
+	 * with setting ->tx_queues_active = false.
+	 */
+	synchronize_rcu();
+
 	netif_tx_start_all_queues(netdev);
 
 	if (prev_state == VNIC_CLOSED) {
@@ -1603,6 +1612,14 @@ static void ibmvnic_cleanup(struct net_device *netdev)
 	struct ibmvnic_adapter *adapter = netdev_priv(netdev);
 
 	/* ensure that transmissions are stopped if called by do_reset */
+
+	adapter->tx_queues_active = false;
+
+	/* Ensure complete_tx() and ibmvnic_xmit() see ->tx_queues_active
+	 * update so they don't restart a queue after we stop it below.
+	 */
+	synchronize_rcu();
+
 	if (test_bit(0, &adapter->resetting))
 		netif_tx_disable(netdev);
 	else
@@ -1842,14 +1859,21 @@ static void ibmvnic_tx_scrq_clean_buffer(struct ibmvnic_adapter *adapter,
 		tx_buff->skb = NULL;
 		adapter->netdev->stats.tx_dropped++;
 	}
+
 	ind_bufp->index = 0;
+
 	if (atomic_sub_return(entries, &tx_scrq->used) <=
 	    (adapter->req_tx_entries_per_subcrq / 2) &&
-	    __netif_subqueue_stopped(adapter->netdev, queue_num) &&
-	    !test_bit(0, &adapter->resetting)) {
-		netif_wake_subqueue(adapter->netdev, queue_num);
-		netdev_dbg(adapter->netdev, "Started queue %d\n",
-			   queue_num);
+	    __netif_subqueue_stopped(adapter->netdev, queue_num)) {
+		rcu_read_lock();
+
+		if (adapter->tx_queues_active) {
+			netif_wake_subqueue(adapter->netdev, queue_num);
+			netdev_dbg(adapter->netdev, "Started queue %d\n",
+				   queue_num);
+		}
+
+		rcu_read_unlock();
 	}
 }
 
@@ -1904,11 +1928,12 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
 	int index = 0;
 	u8 proto = 0;
 
-	tx_scrq = adapter->tx_scrq[queue_num];
-	txq = netdev_get_tx_queue(netdev, queue_num);
-	ind_bufp = &tx_scrq->ind_buf;
-
-	if (test_bit(0, &adapter->resetting)) {
+	/* If a reset is in progress, drop the packet since
+	 * the scrqs may get torn down. Otherwise use the
+	 * rcu to ensure reset waits for us to complete.
+	 */
+	rcu_read_lock();
+	if (!adapter->tx_queues_active) {
 		dev_kfree_skb_any(skb);
 
 		tx_send_failed++;
@@ -1917,13 +1942,18 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
 		goto out;
 	}
 
+	tx_scrq = adapter->tx_scrq[queue_num];
+	txq = netdev_get_tx_queue(netdev, queue_num);
+	ind_bufp = &tx_scrq->ind_buf;
+
 	if (ibmvnic_xmit_workarounds(skb, netdev)) {
 		tx_dropped++;
 		tx_send_failed++;
 		ret = NETDEV_TX_OK;
 		ibmvnic_tx_scrq_flush(adapter, tx_scrq);
 		goto out;
 	}
+
 	if (skb_is_gso(skb))
 		tx_pool = &adapter->tso_pool[queue_num];
 	else
@@ -2078,6 +2108,7 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
 		netif_carrier_off(netdev);
 	}
 out:
+	rcu_read_unlock();
 	netdev->stats.tx_dropped += tx_dropped;
 	netdev->stats.tx_bytes += tx_bytes;
 	netdev->stats.tx_packets += tx_packets;
@@ -3732,9 +3763,15 @@ static int ibmvnic_complete_tx(struct ibmvnic_adapter *adapter,
 		    (adapter->req_tx_entries_per_subcrq / 2) &&
 		    __netif_subqueue_stopped(adapter->netdev,
 					     scrq->pool_index)) {
-			netif_wake_subqueue(adapter->netdev, scrq->pool_index);
-			netdev_dbg(adapter->netdev, "Started queue %d\n",
-				   scrq->pool_index);
+			rcu_read_lock();
+			if (adapter->tx_queues_active) {
+				netif_wake_subqueue(adapter->netdev,
+						    scrq->pool_index);
+				netdev_dbg(adapter->netdev,
+					   "Started queue %d\n",
+					   scrq->pool_index);
+			}
+			rcu_read_unlock();
 		}
 	}

drivers/net/ethernet/ibm/ibmvnic.h

Lines changed: 5 additions & 2 deletions
@@ -1006,11 +1006,14 @@ struct ibmvnic_adapter {
 	struct work_struct ibmvnic_reset;
 	struct delayed_work ibmvnic_delayed_reset;
 	unsigned long resetting;
-	bool napi_enabled, from_passive_init;
-	bool login_pending;
 	/* last device reset time */
 	unsigned long last_reset_time;
 
+	bool napi_enabled;
+	bool from_passive_init;
+	bool login_pending;
+	/* protected by rcu */
+	bool tx_queues_active;
 	bool failover_pending;
 	bool force_reset_recovery;
