Skip to content

Commit 7700fce

Browse files
Florian Westphalgregkh
authored andcommitted
netfilter: nft_set_pipapo: don't check genbit from packetpath lookups
[ Upstream commit c4eaca2 ] The pipapo set type is special in that it has two copies of its datastructure: one live copy containing only valid elements and one on-demand clone used during transaction where adds/deletes happen. This clone is not visible to the datapath. This is unlike all other set types in nftables, those all link new elements into their live hlist/tree. For those sets, the lookup functions must skip the new elements while the transaction is ongoing to ensure consistency. As the clone is shallow, removal does have an effect on the packet path: once the transaction enters the commit phase the 'gencursor' bit that determines which elements are active and which elements should be ignored (because they are no longer valid) is flipped. This causes the datapath lookup to ignore these elements if they are found during lookup. This opens up a small race window where pipapo has an inconsistent view of the dataset from when the transaction-cpu flipped the genbit until the transaction-cpu calls nft_pipapo_commit() to swap live/clone pointers: cpu0 cpu1 has added new elements to clone has marked elements as being inactive in new generation perform lookup in the set enters commit phase: I) increments the genbit A) observes new genbit removes elements from the clone so they won't be found anymore B) lookup in datastructure can't see new elements yet, but old elements are ignored -> Only matches elements that were not changed in the transaction II) calls nft_pipapo_commit(), clone and live pointers are swapped. C New nft_lookup happening now will find matching elements. Consider a packet matching range r1-r2: cpu0 processes following transaction: 1. remove r1-r2 2. add r1-r3 P is contained in both ranges. Therefore, cpu1 should always find a match for P. Due to above race, this is not the case: cpu1 does find r1-r2, but then ignores it due to the genbit indicating the range has been removed. At the same time, r1-r3 is not visible yet, because it can only be found in the clone. The situation persists for all lookups until after cpu0 hits II). The fix is easy: Don't check the genbit from pipapo lookup functions. This is possible because unlike the other set types, the new elements are not reachable from the live copy of the dataset. The clone/live pointer swap is enough to avoid matching on old elements while at the same time all new elements are exposed in one go. After this change, step B above returns a match in r1-r2. This is fine: r1-r2 only becomes truly invalid the moment they get freed. This happens after a synchronize_rcu() call and rcu read lock is held via netfilter hook traversal (nf_hook_slow()). Cc: Stefano Brivio <sbrivio@redhat.com> Fixes: 3c4287f ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
1 parent 9eab0ef commit 7700fce

2 files changed

Lines changed: 19 additions & 5 deletions

File tree

net/netfilter/nft_set_pipapo.c

Lines changed: 18 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -511,6 +511,23 @@ static struct nft_pipapo_elem *pipapo_get(const struct nft_pipapo_match *m,
511511
*
512512
* This function is called from the data path. It will search for
513513
* an element matching the given key in the current active copy.
514+
* Unlike other set types, this uses NFT_GENMASK_ANY instead of
515+
* nft_genmask_cur().
516+
*
517+
* This is because new (future) elements are not reachable from
518+
* priv->match, they get added to priv->clone instead.
519+
* When the commit phase flips the generation bitmask, the
520+
* 'now old' entries are skipped but without the 'now current'
521+
* elements becoming visible. Using nft_genmask_cur() thus creates
522+
* inconsistent state: matching old entries get skipped but thew
523+
* newly matching entries are unreachable.
524+
*
525+
* GENMASK will still find the 'now old' entries which ensures consistent
526+
* priv->match view.
527+
*
528+
* nft_pipapo_commit swaps ->clone and ->match shortly after the
529+
* genbit flip. As ->clone doesn't contain the old entries in the first
530+
* place, lookup will only find the now-current ones.
514531
*
515532
* Return: ntables API extension pointer or NULL if no match.
516533
*/
@@ -519,12 +536,11 @@ nft_pipapo_lookup(const struct net *net, const struct nft_set *set,
519536
const u32 *key)
520537
{
521538
struct nft_pipapo *priv = nft_set_priv(set);
522-
u8 genmask = nft_genmask_cur(net);
523539
const struct nft_pipapo_match *m;
524540
const struct nft_pipapo_elem *e;
525541

526542
m = rcu_dereference(priv->match);
527-
e = pipapo_get(m, (const u8 *)key, genmask, get_jiffies_64());
543+
e = pipapo_get(m, (const u8 *)key, NFT_GENMASK_ANY, get_jiffies_64());
528544

529545
return e ? &e->ext : NULL;
530546
}

net/netfilter/nft_set_pipapo_avx2.c

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1153,7 +1153,6 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
11531153
struct nft_pipapo *priv = nft_set_priv(set);
11541154
const struct nft_set_ext *ext = NULL;
11551155
struct nft_pipapo_scratch *scratch;
1156-
u8 genmask = nft_genmask_cur(net);
11571156
const struct nft_pipapo_match *m;
11581157
const struct nft_pipapo_field *f;
11591158
const u8 *rp = (const u8 *)key;
@@ -1249,8 +1248,7 @@ nft_pipapo_avx2_lookup(const struct net *net, const struct nft_set *set,
12491248
if (last) {
12501249
const struct nft_set_ext *e = &f->mt[ret].e->ext;
12511250

1252-
if (unlikely(nft_set_elem_expired(e) ||
1253-
!nft_set_elem_active(e, genmask)))
1251+
if (unlikely(nft_set_elem_expired(e)))
12541252
goto next_match;
12551253

12561254
ext = e;

0 commit comments

Comments
 (0)