Commit 950fe35

Merge branch 'ipv6-expired-routes'
Kui-Feng Lee says:

====================
Remove expired routes with a separated list of routes.

FIB6 GC walks the trees of fib6_tables to remove expired routes. Walking a
tree can be expensive if the number of routes in a table is big, even if
most of them are permanent. Checking routes in a separate list of routes
that have an expiration time avoids this potential issue.

Background
==========

The size of a Linux IPv6 routing table can become a big problem if not
managed appropriately. Linux has a garbage collector that removes expired
routes periodically. However, this may lead to a situation in which the
routing path is blocked for a long period due to an excessive number of
routes.

For example, years ago there was commit c7bb4b8 ("ipv6: tcp: drop silly
ICMPv6 packet too big messages"). The root cause was that malicious ICMPv6
packets were sent back for every small packet sent to them. These packets
add routes with an expiration time, which prompts the GC to periodically
check all routes in the tables, including permanent ones.

Why Route Expires
=================

Users can add IPv6 routes with an expiration time manually. However, the
Neighbor Discovery protocol may also generate routes that can expire. For
example, Router Advertisement (RA) messages may create a default route with
an expiration time. [RFC 4861]

For IPv4, it is not possible to set an expiration time for a route, and
there is no RA, so there is no need to worry about such issues.

Create Routes with Expires
==========================

You can create a route with an expiration time by passing "expires" to the
ip command. For example,

    ip -6 route add 2001:b000:591::3 via fe80::5054:ff:fe12:3457 \
        dev enp0s3 expires 30

The route created this way is deleted automatically after 30 seconds.

GC of FIB6
==========

The function fib6_run_gc() is responsible for performing garbage collection
(GC) for the Linux IPv6 stack. It checks for the expiration of every route
by traversing the trees of the routing tables. The time taken to traverse a
routing table increases with its size. Holding the routing table lock during
the traversal is particularly undesirable, so the lock should be held for
the shortest possible duration.

Solution
========

The cause of the issue is that the routing table stays locked during the
traversal of large trees. To solve this problem, we keep a separate list of
the routes that have an expiration time. GC then walks only that list and
never checks permanent routes.

Result
======

We measured the execution time of fib6_gc_timer_cb() and observed that the
patch speeds up the GC of FIB6. During the test, we added 1000, 3000, 6000,
and 9000 permanent routes, plus one route with an expiration time.

Here are the average execution times for the kernel without the patch.

 - 120020 ns with 1000 permanent routes
 - 308920 ns with 3000 ...
 - 581470 ns with 6000 ...
 - 855310 ns with 9000 ...

The kernel with the patch consistently takes around 14000 ns to execute,
regardless of the number of permanent routes that are installed.

Major changes from v7:

 - Fix warnings raised by patchwork.

Major changes from v6:

 - Remove unnecessary check of tb6 in fib6_clean_expires_locked().

 - Use fib6_clean_expires_locked() instead in fib6_purge_rt().

Major changes from v5:

 - Change the order of adding new routes to the GC list and starting
   the GC timer.

 - Remove time measurements from the test case.

 - Stop forcing GC flush.

Major changes from v4:

 - Detect existence of 'strace' in the test case.

Major changes from v3:

 - Fix the type of arg according to feedback.

 - Add 1K temporary routes and 5K permanent routes in the test case.
   Measure time spent on GC with strace.

Major changes from v2:

 - Remove unnecessary and incorrect sysctl restoring in the test case.

Major changes from v1:

 - Moved gc_link to avoid creating a hole in fib6_info.

 - Moved fib6_set_expires*() and fib6_clean_expires*() to the header
   file and inlined them. Also removed duplicated lines.

 - Added a test case.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
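
The approach described under "Solution" can be illustrated with a small userspace sketch. This is not the kernel code from this series: `struct route`, `struct table`, `route_set_expires()`, `route_clear_expires()`, and `gc_run()` are hypothetical names, a plain doubly linked list stands in for the kernel's hlist, and locking is left out. It only aims to show why a GC pass then costs time proportional to the number of expiring routes rather than to the table size.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's fib6_info. */
struct route {
	unsigned long expires;	/* absolute expiry time, 0 if permanent */
	bool has_expires;	/* stands in for the RTF_EXPIRES flag */
	struct route *gc_prev;	/* links in the GC candidate list */
	struct route *gc_next;
};

/* Hypothetical table: tracks only the routes that can expire. */
struct table {
	struct route *gc_head;
};

/* Mark a route as expiring; link it into the GC list only once. */
static void route_set_expires(struct table *tb, struct route *rt,
			      unsigned long expires)
{
	rt->expires = expires;
	if (!rt->has_expires) {
		rt->gc_prev = NULL;
		rt->gc_next = tb->gc_head;
		if (tb->gc_head)
			tb->gc_head->gc_prev = rt;
		tb->gc_head = rt;
	}
	rt->has_expires = true;
}

/* Make a route permanent again and unlink it from the GC list. */
static void route_clear_expires(struct table *tb, struct route *rt)
{
	if (rt->has_expires) {
		if (rt->gc_prev)
			rt->gc_prev->gc_next = rt->gc_next;
		else
			tb->gc_head = rt->gc_next;
		if (rt->gc_next)
			rt->gc_next->gc_prev = rt->gc_prev;
		rt->gc_prev = NULL;
		rt->gc_next = NULL;
	}
	rt->has_expires = false;
	rt->expires = 0;
}

/*
 * Visit only the GC candidates and drop the expired ones.
 * Returns the number of routes visited; permanent routes are never
 * on the list, so they add nothing to the cost of a pass.
 */
static size_t gc_run(struct table *tb, unsigned long now, size_t *removed)
{
	size_t visited = 0;
	struct route *rt = tb->gc_head;

	*removed = 0;
	while (rt) {
		/* Save the successor first: rt may be unlinked below. */
		struct route *next = rt->gc_next;

		visited++;
		/* Simplification: plain comparison, no wraparound handling
		 * (the kernel code in this diff uses time_after()).
		 */
		if (now > rt->expires) {
			route_clear_expires(tb, rt);
			(*removed)++;
		}
		rt = next;
	}
	return visited;
}
```

With this layout, a table holding thousands of permanent routes and a single expiring one makes `gc_run()` visit exactly one entry, which matches the roughly constant GC time reported in the results.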
2 parents d147085 + a63e10d commit 950fe35

4 files changed

Lines changed: 172 additions & 25 deletions

include/net/ip6_fib.h

Lines changed: 51 additions & 13 deletions
@@ -179,6 +179,9 @@ struct fib6_info {
 
 	refcount_t fib6_ref;
 	unsigned long expires;
+
+	struct hlist_node gc_link;
+
 	struct dst_metrics *fib6_metrics;
 #define fib6_pmtu fib6_metrics->metrics[RTAX_MTU-1]
 
@@ -247,26 +250,18 @@ static inline bool fib6_requires_src(const struct fib6_info *rt)
 	return rt->fib6_src.plen > 0;
 }
 
-static inline void fib6_clean_expires(struct fib6_info *f6i)
-{
-	f6i->fib6_flags &= ~RTF_EXPIRES;
-	f6i->expires = 0;
-}
-
-static inline void fib6_set_expires(struct fib6_info *f6i,
-				    unsigned long expires)
-{
-	f6i->expires = expires;
-	f6i->fib6_flags |= RTF_EXPIRES;
-}
-
 static inline bool fib6_check_expired(const struct fib6_info *f6i)
 {
 	if (f6i->fib6_flags & RTF_EXPIRES)
 		return time_after(jiffies, f6i->expires);
 	return false;
 }
 
+static inline bool fib6_has_expires(const struct fib6_info *f6i)
+{
+	return f6i->fib6_flags & RTF_EXPIRES;
+}
+
 /* Function to safely get fn->fn_sernum for passed in rt
  * and store result in passed in cookie.
  * Return true if we can get cookie safely
@@ -388,6 +383,7 @@ struct fib6_table {
 	struct inet_peer_base tb6_peers;
 	unsigned int flags;
 	unsigned int fib_seq;
+	struct hlist_head tb6_gc_hlist;	/* GC candidates */
 #define RT6_TABLE_HAS_DFLT_ROUTER BIT(0)
 };
 
@@ -504,6 +500,48 @@ void fib6_gc_cleanup(void);
 
 int fib6_init(void);
 
+/* fib6_info must be locked by the caller, and fib6_info->fib6_table can be
+ * NULL.
+ */
+static inline void fib6_set_expires_locked(struct fib6_info *f6i,
+					   unsigned long expires)
+{
+	struct fib6_table *tb6;
+
+	tb6 = f6i->fib6_table;
+	f6i->expires = expires;
+	if (tb6 && !fib6_has_expires(f6i))
+		hlist_add_head(&f6i->gc_link, &tb6->tb6_gc_hlist);
+	f6i->fib6_flags |= RTF_EXPIRES;
+}
+
+/* fib6_info must be locked by the caller, and fib6_info->fib6_table can be
+ * NULL. If fib6_table is NULL, the fib6_info will no be inserted into the
+ * list of GC candidates until it is inserted into a table.
+ */
+static inline void fib6_set_expires(struct fib6_info *f6i,
+				    unsigned long expires)
+{
+	spin_lock_bh(&f6i->fib6_table->tb6_lock);
+	fib6_set_expires_locked(f6i, expires);
+	spin_unlock_bh(&f6i->fib6_table->tb6_lock);
+}
+
+static inline void fib6_clean_expires_locked(struct fib6_info *f6i)
+{
+	if (fib6_has_expires(f6i))
+		hlist_del_init(&f6i->gc_link);
+	f6i->fib6_flags &= ~RTF_EXPIRES;
+	f6i->expires = 0;
+}
+
+static inline void fib6_clean_expires(struct fib6_info *f6i)
+{
+	spin_lock_bh(&f6i->fib6_table->tb6_lock);
+	fib6_clean_expires_locked(f6i);
+	spin_unlock_bh(&f6i->fib6_table->tb6_lock);
+}
+
 struct ipv6_route_iter {
 	struct seq_net_private p;
 	struct fib6_walker w;

net/ipv6/ip6_fib.c

Lines changed: 49 additions & 6 deletions
@@ -160,6 +160,8 @@ struct fib6_info *fib6_info_alloc(gfp_t gfp_flags, bool with_fib6_nh)
 	INIT_LIST_HEAD(&f6i->fib6_siblings);
 	refcount_set(&f6i->fib6_ref, 1);
 
+	INIT_HLIST_NODE(&f6i->gc_link);
+
 	return f6i;
 }
 
@@ -246,6 +248,7 @@ static struct fib6_table *fib6_alloc_table(struct net *net, u32 id)
 				   net->ipv6.fib6_null_entry);
 		table->tb6_root.fn_flags = RTN_ROOT | RTN_TL_ROOT | RTN_RTINFO;
 		inet_peer_base_init(&table->tb6_peers);
+		INIT_HLIST_HEAD(&table->tb6_gc_hlist);
 	}
 
 	return table;
@@ -1057,6 +1060,8 @@ static void fib6_purge_rt(struct fib6_info *rt, struct fib6_node *fn,
 				    lockdep_is_held(&table->tb6_lock));
 		}
 	}
+
+	fib6_clean_expires_locked(rt);
 }
 
 /*
@@ -1118,9 +1123,10 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt,
 				if (!(iter->fib6_flags & RTF_EXPIRES))
 					return -EEXIST;
 				if (!(rt->fib6_flags & RTF_EXPIRES))
-					fib6_clean_expires(iter);
+					fib6_clean_expires_locked(iter);
 				else
-					fib6_set_expires(iter, rt->expires);
+					fib6_set_expires_locked(iter,
+								rt->expires);
 
 				if (rt->fib6_pmtu)
 					fib6_metric_set(iter, RTAX_MTU,
@@ -1479,6 +1485,10 @@ int fib6_add(struct fib6_node *root, struct fib6_info *rt,
 		if (rt->nh)
 			list_add(&rt->nh_list, &rt->nh->f6i_list);
 		__fib6_update_sernum_upto_root(rt, fib6_new_sernum(info->nl_net));
+
+		if (fib6_has_expires(rt))
+			hlist_add_head(&rt->gc_link, &table->tb6_gc_hlist);
+
 		fib6_start_gc(info->nl_net, rt);
 	}
 
@@ -2285,17 +2295,16 @@ static void fib6_flush_trees(struct net *net)
  *	Garbage collection
  */
 
-static int fib6_age(struct fib6_info *rt, void *arg)
+static int fib6_age(struct fib6_info *rt, struct fib6_gc_args *gc_args)
 {
-	struct fib6_gc_args *gc_args = arg;
 	unsigned long now = jiffies;
 
 	/*
 	 *	check addrconf expiration here.
 	 *	Routes are expired even if they are in use.
 	 */
 
-	if (rt->fib6_flags & RTF_EXPIRES && rt->expires) {
+	if (fib6_has_expires(rt) && rt->expires) {
 		if (time_after(now, rt->expires)) {
 			RT6_TRACE("expiring %p\n", rt);
 			return -1;
@@ -2312,6 +2321,40 @@ static int fib6_age(struct fib6_info *rt, void *arg)
 	return 0;
 }
 
+static void fib6_gc_table(struct net *net,
+			  struct fib6_table *tb6,
+			  struct fib6_gc_args *gc_args)
+{
+	struct fib6_info *rt;
+	struct hlist_node *n;
+	struct nl_info info = {
+		.nl_net = net,
+		.skip_notify = false,
+	};
+
+	hlist_for_each_entry_safe(rt, n, &tb6->tb6_gc_hlist, gc_link)
+		if (fib6_age(rt, gc_args) == -1)
+			fib6_del(rt, &info);
+}
+
+static void fib6_gc_all(struct net *net, struct fib6_gc_args *gc_args)
+{
+	struct fib6_table *table;
+	struct hlist_head *head;
+	unsigned int h;
+
+	rcu_read_lock();
+	for (h = 0; h < FIB6_TABLE_HASHSZ; h++) {
+		head = &net->ipv6.fib_table_hash[h];
+		hlist_for_each_entry_rcu(table, head, tb6_hlist) {
+			spin_lock_bh(&table->tb6_lock);
+			fib6_gc_table(net, table, gc_args);
+			spin_unlock_bh(&table->tb6_lock);
+		}
+	}
+	rcu_read_unlock();
+}
+
 void fib6_run_gc(unsigned long expires, struct net *net, bool force)
 {
 	struct fib6_gc_args gc_args;
@@ -2327,7 +2370,7 @@ void fib6_run_gc(unsigned long expires, struct net *net, bool force)
 			  net->ipv6.sysctl.ip6_rt_gc_interval;
 	gc_args.more = 0;
 
-	fib6_clean_all(net, fib6_age, &gc_args);
+	fib6_gc_all(net, &gc_args);
 	now = jiffies;
 	net->ipv6.ip6_rt_last_gc = now;
 

net/ipv6/route.c

Lines changed: 3 additions & 3 deletions
@@ -3761,10 +3761,10 @@ static struct fib6_info *ip6_route_info_create(struct fib6_config *cfg,
 		rt->dst_nocount = true;
 
 	if (cfg->fc_flags & RTF_EXPIRES)
-		fib6_set_expires(rt, jiffies +
-				clock_t_to_jiffies(cfg->fc_expires));
+		fib6_set_expires_locked(rt, jiffies +
+					clock_t_to_jiffies(cfg->fc_expires));
 	else
-		fib6_clean_expires(rt);
+		fib6_clean_expires_locked(rt);
 
 	if (cfg->fc_protocol == RTPROT_UNSPEC)
 		cfg->fc_protocol = RTPROT_BOOT;

tools/testing/selftests/net/fib_tests.sh

Lines changed: 69 additions & 3 deletions
@@ -9,13 +9,16 @@ ret=0
 ksft_skip=4
 
 # all tests in this script. Can be overridden with -t option
-TESTS="unregister down carrier nexthop suppress ipv6_notify ipv4_notify ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics ipv4_route_metrics ipv4_route_v6_gw rp_filter ipv4_del_addr ipv4_mangle ipv6_mangle ipv4_bcast_neigh"
+TESTS="unregister down carrier nexthop suppress ipv6_notify ipv4_notify \
+	ipv6_rt ipv4_rt ipv6_addr_metric ipv4_addr_metric ipv6_route_metrics \
+	ipv4_route_metrics ipv4_route_v6_gw rp_filter ipv4_del_addr \
+	ipv4_mangle ipv6_mangle ipv4_bcast_neigh fib6_gc_test"
 
 VERBOSE=0
 PAUSE_ON_FAIL=no
 PAUSE=no
-IP="ip -netns ns1"
-NS_EXEC="ip netns exec ns1"
+IP="$(which ip) -netns ns1"
+NS_EXEC="$(which ip) netns exec ns1"
 
 which ping6 > /dev/null 2>&1 && ping6=$(which ping6) || ping6=$(which ping)
 
@@ -747,6 +750,68 @@ fib_notify_test()
 	cleanup &> /dev/null
 }
 
+fib6_gc_test()
+{
+	setup
+
+	echo
+	echo "Fib6 garbage collection test"
+	set -e
+
+	EXPIRE=3
+
+	# Check expiration of routes every $EXPIRE seconds (GC)
+	$NS_EXEC sysctl -wq net.ipv6.route.gc_interval=$EXPIRE
+
+	$IP link add dummy_10 type dummy
+	$IP link set dev dummy_10 up
+	$IP -6 address add 2001:10::1/64 dev dummy_10
+
+	$NS_EXEC sysctl -wq net.ipv6.route.flush=1
+
+	# Temporary routes
+	for i in $(seq 1 1000); do
+		# Expire route after $EXPIRE seconds
+		$IP -6 route add 2001:20::$i \
+			via 2001:10::2 dev dummy_10 expires $EXPIRE
+	done
+	sleep $(($EXPIRE * 2))
+	N_EXP_SLEEP=$($IP -6 route list |grep expires|wc -l)
+	if [ $N_EXP_SLEEP -ne 0 ]; then
+		echo "FAIL: expected 0 routes with expires, got $N_EXP_SLEEP"
+		ret=1
+	else
+		ret=0
+	fi
+
+	# Permanent routes
+	for i in $(seq 1 5000); do
+		$IP -6 route add 2001:30::$i \
+			via 2001:10::2 dev dummy_10
+	done
+	# Temporary routes
+	for i in $(seq 1 1000); do
+		# Expire route after $EXPIRE seconds
+		$IP -6 route add 2001:20::$i \
+			via 2001:10::2 dev dummy_10 expires $EXPIRE
+	done
+	sleep $(($EXPIRE * 2))
+	N_EXP_SLEEP=$($IP -6 route list |grep expires|wc -l)
+	if [ $N_EXP_SLEEP -ne 0 ]; then
+		echo "FAIL: expected 0 routes with expires," \
+			"got $N_EXP_SLEEP (5000 permanent routes)"
+		ret=1
+	else
+		ret=0
+	fi
+
+	set +e
+
+	log_test $ret 0 "ipv6 route garbage collection"
+
+	cleanup &> /dev/null
+}
+
 fib_suppress_test()
 {
 	echo
@@ -2217,6 +2282,7 @@ do
 	ipv4_mangle)	ipv4_mangle_test;;
 	ipv6_mangle)	ipv6_mangle_test;;
 	ipv4_bcast_neigh)	ipv4_bcast_neigh_test;;
+	fib6_gc_test|ipv6_gc)	fib6_gc_test;;
 
 	help) echo "Test names: $TESTS"; exit 0;;
 	esac
