Bird: add bgp/evpn support with vxlan and anycast gateway #3475
jbemmel wants to merge 25 commits into
Enable EVPN and VXLAN on the bird daemon, split L2 eth tables into a vxlan module config, and fold VXLAN interface setup into the vlan startup script. Co-authored-by: Cursor <cursoragent@cursor.com>
Run netlab config scripts before BIRD starts, make VXLAN setup idempotent, and activate EVPN on the existing BGP session instead of a duplicate peer. Co-authored-by: Cursor <cursoragent@cursor.com>
Enable gateway recursive and next-hop keep on EBGP EVPN channels for route reflectors, and export OSPF routes into BGP so PE devices can resolve remote VTEP addresses. Co-authored-by: Cursor <cursoragent@cursor.com>
Generate the EVPN/VXLAN startup script in Dockerfile.v2_from_src.j2 so clab build no longer needs supplemental files, and restore simple CMD startup for package-based bird and bird.v3 images. Co-authored-by: Cursor <cursoragent@cursor.com>
Containerlab connects routing protocols over lab interfaces, so Docker port publishing is not used for BGP, OSPF, or BFD. Co-authored-by: Cursor <cursoragent@cursor.com>
Keep VXLAN kernel setup with the vxlan module template and nest EVPN eth-table generation under a single vxlan guard. Co-authored-by: Cursor <cursoragent@cursor.com>
Use vxlan.j2 for kernel interface setup and vxlan@config.j2 for EVPN eth-table generation, matching the vlan.j2 versus bgp@session pattern. Co-authored-by: Cursor <cursoragent@cursor.com>
Use vxlan@mod in bird.yml to match bgp.mod.conf naming while keeping Box-safe daemon_config keys. Co-authored-by: Cursor <cursoragent@cursor.com>
Wait for netlab initial to finish before starting BIRD, create VXLAN interfaces after VLAN bridges are ready, and keep an upstream EVPN patch on file without applying it in the Docker build. Co-authored-by: Cursor <cursoragent@cursor.com>
Include the session type in each BGP protocol instance name so EVPN IBGP and link EBGP sessions to the same neighbor no longer collide. Co-authored-by: Cursor <cursoragent@cursor.com>
Set the SVI bridge MAC when the bridge is created, not on every script run. Co-authored-by: Cursor <cursoragent@cursor.com>
Drop EVPN interface polling, config pre-checks, and verbose startup logging; start BIRD once the netlab-config-done marker appears. Co-authored-by: Cursor <cursoragent@cursor.com>
Docker creates /etc/config on file bind mounts and /var/run already exists in the base image. Co-authored-by: Cursor <cursoragent@cursor.com>
Remove the single-use config_done_marker helper. Co-authored-by: Cursor <cursoragent@cursor.com>
Remove evpn-bridge-master.patch from the branch and gitignore the patches directory so it can stay local only. Co-authored-by: Cursor <cursoragent@cursor.com>
Use existing deploy success/failure state instead of a separate ran_executable flag. Co-authored-by: Cursor <cursoragent@cursor.com>
Keep patch directory handling out of the branch; local ignores can stay uncommitted. Co-authored-by: Cursor <cursoragent@cursor.com>
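Two of the commits above make the data-plane setup idempotent: VXLAN interfaces are created only when missing, and the SVI bridge MAC is set when the bridge is created rather than on every run. A minimal sketch of that pattern follows. It is not the code from this PR; the function names, interface names, VNI, and addresses are hypothetical, and it assumes the `ip` command from iproute2.

```shell
#!/bin/bash
# Sketch only: ensure_bridge/ensure_vxlan are hypothetical helper names,
# not functions from the PR. Requires iproute2 and sufficient privileges.

# Create a bridge once; set its MAC only at creation time.
ensure_bridge() {
  local br="$1" mac="$2"
  if ip link show dev "$br" >/dev/null 2>&1; then
    return 0                      # bridge already exists: leave its MAC alone
  fi
  ip link add name "$br" type bridge
  ip link set dev "$br" address "$mac"
}

# Create a VXLAN interface once, then attach it to its bridge and bring it up.
ensure_vxlan() {
  local vx="$1" vni="$2" vtep="$3" br="$4"
  if ! ip link show dev "$vx" >/dev/null 2>&1; then
    ip link add name "$vx" type vxlan id "$vni" local "$vtep" dstport 4789 nolearning
  fi
  ip link set dev "$vx" master "$br"
  ip link set dev "$vx" up
}

# Example (hypothetical values):
#   ensure_bridge br100 02:00:00:00:00:01
#   ensure_vxlan vxlan100 100 10.0.0.1 br100
```

Running either helper a second time adds nothing new, which is what lets the startup script be re-run safely.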
Could add a version check that aborts when the image isn't 2.19.0 or newer
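One way such a check could look is sketched here. The helper names are hypothetical, `sort -V` is assumed to be available (it is in GNU coreutils), and the way the installed version is read from `bird --version` in the usage comment is an assumption about its output format rather than something taken from this PR.

```shell
#!/bin/bash
# Sketch only: version_ge and require_bird_version are hypothetical names.

# version_ge A B: succeed when version A is greater than or equal to version B.
# Relies on version sort: the smaller of the two versions sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# require_bird_version HAVE MIN: print an error and fail when HAVE < MIN.
require_bird_version() {
  local have="$1" min="$2"
  if ! version_ge "$have" "$min"; then
    echo "BIRD $have is too old for EVPN support; need $min or newer" >&2
    return 1
  fi
}

# Example (assumes the version number is the last word bird prints):
#   have="$(bird --version 2>&1 | awk '{print $NF}')"
#   require_bird_version "$have" 2.19.0 || exit 1
```

Comparing with a version sort rather than a plain string comparison matters here, because a string comparison would rank 2.9.0 above 2.19.0.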
Co-authored-by: Cursor <cursoragent@cursor.com>
ipspace left a comment
Nice one, but I think a bunch of things might have to be changed.
Split VRRP/anycast interface creation out of frr.j2 so BIRD can reuse the data-plane script without pulling in VRRP daemon configuration. Co-authored-by: Cursor <cursoragent@cursor.com>
Drop clab-side docker exec touch logic and have every clab BIRD node run a simple config-done script that touches /var/run/netlab-config-done. Co-authored-by: Cursor <cursoragent@cursor.com>
BIRD startup now waits for netlab deploy scripts via the config-done marker, so the initial template no longer needs to poll for eth1. Co-authored-by: Cursor <cursoragent@cursor.com>
Use the same startup script in the package and v3 images so BIRD waits for netlab initial before launching the daemon. Co-authored-by: Cursor <cursoragent@cursor.com>
Inherit linux :ns deploy mode, restore host-side initial setup, slim BIRD images, and touch the config-done marker via docker exec. Co-authored-by: Cursor <cursoragent@cursor.com>
Use the provider-specific template name so containerlab initial config is selected like other clab devices. Co-authored-by: Cursor <cursoragent@cursor.com>
    #!/bin/bash
    #
    set -e
    docker exec {{ ansible_host }} touch /var/run/netlab-config-done
Alternative: See #3498 where this is done after initial/bird-clab.j2 without requiring this odd 'config-done' quasi-plugin
ipspace left a comment
I've seen hacks before, but this one is a winner. Including VXLAN configs into VLAN script, and then generating EVPN configs in VXLAN template -- seriously?
You've stretched the definition of what a daemon is way beyond the original intent. No problem with that. However, now we have to figure out how to do data-plane configuration before the daemon is started. Reusing the same code in three Dockerfiles, templates that configure something other than what they're supposed to do, and quick hacks like your "config-done" sort-of plugin are not exactly a stellar first implementation of that concept.
I don't have much confidence in your AI friend generating a decent solution to a problem we obviously have a hard time defining. Please leave the PR as-is; I'll try to figure out a nicer way to make things work.
    CMD [ "bird", "-c", "/etc/bird/bird.conf", "-d" ]
    # Wait for netlab initial to execute bound /etc/config scripts, then start BIRD.
    RUN cat >/entrypoint.sh <<'ENTRYPOINT'
Embedding the same script into three Dockerfiles is a perfect recipe for reusable code :(( More later...
"Plain" Dockerfiles and Jinja2 templates get built differently, making it harder to implement a 'universal' wait mechanism. Including the same bash script in each was the easiest way to get the consistency that was asked for - but I agree a more general core mechanism would be nicer
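For reference, the wait mechanism under discussion amounts to polling for the marker file before starting the daemon. A minimal sketch follows; it is not the script embedded in the Dockerfiles. The function name and the timeout are assumptions added for illustration, while the marker path and the `bird` command line are the ones quoted in this PR.

```shell
#!/bin/bash
# Sketch only: wait_for_marker is a hypothetical helper, and the timeout
# is an added assumption (the PR's script may wait indefinitely).

# wait_for_marker PATH [TIMEOUT_SECONDS]: poll once per second until PATH
# exists; fail if it has not appeared after TIMEOUT_SECONDS (default 60).
wait_for_marker() {
  local marker="$1" timeout="${2:-60}" waited=0
  while [ ! -e "$marker" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $marker" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# Example entrypoint usage:
#   wait_for_marker /var/run/netlab-config-done 120 || exit 1
#   exec bird -c /etc/bird/bird.conf -d
```

Kept as a single shared file, a helper like this could be copied into each image at build time instead of being embedded in every Dockerfile, though how that fits the plain and Jinja2-templated builds is left open here.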
    @@ -0,0 +1,4 @@
    {% include 'vlan/frr.j2' +%}
    {% if 'vxlan' in module|default([]) %}
Why do you need this dirty hack???
Because daemons have two types of template files: Bash scripts and config file snippets.
vxlan.j2 is already mapped as a config snippet, so it cannot also be a Bash script. This was a workaround to get both without core changes.
If we move the logic from vxlan.j2 into evpn.j2, it frees up that file to include vxlan/frr.j2 instead, and we can remove this.
    @@ -0,0 +1,5 @@
    {% if vxlan.flooding|default('') == 'evpn' %}
If this thing generates content only when EVPN is enabled, don't you think it belongs to the EVPN config?
It does fit better in evpn.j2, which already has the check
    @@ -1,11 +1,14 @@
    ---
    description: BIRD Internet Routing Daemon
    parent: linux
'linux' is the implicit default so it's not strictly needed
> 'linux' is the implicit default so it's not strictly needed
So why did you add it?
    ---
    description: BIRD Internet Routing Daemon
    parent: linux
    role: router
And why have you changed the default role?
See #3499 as to what triggered it - I agree it would be better to fix that specific test
It could make sense that a "Routing Daemon" would have a default "router" role - but we can leave it out
Although it's not defined (precisely) anywhere in the documentation, the daemon devices were designed to provide control-plane-only functionality (for example, BGP RR). That's also the most common BIRD use case (IXP BGP route server)
> Although it's not defined (precisely) anywhere in the documentation, the daemon devices were designed to provide control-plane-only functionality (for example, BGP RR). That's also the most common BIRD use case (IXP BGP route server)
ok, so in my mind that is a "router" role more than a "host" role. Looking at the codebase, I now realize that Linux daemons are explicitly exempted from requiring a loopback interface - is that the more common way BIRD gets deployed?
    ospf: RTS_OSPF
    connected: RTS_DEVICE
    static: RTS_STATIC_DEVICE,RTS_STATIC
    netlab_initial: always
Isn't this relevant only with configuration reload?
I initially modeled the Bird config after frr, hence this got copied - but you are right that it currently isn't used
    netlab_show_command: [ birdc, 'show $@' ]
    docker_shell: bash -il
    netlab_config_mode: sh
    netlab_config_mode: ns
Isn't this inherited from Linux?
no, but individual clab.node_config entries are
> no, but individual clab.node_config entries are
So???
This depends on #3470 for the new 2.19.1 release
Couple of highlights:
device-module-test: 4 new cases OK
Most failures are due to missing vrf support; 10 fail because I don't have cEOS locally