Bird: add bgp/evpn support with vxlan and anycast gateway #3475

Open
jbemmel wants to merge 25 commits into ipspace:dev from jbemmel:feature/bird-evpn

jbemmel commented Jun 14, 2026

Collaborator

This depends on #3470 for the new 2.19.1 release

  • Adds support for evpn, vxlan and gateway (anycast) modules
  • Reusing FRR logic/scripts for VLAN and VXLAN provisioning

Couple of highlights:

  • Needed a way to sync daemon startup with netlab running its config scripts: a bird-specific config script touches a 'done' file that the container waits for before starting the daemon
  • Had to make frr scripts more idempotent and reusable in some places
  • Retains bird :ns provisioning
  • Putting the bird error log on stderr so that docker logs shows it, including when using an older version with EVPN
  • bird/vlan.j2 is a bash script while bird/vxlan.j2 is config; this might benefit from cleaner separation (but then the lookup paths need to be changed)
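
The 'done' file handshake from the first highlight can be sketched as a small entrypoint helper. This is a minimal sketch and not the script from this PR: the function name `wait_for_marker`, the timeout, and the one-second polling interval are assumptions made for illustration. The marker path `/var/run/netlab-config-done` and the BIRD command line in the usage comment are the ones quoted elsewhere in this conversation.

```shell
#!/bin/bash
# Hypothetical entrypoint helper (not the PR's actual script): block until
# netlab signals that its config scripts have finished.

# wait_for_marker <path> [timeout_seconds]
# Returns 0 as soon as <path> exists, 1 if it has not appeared in time.
wait_for_marker() {
  local marker="$1"
  local timeout="${2:-300}"   # assumed default; pick what fits the lab
  local waited=0
  while [ ! -e "$marker" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $marker" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage in an entrypoint (commented out so the snippet has no side effects):
#   wait_for_marker /var/run/netlab-config-done || exit 1
#   exec bird -c /etc/bird/bird.conf -d
```

The timeout is a design choice: without one, a failed deploy leaves the container waiting forever with nothing in the logs, whereas a bounded wait surfaces the problem on stderr.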

device-module-test: 4 new cases OK

./device-module-test -d bird -p clab evpn
Pre-test cleanup: 
* 19:53:59 evpn/01-vxlan-bridging.yml: create(ok) up(ok) config(ok) validate(ok) cleanup(ok) OK
* 19:54:26 evpn/02-vxlan-asymmetric-irb.yml: create(ok) up(ok) config(ok) validate(ok) cleanup(ok) OK
19:54:48 evpn/03-vxlan-symmetric-irb.yml: create(FAIL) 
19:54:48 evpn/04-vxlan-central-routing.yml: create(FAIL) 
19:54:49 evpn/05-vxlan-l3only.yml: create(FAIL) 
19:54:50 evpn/06-vxlan-bridging-vlan-bundle.yml: create(FAIL) 
19:54:50 evpn/10-vxlan-rr.yml: create(ok) up(FAIL) cleanup(ok)
* 19:54:52 evpn/11-vxlan-ebgp.yml: create(ok) up(ok) config(ok) validate(ok) cleanup(ok) OK
* 19:55:13 evpn/12-vxlan-ibgp-ebgp.yml: create(ok) up(ok) config(ok) validate(ok) cleanup(ok) OK
19:55:22 evpn/13-vxlan-ebgp-allowas.yml: create(FAIL) 
19:55:23 evpn/14-vxlan-ebgp-ebgp.yml: create(FAIL) 
19:55:23 evpn/15-vxlan-ebgp-unnumbered.yml: create(FAIL) 
19:55:24 evpn/20-vxlan-irb-ospf.yml: create(FAIL) 
19:55:25 evpn/21-bgp-ce-router.yml: create(FAIL) 
19:55:25 evpn/22-ospf-ce-router.yml: create(FAIL) 
19:55:26 evpn/30-cs-bridging.yml: create(FAIL) 
19:55:26 evpn/41-vxlan-ipv6-bridging.yml: create(FAIL) 
19:55:27 evpn/51-mpls-bridging.yml: create(FAIL) 
19:55:28 evpn/52-mpls-asymmetric-irb.yml: create(FAIL) 
19:55:28 evpn/53-evpn-l3only.yml: create(FAIL) 
19:55:29 evpn/54-evpn-l3vpn.yml: create(FAIL) 
19:55:29 evpn/61-sr-bridging.yml: create(FAIL) 

Most failures are due to missing vrf support; test 10 (evpn/10-vxlan-rr.yml) fails because I don't have cEOS locally

jbemmel and others added 17 commits June 13, 2026 09:01
Enable EVPN and VXLAN on the bird daemon, split L2 eth tables into a vxlan module config, and fold VXLAN interface setup into the vlan startup script.

Co-authored-by: Cursor <cursoragent@cursor.com>
Run netlab config scripts before BIRD starts, make VXLAN setup idempotent, and activate EVPN on the existing BGP session instead of a duplicate peer.

Co-authored-by: Cursor <cursoragent@cursor.com>
Enable gateway recursive and next-hop keep on EBGP EVPN channels for route reflectors, and export OSPF routes into BGP so PE devices can resolve remote VTEP addresses.

Co-authored-by: Cursor <cursoragent@cursor.com>
Generate the EVPN/VXLAN startup script in Dockerfile.v2_from_src.j2 so
clab build no longer needs supplemental files, and restore simple CMD
startup for package-based bird and bird.v3 images.

Co-authored-by: Cursor <cursoragent@cursor.com>
Containerlab connects routing protocols over lab interfaces, so Docker
port publishing is not used for BGP, OSPF, or BFD.

Co-authored-by: Cursor <cursoragent@cursor.com>
Keep VXLAN kernel setup with the vxlan module template and nest EVPN
eth-table generation under a single vxlan guard.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use vxlan.j2 for kernel interface setup and vxlan@config.j2 for EVPN
eth-table generation, matching the vlan.j2 versus bgp@session pattern.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use vxlan@mod in bird.yml to match bgp.mod.conf naming while keeping
Box-safe daemon_config keys.

Co-authored-by: Cursor <cursoragent@cursor.com>
Wait for netlab initial to finish before starting BIRD, create VXLAN
interfaces after VLAN bridges are ready, and keep an upstream EVPN patch
on file without applying it in the Docker build.

Co-authored-by: Cursor <cursoragent@cursor.com>
Include the session type in each BGP protocol instance name so EVPN IBGP and link EBGP sessions to the same neighbor no longer collide.

Co-authored-by: Cursor <cursoragent@cursor.com>
Set the SVI bridge MAC when the bridge is created, not on every script run.

Co-authored-by: Cursor <cursoragent@cursor.com>
Drop EVPN interface polling, config pre-checks, and verbose startup logging; start BIRD once the netlab-config-done marker appears.

Co-authored-by: Cursor <cursoragent@cursor.com>
Docker creates /etc/config on file bind mounts and /var/run already exists in the base image.

Co-authored-by: Cursor <cursoragent@cursor.com>
Remove the single-use config_done_marker helper.

Co-authored-by: Cursor <cursoragent@cursor.com>
Remove evpn-bridge-master.patch from the branch and gitignore the patches directory so it can stay local only.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use existing deploy success/failure state instead of a separate ran_executable flag.

Co-authored-by: Cursor <cursoragent@cursor.com>
Keep patch directory handling out of the branch; local ignores can stay uncommitted.

Co-authored-by: Cursor <cursoragent@cursor.com>
jbemmel commented Jun 14, 2026

Collaborator Author

Could add a version check that aborts when the image isn't 2.19.0 or newer

Co-authored-by: Cursor <cursoragent@cursor.com>
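
The version check suggested above could look roughly like the following. It is only a sketch under stated assumptions: the helper names `version_at_least` and `extract_version` are hypothetical, the comparison relies on `sort -V` (version sort, as provided by GNU coreutils), and the exact banner printed by `bird --version` is assumed rather than verified here. The 2.19.0 threshold is the one mentioned in the comment.

```shell
#!/bin/bash
# Hypothetical version gate for a BIRD entrypoint (illustration only).

# version_at_least <have> <want>
# Succeeds when <have> is the same as or newer than <want>.
# An empty <have> fails, so an unparsable banner aborts instead of passing.
version_at_least() {
  local have="$1"
  local want="$2"
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# extract_version <text>
# Pulls the first dotted version number out of a string such as
# "BIRD version 2.17.1" (the banner format is an assumption).
extract_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# Usage in an entrypoint (commented out so the snippet has no side effects):
#   have=$(extract_version "$(bird --version 2>&1)")
#   version_at_least "$have" 2.19.0 || {
#     echo "BIRD $have is older than 2.19.0" >&2
#     exit 1
#   }
```

Using a version sort rather than a plain string comparison matters for releases such as 2.9.x, which a lexical comparison would rank above 2.19.0.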

@ipspace ipspace left a comment

Owner

Nice one, but I think a bunch of things might have to be changed.

Comment thread netsim/ansible/templates/gateway/frr.j2 Outdated
Comment thread netsim/ansible/templates/initial/bird.j2 Outdated
Comment thread netsim/ansible/templates/vxlan/frr.j2
Comment thread netsim/daemons/bird/Dockerfile.v2_from_src.j2
Comment thread netsim/providers/clab/configs.py Outdated
Comment thread netsim/daemons/bird.yml Outdated
@jbemmel jbemmel marked this pull request as draft June 15, 2026 23:27
jbemmel and others added 6 commits June 15, 2026 18:30
Split VRRP/anycast interface creation out of frr.j2 so BIRD can reuse
the data-plane script without pulling in VRRP daemon configuration.

Co-authored-by: Cursor <cursoragent@cursor.com>
Drop clab-side docker exec touch logic and have every clab BIRD node
run a simple config-done script that touches /var/run/netlab-config-done.

Co-authored-by: Cursor <cursoragent@cursor.com>
BIRD startup now waits for netlab deploy scripts via the config-done
marker, so the initial template no longer needs to poll for eth1.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use the same startup script in the package and v3 images so BIRD waits
for netlab initial before launching the daemon.

Co-authored-by: Cursor <cursoragent@cursor.com>
Inherit linux :ns deploy mode, restore host-side initial setup, slim BIRD
images, and touch the config-done marker via docker exec.

Co-authored-by: Cursor <cursoragent@cursor.com>
Use the provider-specific template name so containerlab initial config is selected like other clab devices.

Co-authored-by: Cursor <cursoragent@cursor.com>
@jbemmel jbemmel marked this pull request as ready for review June 16, 2026 01:09
#!/bin/bash
#
set -e
docker exec {{ ansible_host }} touch /var/run/netlab-config-done

Collaborator Author

Alternative: See #3498 where this is done after initial/bird-clab.j2 without requiring this odd 'config-done' quasi-plugin

@ipspace ipspace left a comment

Owner

I've seen hacks before, but this one is a winner. Including VXLAN configs into VLAN script, and then generating EVPN configs in VXLAN template -- seriously?

You've stretched the definition of what a daemon is way beyond the original intent. No problem with that. However, now we have to figure out how to do data-plane configuration before the daemon is started. Reusing the same code in three Dockerfiles, templates that configure something other than what they're supposed to do, and quick hacks like your "config-done" sort-of plugin are not exactly a stellar first implementation of that concept.

I don't have much confidence in your AI friend generating a decent solution to a problem we obviously have a hard time defining. Please leave the PR as-is; I'll try to figure out a nicer way to make things work.

Comment thread netsim/daemons/bird/bgp.j2 Outdated

CMD [ "bird", "-c", "/etc/bird/bird.conf", "-d" ]
# Wait for netlab initial to execute bound /etc/config scripts, then start BIRD.
RUN cat >/entrypoint.sh <<'ENTRYPOINT'

Owner

Embedding the same script into three Dockerfiles is a perfect recipe for reusable code :(( More later...

Collaborator Author

"Plain" Dockerfiles and Jinja2 templates get built differently, making it harder to implement a 'universal' wait mechanism. Including the same bash script in each was the easiest way to get the consistency that was asked for - but I agree a more general core mechanism would be nicer

@@ -0,0 +1,4 @@
{% include 'vlan/frr.j2' +%}
{% if 'vxlan' in module|default([]) %}

Owner

Why do you need this dirty hack???

Collaborator Author

Because daemons have 2 types of template files: Bash scripts and config file snippets

vxlan.j2 is already mapped as a config snippet, so it cannot also be a Bash script. This was a workaround to get both without core changes

If we move the logic from vxlan.j2 into evpn.j2, it frees up that file to include vxlan/frr.j2 instead and we can remove this

@@ -0,0 +1,5 @@
{% if vxlan.flooding|default('') == 'evpn' %}

Owner

If this thing generates content only when EVPN is enabled, don't you think it belongs to the EVPN config?

Collaborator Author

It does fit better in evpn.j2, which already has the check

Comment thread netsim/daemons/bird.yml
@@ -1,11 +1,14 @@
---
description: BIRD Internet Routing Daemon
parent: linux

Owner

Why exactly do you need this?

Collaborator Author

'linux' is the implicit default so it's not strictly needed

Owner

> 'linux' is the implicit default so it's not strictly needed

So why did you add it?

Comment thread netsim/daemons/bird.yml
---
description: BIRD Internet Routing Daemon
parent: linux
role: router

Owner

And why have you changed the default role?

Collaborator Author

See #3499 as to what triggered it - I agree it would be better to fix that specific test

It could make sense that a "Routing Daemon" would have a default "router" role - but we can leave it out

Owner

Although it's not defined (precisely) anywhere in the documentation, the daemon devices were designed to provide control-plane-only functionality (for example, BGP RR). That's also the most common BIRD use case (IXP BGP route server)

Collaborator Author

> Although it's not defined (precisely) anywhere in the documentation, the daemon devices were designed to provide control-plane-only functionality (for example, BGP RR). That's also the most common BIRD use case (IXP BGP route server)

ok, so in my mind that is a "router" role more than a "host" role. Looking at the codebase, I now realize that Linux daemons are explicitly exempted from requiring a loopback interface - is that the more common way BIRD gets deployed?

Comment thread netsim/daemons/bird.yml
ospf: RTS_OSPF
connected: RTS_DEVICE
static: RTS_STATIC_DEVICE,RTS_STATIC
netlab_initial: always

Owner

Isn't this relevant only with configuration reload?

Collaborator Author

I initially modeled the Bird config after frr, hence this got copied - but you are right that it currently isn't used

Comment thread netsim/daemons/bird.yml
netlab_show_command: [ birdc, 'show $@' ]
docker_shell: bash -il
netlab_config_mode: sh
netlab_config_mode: ns

Owner

Isn't this inherited from Linux?

Collaborator Author

no, but individual clab.node_config entries are

Owner

> no, but individual clab.node_config entries are

So???
