I’m digging into the backlog today! I’ve had these thoughts jotted down since trying to solve a problem on another OpenStack all-in-one box a few weeks ago, and I’m glad to finally get it finished. So without further ado, let’s jump in!
The Questions
I have already covered Open vSwitch and OpenStack networking in the following two articles:
There have been some unanswered questions for me however:
Why VLAN trunking anomalies seem to be present on patch ports
If one looks at the output of ovs-vsctl show, some confusion may ensue. For example, there are several VLAN tags there, but if all of them are trunked across (as is the behaviour of a patch port), which VLAN wins? Do any? How is this even working?
    Bridge br-int
        fail_mode: secure
        Port "foo"
            tag: 4
            Interface "foo"
                type: internal
        Port "bar"
            tag: 3
            Interface "bar"
                type: internal
        Port "jack"
            tag: 1
            Interface "jack"
        Port "jill"
            tag: 2
            Interface "jill"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port "int-br-ex"
            Interface "int-br-ex"
                type: patch
                options: {peer="phy-br-ex"}
    Bridge br-ex
        Port "enp2s0"
            Interface "enp2s0"
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex
            Interface br-ex
                type: internal
    ovs_version: "2.1.3"
How does OpenStack handle several layer 3 networks over the same router
My other question was this: observing that OpenStack does not create any sort of VM for routing, how does routing even work? Managing what could ultimately be thousands of tenant networks and possibly dozens or even hundreds of external networks could get pretty messy, I would imagine.
The answers were pretty clear, once I dug a bit deeper.
Open vSwitch and Integration-to-External Bridge Mapping
The OpenStack integration bridge maps to two kinds of bridges, depending on where in the architecture one looks:
- The external bridge (as shown above) – this is generally done on network nodes, and on my all-in-one setup
- The tunnel bridge (not shown above to save space) – this is done on regular compute nodes, for example
This is specifically denoted by the two patch ports in each bridge:
    # br-int
    Port "int-br-ex"
        Interface "int-br-ex"
            type: patch
            options: {peer="phy-br-ex"}

    # br-ex
    Port phy-br-ex
        Interface phy-br-ex
            type: patch
            options: {peer=int-br-ex}
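As a quick sanity check, the peering can also be read straight out of OVSDB, using the interface names from the output above:

    ovs-vsctl get interface int-br-ex options:peer
    ovs-vsctl get interface phy-br-ex options:peer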
As mentioned, all VLANs are passed over a patch port. Think of it as a trunk port on a physical switch that is set to match all VLANs. The vSwitch layer of OVS does not perform any sort of selective VLAN mapping.
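To make that concrete, here is a minimal sketch of wiring two bridges together with a patch port pair by hand. The bridge and port names here are hypothetical; Neutron creates and names its own:

    ovs-vsctl add-br br-a
    ovs-vsctl add-br br-b
    ovs-vsctl add-port br-a patch-a -- set interface patch-a type=patch options:peer=patch-b
    ovs-vsctl add-port br-b patch-b -- set interface patch-b type=patch options:peer=patch-a

Every VLAN present on br-a is now visible on br-b and vice versa; there is no way to filter selectively at this layer alone.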
So if all VLANs going over this port are tagged, how do we make sense of what we see in the external bridge, which has no tags at all? All of its ports are either untagged or trunks, so taken at face value, it would seem like a bad configuration.
Not necessarily.
OpenFlow Magic on the External Bridge
The switch layer is only half the story when dealing with Open vSwitch. The second half is what happens with OpenFlow on the external bridge:
    # ovs-ofctl dump-flows br-ex
    NXST_FLOW reply (xid=0x4):
     cookie=0x0, duration=3776.846s, table=0, n_packets=5106, n_bytes=1142456, idle_age=0, priority=1 actions=NORMAL
     cookie=0x0, duration=3654.201s, table=0, n_packets=0, n_bytes=0, idle_age=3654, priority=4,in_port=2,dl_vlan=1 actions=strip_vlan,NORMAL
     cookie=0x0, duration=3776.341s, table=0, n_packets=132, n_bytes=10608, idle_age=3703, priority=2,in_port=2 actions=drop
The second rule is the one we specifically want to pay attention to. This rule contains the strip_vlan action, which removes the VLAN tag from matching frames. So any traffic coming into port 2 on the external bridge (basically the peer port to the integration bridge) tagged with VLAN 1 (which one would assume is the external network) will have its VLAN stripped before being forwarded. Note the priorities as well: the strip_vlan rule (priority 4) is evaluated before the catch-all drop rule (priority 2) on the same in_port, so VLAN 1 traffic gets out, and everything else arriving from the integration bridge is dropped.
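For illustration, rules like these could be installed by hand with ovs-ofctl. In a real deployment the Neutron OVS agent programs them automatically; the port number and VLAN below simply mirror the output above:

    ovs-ofctl add-flow br-ex "priority=1,actions=NORMAL"
    ovs-ofctl add-flow br-ex "priority=4,in_port=2,dl_vlan=1,actions=strip_vlan,NORMAL"
    ovs-ofctl add-flow br-ex "priority=2,in_port=2,actions=drop"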
And hence, mystery solved! Now moving on to the other issue – routing.
Network Namespaces
As previously mentioned, one would imagine that networking would get pretty messy when implementing the routing of several tenant networks over a single router: consider the number of networks, interfaces, and routes (including default routes) that these nodes would have to manage, and the head may spin pretty quickly.
So how to manage all of these routes in a sane fashion? Enter network namespaces.
Network namespaces are a fairly recent addition to the Linux kernel, introduced in version 2.6.24. I have found the easiest way to think about the feature is in the context of the work that has been done on containers in the last few years (to support things like LXC, CoreOS, and Docker). Each network namespace is its own individual pseudo-container: an isolated island of networking, pretty much its own individual virtual router.
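Here is a quick sketch of that isolation at work, using a throwaway namespace name (nothing OpenStack-specific here):

    ip netns add demo
    ip netns exec demo ip addr show   # only a loopback device, nothing from the host
    ip netns exec demo ip route show  # an empty routing table
    ip netns del demo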
These map to OpenStack pretty visibly. For example:
    # neutron router-list -F id -F name
    +--------------------------------------+---------------------------+
    | id                                   | name                      |
    +--------------------------------------+---------------------------+
    | f44651e2-0aab-435b-ad11-7ad4255825c7 | r.lab.vcts.local          |
    +--------------------------------------+---------------------------+
Above is the router ID for my current lab network. Perhaps, in the name of good convention, this has a matching namespace?
    # ip netns show | grep f44651e2-0aab-435b-ad11-7ad4255825c7
    qrouter-f44651e2-0aab-435b-ad11-7ad4255825c7
Why yes, yes it does!
Now, there are tons of things that can be done within a network namespace, but I'm not going to cover them all, as they are not necessarily relevant in the context of a fully working OpenStack implementation, where everything is already set up.
One of the best ways to troubleshoot a namespace is to enter it using ip netns exec. Note that this is not a fully separate container. Instead, commands are simply executed within the context of that specific network namespace, the idea being that even commands that are not namespace-aware can be run there.
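For example, a single command can be run inside the router's namespace without entering it, using the router namespace found above:

    ip netns exec qrouter-f44651e2-0aab-435b-ad11-7ad4255825c7 ip route show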
Commands can be run individually as above, but it may be easier to just run a shell within the target context, like so:
    # ip netns exec qrouter-f44651e2-0aab-435b-ad11-7ad4255825c7 /bin/bash
    # ip route show
    default via 192.168.0.1 dev qg-0c4c9d04-f0
    172.16.0.0/24 dev qr-da3efe6d-a2  proto kernel  scope link  src 172.16.0.1
    192.168.0.0/24 dev qg-0c4c9d04-f0  proto kernel  scope link  src 192.168.0.99
Looking at the above, some pieces may start fitting together. Even though I haven't covered it here, it should make sense from the output: there is the internal interface qr-da3efe6d-a2, which serves the internal network 172.16.0.0/24. The external interface qg-0c4c9d04-f0 has been bound through OpenStack to 192.168.0.99/24, which then allows general outbound traffic through the default route, along with 1:1 NAT for floating IP addresses.
Plenty of other commands can be run within this shell to get useful information, such as ip addr, ifconfig, and iptables, to see which IP addresses are bound to the router and how the firewall and NAT are set up.
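For example, the NAT rules that implement the floating IP mappings can be dumped in one shot (the exact output will vary by deployment, so I won't reproduce it here):

    ip netns exec qrouter-f44651e2-0aab-435b-ad11-7ad4255825c7 iptables -t nat -S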
Additional Reading
Hopefully the above gives you lots of insight into how networking works on OpenStack. For further reading, check out Networking in too Much Detail, the page that served as a starting point for a lot of this research. LWN.net also has a pretty awesome article explaining namespaces here.