Adventures in OpenStack: Intro to OpenFlow, and Network Namespaces

I’m digging into the backlog today! I’ve had these thoughts jotted down since trying to solve a problem on another OpenStack all-in-one box a few weeks ago, and I’m glad to finally get it finished. So without further ado, let’s jump in!

The Questions

I have already covered Open vSwitch and OpenStack networking in two previous articles: An Intro to Open vSwitch, and Adventures in OpenStack – Networking.

Some questions have remained unanswered for me, however:

Why VLAN trunking anomalies seem to be present on patch ports

If one looks at the output of ovs-vsctl show, some confusion may ensue. For example, there are several VLAN tags there, but if all of them are trunked across (as is the behaviour of a patch port), which VLAN wins? Do any? How is this even working?

    Bridge br-int
        fail_mode: secure
        Port "foo"
            tag: 4
            Interface "foo"
                type: internal
        Port "bar"
            tag: 3
            Interface "bar"
                type: internal
        Port "jack"
            tag: 1
            Interface "jack"
        Port "jill"
            tag: 2
            Interface "jill"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port "int-br-ex"
            Interface "int-br-ex"
                type: patch
                options: {peer="phy-br-ex"}
    Bridge br-ex
        Port "enp2s0"
            Interface "enp2s0"
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port br-ex
            Interface br-ex
                type: internal
    ovs_version: "2.1.3"

How does OpenStack handle several layer 3 networks over the same router

My other question was: given that OpenStack does not create any sort of VM for routing, how does routing even work? Managing what could ultimately be thousands of tenant networks, and possibly dozens or even hundreds of external networks, could get pretty messy, I would imagine.

The answers were pretty clear, once I dug a bit deeper.

Open vSwitch, and Integration to External Bridge Mapping

The OpenStack integration bridge maps to one of two kinds of bridges, depending on where in the architecture you look:

  • The external bridge (as shown above) – this is generally done on network nodes and my all-in-one setup
  • The tunnel bridge (not shown above to save space) – this is done on regular compute nodes, for example

This is specifically denoted by the two patch ports in each bridge:

        # br-int
        Port "int-br-ex"
            Interface "int-br-ex"
                type: patch
                options: {peer="phy-br-ex"}
        # br-ex
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}

As mentioned, all VLANs are passed over a patch port. Think of it as a trunk port on a physical switch that is set to match all VLANs. The vSwitch layer of OVS does not perform any sort of selective VLAN mapping.

So if all VLANs going over this port are tagged, then how do we make sense of what we see on the external bridge, which has no tags at all? All of its ports are either untagged or trunks, so taken at face value, it would seem like a bad configuration.

Not necessarily.

OpenFlow Magic on External Bridge

The switch layer is only half the story when dealing with Open vSwitch. The other half is what happens with OpenFlow on the external bridge:

# ovs-ofctl dump-flows br-ex
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=3776.846s, table=0, n_packets=5106, n_bytes=1142456, idle_age=0, priority=1 actions=NORMAL
 cookie=0x0, duration=3654.201s, table=0, n_packets=0, n_bytes=0, idle_age=3654, priority=4,in_port=2,dl_vlan=1 actions=strip_vlan,NORMAL
 cookie=0x0, duration=3776.341s, table=0, n_packets=132, n_bytes=10608, idle_age=3703, priority=2,in_port=2 actions=drop

The second rule is the one to pay attention to. It matches traffic arriving on port 2 of the external bridge (basically the peer port to the integration bridge) tagged with VLAN 1 (which one would assume is the external network), and its strip_vlan action removes that tag before the packet is forwarded normally.
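On a busier flow table, that rule can be picked out by filtering the dump for the strip_vlan action. A minimal sketch, run against a saved copy of the output above so it stands alone (on a live node, replace the here-doc with the real `ovs-ofctl dump-flows br-ex`):

```shell
#!/bin/sh
# Saved copy of the flow dump from above; on a live system use:
#   flows=$(ovs-ofctl dump-flows br-ex)
flows=$(cat <<'EOF'
 cookie=0x0, duration=3776.846s, table=0, n_packets=5106, n_bytes=1142456, idle_age=0, priority=1 actions=NORMAL
 cookie=0x0, duration=3654.201s, table=0, n_packets=0, n_bytes=0, idle_age=3654, priority=4,in_port=2,dl_vlan=1 actions=strip_vlan,NORMAL
 cookie=0x0, duration=3776.341s, table=0, n_packets=132, n_bytes=10608, idle_age=3703, priority=2,in_port=2 actions=drop
EOF
)

# Keep only the rules that strip a VLAN tag, and trim everything up to
# the match fields so just the priority, matches, and actions remain.
result=$(echo "$flows" | grep strip_vlan | sed 's/.*priority=//')
echo "$result"
```

The surviving line tells the whole story at a glance: match in_port=2 and dl_vlan=1, then strip_vlan before normal forwarding.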

And hence, mystery solved! Now moving on to the other issue – routing.

Network Namespaces

As previously mentioned, one would imagine that networking could get pretty messy when routing several tenant networks over a single router – consider the number of networks, interfaces, and routes (including default routes) that these nodes would have to manage, and your head may spin pretty quickly.

So how to manage all of these routes in a sane fashion? Enter network namespaces.

Network namespaces are a fairly recent addition to the Linux kernel, introduced in version 2.6.24. I have found the easiest way to think about the feature is in the context of the work that has been done on containers in the last few years (to support things like LXC, CoreOS, and Docker). Each network namespace is its own pseudo-container: an island of networking, pretty much its own individual virtual router.

These map to OpenStack pretty directly. For example:

# neutron router-list -F id -F name
+--------------------------------------+---------------------------+
| id                                   | name                      |
+--------------------------------------+---------------------------+
| f44651e2-0aab-435b-ad11-7ad4255825c7 | r.lab.vcts.local          |
+--------------------------------------+---------------------------+

Above is the router ID for my current lab network. Perhaps, in the name of good convention, this has a matching namespace?

# ip netns show | grep f44651e2-0aab-435b-ad11-7ad4255825c7
qrouter-f44651e2-0aab-435b-ad11-7ad4255825c7

Why yes, yes it does!
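The naming is mechanical: Neutron prefixes the router UUID with qrouter- (DHCP namespaces similarly get qdhcp- plus the network UUID). A small sketch of that lookup, run against a saved `ip netns show` listing so it is self-contained – the qdhcp entry below is a made-up example, and on a live network node you would capture the real command output instead:

```shell
#!/bin/sh
# Router UUID from `neutron router-list` above.
router_id="f44651e2-0aab-435b-ad11-7ad4255825c7"

# Saved `ip netns show` output; on a live node use:
#   netns_list=$(ip netns show)
# The qdhcp entry is a hypothetical DHCP namespace for illustration.
netns_list="qrouter-f44651e2-0aab-435b-ad11-7ad4255825c7
qdhcp-00000000-1111-2222-3333-444444444444"

# The router's namespace is simply qrouter-<router UUID>.
ns=$(echo "$netns_list" | grep "^qrouter-${router_id}\$")
echo "$ns"
```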

Now, there are tons of things that can be done within a network namespace, but I'm not going to cover them all here – they are not especially relevant in a fully working OpenStack implementation, where everything is already set up.

One of the best ways to troubleshoot a namespace is to enter it using ip netns exec. Note that this is not a fully separate container; commands are just executed within the context of that specific network namespace, the idea being that you can run commands that are not necessarily namespace-aware.

Commands can be run individually, but it may be easier to just run a shell within the target context, like so:

# ip netns exec qrouter-f44651e2-0aab-435b-ad11-7ad4255825c7 /bin/bash
# ip route show
default via 192.168.0.1 dev qg-0c4c9d04-f0 
172.16.0.0/24 dev qr-da3efe6d-a2  proto kernel  scope link  src 172.16.0.1 
192.168.0.0/24 dev qg-0c4c9d04-f0  proto kernel  scope link  src 192.168.0.99 

Looking at the above, the pieces start fitting together. The internal interface qr-da3efe6d-a2 carries the internal network 172.16.0.0/24. The external interface qg-0c4c9d04-f0 has been bound through OpenStack controls to 192.168.0.99/24, which allows general outbound traffic through the default route, and 1:1 NAT for floating IP addresses.

Plenty of other commands can be run within this shell to get useful information – ip addr, ifconfig, and iptables, for example – showing which IP addresses are bound to the router and how the firewall and NAT are set up.

Additional Reading

Hopefully the above gives you lots of insight into how networking works on OpenStack. For further reading, check out Networking in too Much Detail, the page that served as a starting point for a lot of this research. LWN.net also has a pretty awesome article explaining namespaces here.

Adventures in OpenStack – Networking

Note: Some inaccurate information was corrected in this article – see here for the details.

The past articles regarding Open vSwitch have been something of a precursor to this one: to understand how OpenStack networking works, I first needed to understand some of the underlying components.

When I started looking into this last week, I really had no idea where to start. As I dug deeper, I found that this guide was probably the best at explaining the basics of how Neutron works: Neutron in the RHEL/CentOS/Fedora deployment guide.

Neutron Network Example (Courtesy OpenStack Foundation - Apache 2.0 License)

The diagram above was probably one of the tools that helped me out the most. You can see how Neutron works on both the compute and network nodes, and the role that Open vSwitch plays in the deployment at large.

Note that both GRE and VXLAN are supported for tunnels, and in fact packstack will configure your setup with VXLAN. Some features are still being developed for VXLAN, and because I haven't delved into it too much, I'm not sure what is still missing (although one feature seems to be VLAN pruning). I really don't have the experience to say which one is currently the better choice as of Juno.

For now, I am focusing on the basics – what I needed to do to get my dev server set up. This entailed a few things:

  • Re-configuring my external bridge so that I could run my management interface and the “external” network on the same physical interface – see this previous article
  • Setting up neutron to map the external network to the external bridge, explicitly
  • Setting up my external and internal networks

Network Types

There are currently five network types that you can set up in OpenStack.

  • Local: I equated this to “non-routed”, but it can be used on single server setups for tenant networking. However, it cannot scale past one host.
  • Flat: An untagged direct network-to-physical mapping. This was ultimately the best choice for my external network, since my requirements are not that complicated at this point in time.
  • VLAN: This is like Flat with VLAN tagging. This would, of course, allow you to run multiple segregated external networks over a single interface.
  • GRE/VXLAN: Your tunneling options. Generally used on the integration bridge to pass traffic between nodes. Best used for tenant networks.

For my setup, as I mentioned, I ultimately settled on using a flat network for our external bridge, and I haven’t touched the internal network setups (it really is not necessary at this point in time, seeing as I only have one host).

Neutron Configuration

Keep in mind that I don’t cover how to do the Open vSwitch stuff here. If you need that info see this previous article – An Intro to Open vSwitch.

With that in mind, if you are using a separate interface you can simply add it to the Open vSwitch database without much in the way of extra configuration – just run the following:

ovs-vsctl add-port br-ex eth1

Assuming that eth1 is your extra interface.

On to the Neutron configuration. Generally, this is stored in /etc/neutron/plugin.ini. Note that we are using the ML2 (Modular Layer 2) plugin here, which has to be symlinked appropriately:

lrwxrwxrwx.   1 root root       37 Jan 29 23:24 plugin.ini -> /etc/neutron/plugins/ml2/ml2_conf.ini

Make sure you define the network types you will allow:

type_drivers = flat,vxlan

Pick a network type for your tenant networks, generally one is fine:

tenant_network_types = vxlan

Mechanism drivers – using Open vSwitch for now, of course. This will be set up for you by default if you are using packstack.

mechanism_drivers = openvswitch

From here I am going to skip to the changes I needed to make in my packstack setup to get the external bridge working. Most of my config was left at the defaults, so if you are using packstack as well, not much needs to change.

The only thing left is to define your external network as a flat network:

[ml2_type_flat]
flat_networks = external
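The flat network name on its own is just a label – the Open vSwitch agent also has to know which bridge that label maps to. Packstack had already put this mapping in place for me, but for reference it lives in the [ovs] section (shown here assuming the external network and br-ex bridge used throughout this article):

```ini
[ovs]
bridge_mappings = external:br-ex
```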

Restarting Services

Once this is all done, you can save and restart the nova and neutron services. Restart the services below based on the node that is being updated.

# Controller Node
systemctl restart openstack-nova-api.service openstack-nova-scheduler.service \
  openstack-nova-conductor.service
systemctl restart neutron-server.service
# Network Node
systemctl restart neutron-openvswitch-agent.service neutron-l3-agent.service \
  neutron-dhcp-agent.service neutron-metadata-agent.service
# Compute Node
systemctl restart openstack-nova-compute.service
systemctl restart neutron-openvswitch-agent.service
# All-in-One Node
systemctl restart openstack-nova-api.service openstack-nova-scheduler.service \
  openstack-nova-conductor.service openstack-nova-compute.service
systemctl restart neutron-openvswitch-agent.service neutron-l3-agent.service \
  neutron-dhcp-agent.service neutron-metadata-agent.service neutron-server.service

Now, we are ready to set up our external and internal networks. I will cover this tomorrow in a couple of other articles!

CentOS ifcfg Scripts: DEVICE vs NAME

While setting up Open vSwitch in the previous article, I encountered an odd issue when configuring my bridges to come up on startup (especially the br-ex bridge, which also carried a management address). Restarting the network after the initial changes worked, but on boot the network did not come up properly; after logging in and restarting the network again, things came up. This was reproducible 100% of the time.

Upon startup, only the OvS bridges were up; physical interfaces were not added. On initial inspection nothing seemed wrong, so I delved into the process further and spent a morning googling and poring through the network init scripts, trying to figure out exactly what I did wrong, or whether there was a bug in the way OvS auto-configuration is handled on RedHat-family systems.

The answer? Yes and no.

Ultimately there was an error in my configuration. In my physical interface config file (ie: ifcfg-eth0 or its equivalent under the PCI-based naming scheme that CentOS and RHEL 7 follow), I had:

HWADDR="00:11:22:AA:BB:CC"
NAME="eth0" <-- BAD
ONBOOT="yes"
NM_CONTROLLED="no"
TYPE=OVSPort
DEVICETYPE=ovs
OVS_BRIDGE=br-ex

The correct file has:

HWADDR="00:11:22:AA:BB:CC"
DEVICE="eth0" <-- GOOD
ONBOOT="yes"
NM_CONTROLLED="no"
TYPE=OVSPort
DEVICETYPE=ovs
OVS_BRIDGE=br-ex

Basically, the issue was the presence of the NAME directive in place of the correct DEVICE directive.

The Cause

I’m pretty sure NAME was there from installation. I performed minimal modification of the physical interface configuration file, and my bridge configuration file had the correct syntax (which I took from a tutorial, save the actual addressing information which I transplanted from the physical interface).

After poring over logs, I found the giveaway:

Feb 9 10:39:52 devhost network: Bringing up interface eth0: Error: either "dev" is duplicate, or "br-ex" is a garbage.
Feb 9 10:39:52 devhost network: cat: /sys/class/net/eth0: Is a directory
Feb 9 10:39:52 devhost network: cat: br-ex/ifindex: No such file or directory
Feb 9 10:39:52 devhost network: /etc/sysconfig/network-scripts/ifup-eth: line 273: 1000 + : syntax error: operand expected (error token is "+ ")
Feb 9 10:39:52 devhost network: ERROR : [/etc/sysconfig/network-scripts/ifup-aliases] Missing config file br-ex.
Feb 9 10:39:52 devhost /etc/sysconfig/network-scripts/ifup-aliases: Missing config file br-ex.
Feb 9 10:39:52 devhost ovs-vsctl: ovs|00001|vsctl|INFO|Called as ovs-vsctl -t 10 -- --may-exist add-port br-ex "eth0
Feb 9 10:39:52 devhost ovs-vsctl: br-ex"
Feb 9 10:39:53 devhost network: [ OK ]

There was a line break in the ovs-vsctl command logged in syslog.

I ultimately traced this down to the get_device_by_hwaddr() function, in /etc/sysconfig/network-scripts/network-functions:

get_device_by_hwaddr ()
{
    LANG=C ip -o link | awk -F ': ' -vIGNORECASE=1 '!/link\/ieee802\.11/ && /'"$1"'/ { print $2 }'
}

This returns multiple interfaces, separated by line breaks, when more than one interface on the system shares a MAC address. Using DEVICE instead of NAME sidesteps this, as the system only calls this function when DEVICE is not defined.
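The behaviour is easy to reproduce. Below is the same awk filter from network-functions, run against a canned `ip -o link` listing (a sketch – the real function shells out to `ip -o link` itself) in which eth0 and br-ex share a MAC address:

```shell
#!/bin/sh
# Simulated `LANG=C ip -o link` output: the br-ex bridge has inherited
# eth0's MAC address, as happened on the machine in question.
ip_link_out='1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN \    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP \    link/ether 00:11:22:aa:bb:cc brd ff:ff:ff:ff:ff:ff
3: br-ex: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN \    link/ether 00:11:22:aa:bb:cc brd ff:ff:ff:ff:ff:ff'

# Same filter as get_device_by_hwaddr(), minus the live `ip` call.
get_device_by_hwaddr () {
    echo "$ip_link_out" | awk -F ': ' -vIGNORECASE=1 '!/link\/ieee802\.11/ && /'"$1"'/ { print $2 }'
}

# Both eth0 and br-ex come back, separated by a line break - exactly the
# two-line string that ended up inside the logged ovs-vsctl command.
matches=$(get_device_by_hwaddr "00:11:22:aa:bb:cc")
echo "$matches"
```

With DEVICE set, ifup never needs this lookup at all, which is why the corrected configuration boots cleanly.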

The MAC address was probably duplicated by a previous OvS setup, and was retained when the bridge was brought back up on startup.

Moral of the story: if you are setting up bridging or any other kind of virtual interface infrastructure after installation, don't overlook this – it will save you some pain!

I posted a bug to CentOS about this as well: http://bugs.centos.org/view.php?id=8187

An Intro to Open vSwitch

I’ve spent the last few days getting my bearings around Open vSwitch. It’s pretty amazing, and IMO if you are virtualizing under Linux these days, it’s pretty much a must.

So what is Open vSwitch? It's basically the open source answer to proprietary technologies such as VMware's distributed vSwitch (which you have probably used if you have ever run multiple servers within vSphere). It allows you to build a distributed layer-2 network entirely in software. It also performs very well in a single-server setup, allowing you to build sophisticated switch fabrics on a single physical interface. And it solves some frustrations you may have encountered with network bridging under KVM, such as having a bridge that shares the same physical interface as your host's management address.

Check out http://openvswitch.org/ for some documentation and tutorials.

I will be explaining some basic concepts regarding Open vSwitch here, namely how to set up a bridge, attach an interface to it, and also how to automate the process using ifcfg-* files under CentOS (and by proxy, probably RHEL and Fedora as well). Also, we will discuss how to set up a VM on the bridge using libvirt.

Some Terms

A bridge is a network fabric under Open vSwitch. For the purpose of this tutorial, a bridge represents a broadcast domain within the fabric at large, ie: a VLAN. Note that this does not have to be the case all the time, as it is possible to have a bridge that has ports on different VLANs, just like a physical switch.

A port is a virtual switch port within the bridge. These ports are attached to interfaces, such as physical ones, virtual machine interfaces, or other bridges.

Open vSwitch Basic Bridge

Above is a very simple diagram that depicts the bridge br-ex, with ports connected to a VM’s eth0, and the host machine’s eth0.

Creating a Bridge

Run the following command to create a bridge:

ovs-vsctl add-br br-ex

This command would create a new empty bridge, br-ex. This bridge can then be addressed just like a regular interface on the system, but of course, would not do much at this point in time since we do not have any ports attached to it.

Adding a Port

ovs-vsctl add-port br-ex eth0

This would add eth0 to the bridge br-ex.

CentOS/RedHat/Fedora Interface Configuration Files

You can also have the OS set up bridges for you upon system startup – this is especially useful if you are binding IP addresses to a specific bridge. Note that any bridges that you create like this will get destroyed/re-created upon restart of the network (ie: service network restart or systemctl restart network.service).

Change your ifcfg-eth0 file to look something like this:

HWADDR="00:11:22:AA:BB:CC"
DEVICE="eth0"
ONBOOT="yes"
NM_CONTROLLED="no"
TYPE=OVSPort
DEVICETYPE=ovs
OVS_BRIDGE=br-ex

And create a ifcfg-br-ex interface configuration file:

DEVICE=br-ex
DEVICETYPE=ovs
TYPE=OVSBridge
BOOTPROTO=static
ONBOOT=yes
IPADDR="1.2.3.4"
NETMASK="255.255.255.0"
GATEWAY="1.2.3.1"
DNS1="1.2.3.2"
DNS2="1.2.3.3"

Sub in your values for MAC addresses, physical interface names, and IP addresses, obviously.

Another extremely important note: make sure that you use the DEVICE directive instead of the NAME directive. The latter may be left over in your physical interface configuration file from installation, so make a point of changing it. I will address the exact reason why in a different article.

Setting Up a Libvirt VM to use a Bridge

Now that you have set up the above, you can add a VM to the bridge with Libvirt. Edit your domain’s (VM’s) XML file and add a block like this for every NIC you want to create:

 <interface type='bridge'>
  <mac address='52:54:00:71:b1:b6'/>
  <source bridge='br-ex'/>
  <virtualport type='openvswitch'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
 </interface>

This was taken directly from the Open vSwitch Libvirt HOWTO.

Make sure, of course, that you assign a correct PCI ID. You may wish to create the domain first via other means, add the devices you will need, and simply elect not to use networking at first. Unfortunately, not many Libvirt admin tools seem to have specific Open vSwitch support just yet (at least the version of virt-manager that ships with most distributions does not).

Edit Feb 12 2015 – Slight correction to physical interface config file – duplicate device type.