OVN and OpenStack Status – 2015-04-21

It has been a couple weeks since the last OVN status update. Here is a review of what has happened since that time.

ovn-nbd is now ovn-northd

Someone pointed out that “nbd” is the well-known acronym for the Network Block Device, which may well be present in the same deployment as OVN.  To avoid any possible confusion, we renamed ovn-nbd to ovn-northd.

ovn-controller now exists

ovn-controller is the daemon that runs on every hypervisor or gateway.  The initial version of this daemon has been merged.  The current version of ovn-controller performs two important functions.

First, ovn-controller populates the Chassis table of the OVN_Southbound database.  Each row in the Chassis table represents a hypervisor or gateway running ovn-controller.  It contains information that identifies the chassis and what encapsulation types it supports.  If you run ovs-sandbox with OVN support enabled, it will run the following commands to configure ovn-controller:

ovs-vsctl set open . external-ids:system-id=56b18105-5706-46ef-80c4-ff20979ab068
ovs-vsctl set open . external-ids:ovn-remote=unix:"$sandbox"/db.sock
ovs-vsctl set open . external-ids:ovn-encap-type=vxlan
ovs-vsctl set open . external-ids:ovn-encap-ip=127.0.0.1
ovs-vsctl add-br br-int

After setup is complete, we can check the OVN_Southbound database’s contents and see the corresponding Chassis and Encap entries.
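
The tables below were dumped with ovsdb-client, the same command used later in this post, and trimmed to just the relevant tables:

$ ovsdb-client dump OVN_Southbound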

Chassis table
_uuid                                encaps                                 gateway_ports name                                  
------------------------------------ -------------------------------------- ------------- --------------------------------------
2852bf00-db63-4732-8b44-a3bc689ed1bc [e1c1f7fc-409d-4f74-923a-fc6de8409f82] {}            "56b18105-5706-46ef-80c4-ff20979ab068"

Encap table
_uuid                                ip          options type 
------------------------------------ ----------- ------- -----
e1c1f7fc-409d-4f74-923a-fc6de8409f82 "127.0.0.1" {}      vxlan

The other important task performed by the current version of ovn-controller is to monitor the local switch for newly added ports that correspond to logical ports created in OVN.  When a port is created on the local switch with an iface-id that matches an OVN logical port’s name, ovn-controller updates the Bindings table to record that the port exists on this chassis.  Once that is done, ovn-northd reports to the OVN_Northbound database that the port is up.
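
For example, in the sandbox you could simulate a VIF being plugged in by adding a port to br-int whose iface-id matches the logical port name.  The port name vif1 below is arbitrary, and the UUID is simply the logical port name from this particular environment:

$ ovs-vsctl add-port br-int vif1 -- set Interface vif1 external-ids:iface-id=d03aa502-0d76-4c1e-8877-43778088c55c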

$ ovsdb-client dump OVN_Southbound
Bindings table
_uuid                                chassis                                logical_port                           mac parent_port tag
------------------------------------ -------------------------------------- -------------------------------------- --- ----------- ---
...
2dc299fa-835b-4e42-aa82-3d2da523b4d9 "81b0f716-c957-43cf-b34e-87ae193f617a" "d03aa502-0d76-4c1e-8877-43778088c55c" []  []          [] 
...

$ ovn-nbctl lport-get-up d03aa502-0d76-4c1e-8877-43778088c55c
up

The next step for ovn-controller is to program the local switch with the tunnels and flows called for by the contents of the OVN_Southbound database.  This is currently being worked on.

The Pipeline Table

The OVN_Southbound database has a table called Pipeline.  ovn-northd is responsible for translating the logical network elements defined in OVN_Northbound into entries in the Pipeline table of OVN_Southbound.  The first version of the code that populates the Pipeline table has been merged.  What is particularly interesting here is that ovn-northd defines logical flows; it does not have to figure out the detailed switch configuration for every chassis running ovn-controller.  ovn-controller is responsible for translating the logical flows into OpenFlow flows specific to its chassis.

The OVN_Southbound documentation has a good explanation of the contents of the Pipeline table.  If you’re familiar with OpenFlow, the format will be very familiar.

As a simple example, let’s just use ovn-nbctl to manually create a single logical switch that has 2 logical ports.

ovn-nbctl lswitch-add sw0
ovn-nbctl lport-add sw0 sw0-port1 
ovn-nbctl lport-add sw0 sw0-port2 
ovn-nbctl lport-set-macs sw0-port1 00:00:00:00:00:01
ovn-nbctl lport-set-macs sw0-port2 00:00:00:00:00:02

Now we can check out the resulting contents of the Pipeline table.  The output of ovsdb-client has been reordered to group the entries by table_id and priority. I’ve also cut off the _uuid column since it’s not important for understanding here.

Pipeline table
match                          priority table_id actions                                                                 logical_datapath
------------------------------ -------- -------- ----------------------------------------------------------------------- ------------------------------------
"eth.src[40]"                  100      0        drop                                                                    843a9a4a-8afc-41e2-bea1-5fa58874e109
vlan.present                   100      0        drop                                                                    843a9a4a-8afc-41e2-bea1-5fa58874e109
"inport == \"sw0-port1\""      50       0        resubmit                                                                843a9a4a-8afc-41e2-bea1-5fa58874e109
"inport == \"sw0-port2\""      50       0        resubmit                                                                843a9a4a-8afc-41e2-bea1-5fa58874e109
"1"                            0        0        drop                                                                    843a9a4a-8afc-41e2-bea1-5fa58874e109

"eth.dst[40]"                  100      1        "outport = \"sw0-port2\"; resubmit; outport = \"sw0-port1\"; resubmit;" 843a9a4a-8afc-41e2-bea1-5fa58874e109
"eth.dst == 00:00:00:00:00:01" 50       1        "outport = \"sw0-port1\"; resubmit;"                                    843a9a4a-8afc-41e2-bea1-5fa58874e109
"eth.dst == 00:00:00:00:00:02" 50       1        "outport = \"sw0-port2\"; resubmit;"                                    843a9a4a-8afc-41e2-bea1-5fa58874e109

"1"                            0        2        resubmit                                                                843a9a4a-8afc-41e2-bea1-5fa58874e109

"outport == \"sw0-port1\""     50       3        "output(\"sw0-port1\")"                                                 843a9a4a-8afc-41e2-bea1-5fa58874e109
"outport == \"sw0-port2\""     50       3        "output(\"sw0-port2\")"                                                 843a9a4a-8afc-41e2-bea1-5fa58874e109

In table 0, we’re dropping anything with a broadcast/multicast source MAC. We’re also dropping anything with a logical VLAN tag, as that doesn’t make sense. Next, if the packet comes from one of the ports connected to the logical switch, we will continue processing in table 1. Otherwise, we drop it.

In table 1, we output the packet to all ports if the destination MAC is broadcast/multicast. Note that output to the packet’s own ingress port is implicitly handled as a drop. For known unicast destinations, we set the outport variable based on the destination MAC address and continue processing in table 2.

Table 2 does nothing but continue to table 3. In the ovn-northd code, table 2 is where entries for ACLs go. ovn-nbctl does not currently support adding ACLs. This table is where Neutron will program security groups, but that’s not ready yet, either.

Table 3 handles sending the packet to the right output port based on the contents of the outport variable set back in table 1.

The logical_datapath column ties all of these rows together as implementing a single logical datapath, which in this case is an OVN logical switch.

There is one other item supported by ovn-northd that is not reflected in this example. The OVN_Northbound database has a port_security column for logical ports. Its contents are defined as “A set of L2 (Ethernet) or L3 (IPv4 or IPv6) addresses or L2+L3 pairs from which the logical port is allowed to send packets and to which it is allowed to receive packets.” If this were set here, table 0 would also handle ingress port security and table 3 would handle egress port security.
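
As a rough illustration only, port security for one of the ports above could be set by hand with an OVSDB transaction along the lines of the one below.  Normally the CMS would do this through the OVN_Northbound database on your behalf; the Logical_Port table name is taken from the OVN_Northbound schema, and the exact transaction is just a sketch:

$ ovsdb-client transact '["OVN_Northbound",{"op":"update","table":"Logical_Port","where":[["name","==","sw0-port1"]],"row":{"port_security":["set",["00:00:00:00:00:01"]]}}]'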

We will look at more detailed examples in future posts as both OVN and its Neutron integration progress further.

Neutron Integration

There have also been several changes to the Neutron integration for OVN in the last couple of weeks.  Since ovn-northd and ovn-controller are becoming more functional, the devstack integration runs both of these daemons, along with ovsdb-server and ovs-vswitchd.  That means that as you create networks and ports via the Neutron API, they will be created in OVN and result in Bindings and Pipeline updates.

We now also have a devstack CI job that runs against every patch proposed to the OVN Neutron integration.  It installs and runs Neutron with OVN.  Devstack also creates some default networks.  We still have a bit more work to do in OVN before we can expand this to actually test network connectivity.

Also related to testing, Terry Wilson submitted a patch to OVS that will allow us to publish the OVS Python bindings to PyPI.  The patch has been merged and Terry will soon be publishing the code to PyPI.  This will allow us to install the library for unit test jobs.
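
Once the bindings are on PyPI, pulling them into a unit test environment should presumably be as simple as the following, assuming the package is published under the name ovs:

$ pip install ovs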

The original Neutron ML2 driver implementation used ovn-nbctl.  It has now been converted to use the Python ovsdb library, which should be much more efficient.  neutron-server will maintain an open connection to the OVN_Northbound database for all of its operations.
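
For a rough idea of what using the library looks like, here is a minimal sketch (not the actual networking-ovn code) that connects to the OVN_Northbound database with the OVS Python IDL and creates a logical switch.  The schema and socket paths are assumptions for a local from-source install:

import ovs.db.idl
import ovs.poller

SCHEMA = '/usr/local/share/openvswitch/ovn-nb.ovsschema'  # assumed install path
REMOTE = 'unix:/usr/local/var/run/openvswitch/db.sock'    # assumed ovsdb-server socket

# Register all tables and columns from the schema and create the IDL connection.
helper = ovs.db.idl.SchemaHelper(SCHEMA)
helper.register_all()
idl = ovs.db.idl.Idl(REMOTE, helper)

# Let the IDL connect and replicate the database contents locally.
while not idl.has_ever_connected():
    idl.run()
    poller = ovs.poller.Poller()
    idl.wait(poller)
    poller.block()

# Create a logical switch in a single transaction.
txn = ovs.db.idl.Transaction(idl)
row = txn.insert(idl.tables['Logical_Switch'])
row.name = 'sw0'
print(txn.commit_block())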

I’ve also been working on the necessary changes for creating a port in Neutron that is intended to be used by a container running inside a VM.  There is a python-neutronclient change and two changes needed to networking-ovn that I’m still testing.

There are some edge cases where a resource can be created in Neutron but fail before we’ve created it in OVN.  Gal Sagie is working on some code to get them back in sync.

Gal Sagie also has a patch up for the first step toward security group support.  We have to document how we will map Neutron security groups to rules in the OVN_Northbound ACL table.

One piece of information that is communicated back up to the OVN_Northbound database by OVN is the up state of a logical port.  Terry Wilson is working on having our Neutron driver consume that so that we can emit a notification when a port that was created becomes ready for use.  This notification gets turned into a callback to Nova to tell it the VIF is ready for use so the corresponding VM can be started.

OVN and OpenStack Integration Development Update

The Open vSwitch project announced the OVN effort back in January.  After OVN was announced, I got very interested in its potential.  OVN is by no means tied to OpenStack, but the primary reason I’m interested is that I see it as a promising open source backend for OpenStack Neutron.  To put it into context with existing Neutron code, it would replace the OVS agent in Neutron in the short term.  It would eventually also replace the L3 and DHCP agents once OVN gains the equivalent functionality.

Implementation has been coming along well in the last month, so I wanted to share an overview of what we have so far.  We’re aiming to have a working implementation of L2 connectivity by the OpenStack Vancouver Summit next month.

Design

The initial design documentation was merged at the end of February.  Here are the rendered versions of those docs: ovn-architecture, ovn-nb schema, ovn schema.

This initial design allows hooking up VMs or containers to OVN-managed virtual networks.  An update to the design was merged that addresses the use case of running containers inside of VMs.  It seems like most existing work just creates another layer of overlay networks for containers.  What’s interesting about this proposal is that it lets you connect those containers directly to the OVN-managed virtual networks.  In the OpenStack world, that means your containers could be hooked up directly to virtual networks managed by Neutron.  Further, the container-hosting VM and its containers do not all have to be connected to the same network, and none of this requires an extra layer of overlay networks.
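
To make that concrete, the design gives a container’s logical port two extra pieces of information: a parent port (the logical port of the VM hosting the container) and a VLAN tag used only on the link between the VM and its containers.  These surface as the parent_port and tag columns of the Bindings table shown later in this post.  A purely hypothetical invocation might look like the following; the parent and tag arguments to lport-add, and their order, are assumptions rather than documented syntax:

$ ovn-nbctl lport-add some-switch container-port0 vm-port0 42  # hypothetical: parent vm-port0, VLAN tag 42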

OVN Implementation

For most of my OVN development and testing, I’ve been working straight from the ovs git tree. Building it is something like:

$ git clone http://github.com/openvswitch/ovs.git
$ cd ovs

Switch to the ovn branch, as that’s where OVN development is happening for now:

$ git checkout ovn

You’ll need automake, autoconf, libtool, make, patch, and gcc or clang installed, at least. For detailed instructions on building ovs, see INSTALL.md in the ovs git tree.

$ ./boot.sh
$ ./configure
$ make

OVS includes a script called ovs-sandbox that I find very helpful for development. It sets up a dummy ovs environment that you can run the tools against, but it doesn’t actually process real traffic. You can send some fake packets through to see how they would be processed if needed. I’ve been adding OVN support to ovs-sandbox along the way.
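
For example, once a bridge exists in the sandbox (br0 below is created just for this illustration), you can trace a made-up packet through it with ovs-appctl and see how it would be handled.  The flow fields are arbitrary:

$ ovs-vsctl add-br br0
$ ovs-appctl ofproto/trace br0 in_port=LOCAL,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02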

Here’s a demonstration of ovs-sandbox with what is implemented in OVN so far.  Start by running ovs-sandbox with OVN support turned on:

$ make sandbox SANDBOXFLAGS="-o"

You’ll get output like this:

----------------------------------------------------------------------
You are running in a dummy Open vSwitch environment. You can use
ovs-vsctl, ovs-ofctl, ovs-appctl, and other tools to work with the
dummy switch.

Log files, pidfiles, and the configuration database are in the
"sandbox" subdirectory.

Exit the shell to kill the running daemons.

Now everything is running:

$ ps ax | grep ov[sn]
 ...
 ... ovsdb-server --detach --no-chdir --pidfile -vconsole:off --log-file --remote=punix:/home/rbryant/src/ovs/tutorial/sandbox/db.sock ovn.db ovnnb.db conf.db
 ... ovs-vswitchd --detach --no-chdir --pidfile -vconsole:off --log-file --enable-dummy=override -vvconn -vnetdev_dummy
 ... ovn-nbd --detach --no-chdir --pidfile -vconsole:off --log-file

Note the ovn-nbd daemon. Soon there will also be an ovn-controller daemon running. Also note that ovsdb-server is serving up 3 databases (ovn.db, ovnnb.db, and conf.db).
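
You can confirm which databases are being served with ovsdb-client.  In this sandbox the output should look something like the following, with conf.db holding the usual Open_vSwitch schema:

$ ovsdb-client list-dbs
OVN
OVN_Northbound
Open_vSwitch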

You can run ovn-nbctl to create resources via the OVN public interface (the OVN_Northbound database). So, for example:

$ ovn-nbctl lswitch-add sw0
$ ovn-nbctl lswitch-add sw1
$ ovn-nbctl lswitch-list
4956f6b4-a1ba-49aa-86a6-134b9cfdfdf6 (sw1)
52858b33-995f-43fa-a1cf-445f16d2ab09 (sw0)
$ ovn-nbctl lport-add sw0-port0 sw0
$ ovn-nbctl lport-add sw0-port1 sw0
$ ovn-nbctl lport-list sw0
d4d78dc5-166d-4457-8bb0-1f6ed5f1ed91 (sw0-port1)
c2114eaa-2f75-443f-b23e-6dda664a979b (sw0-port0)

One of the things that ovn-nbd does is create entries in the Bindings table of the OVN database when logical ports are added to the OVN_Northbound database. The Bindings table is used to keep track of which hypervisor a port exists on after VIFs get created and plugged into the local ovs switch. After the commands above, there should be 2 entries in the Bindings table. We can dump the OVN db and see that they are there:

$ ovsdb-client dump OVN
Bindings table
_uuid                                chassis logical_port mac parent_port tag
------------------------------------ ------- ------------ --- ----------- ---
997e0c14-2fba-499d-b077-26ddfc87e935 ""      "sw0-port0"  []  []          []
f7b61ef1-01d5-42ab-b08e-176bf6f3eb4b ""      "sw0-port1"  []  []          []

Note that the chassis column is empty, meaning that the port hasn’t been placed on a hypervisor yet.

We can also see that the state of the port is still down in the OVN_Northbound database since it hasn’t been created on a hypervisor yet.

$ ovn-nbctl lport-get-up sw0-port0
down

One of the tasks of ovn-controller running on each hypervisor is to monitor the local switch and detect when a new port on the local switch corresponds with an OVN logical port. When that occurs, ovn-controller will update the chassis column. For now, we can simulate that with a manual ovsdb transaction:

$ ovsdb-client transact '["OVN",{"op":"update","table":"Bindings","where":[["_uuid","==",["uuid","997e0c14-2fba-499d-b077-26ddfc87e935"]]],"row":{"chassis":"hostname"}}]'
[{"count":1}]
$ ovsdb-client dump OVN
Bindings table
_uuid                                chassis  logical_port mac parent_port tag
------------------------------------ -------- ------------ --- ----------- ---
f7b61ef1-01d5-42ab-b08e-176bf6f3eb4b ""       "sw0-port1"  []  []          []
997e0c14-2fba-499d-b077-26ddfc87e935 hostname "sw0-port0"  []  []          []

Now that the chassis column has been populated, ovn-nbd should notice and set the port state to up in the OVN_Northbound db.

$ ovn-nbctl lport-get-up sw0-port0
up

OpenStack Integration

Like with most OpenStack projects, you can try out the Neutron support for OVN using devstack.  Instructions for using the OVN devstack plugin are in the networking-ovn git repo.

You start by cloning both devstack and networking-ovn.

$ git clone http://git.openstack.org/openstack-dev/devstack.git
$ git clone http://git.openstack.org/openstack/networking-ovn.git

If you don’t have any devstack configuration, you can use a sample local.conf from the networking-ovn repo:

$ cd devstack
$ cp ../networking-ovn/devstack/local.conf.sample local.conf

If you’re new to devstack, it is best to use a throwaway VM for this.  You will also need to run devstack as a sudo-enabled user.  Once your OVN-enabled configuration is in place, run devstack:

$ ./stack.sh

In my case, I’m running this on Fedora 21.  It has also been tested on Ubuntu. Once devstack finishes running successfully, you should get output that looks like this:

This is your host ip: 192.168.122.31
Keystone is serving at http://192.168.122.31:5000/
The default users are: admin and demo
The password: password
2015-04-08 14:31:10.242 | stack.sh completed in 165 seconds.

One bit of environment initialization that devstack does is create some initial Neutron networks.  You can see them using the neutron command, which talks to the Neutron REST API.

$ . openrc
$ neutron net-list
+--------------------------------------+---------+--------------------------------------------------+
| id                                   | name    | subnets                                          |
+--------------------------------------+---------+--------------------------------------------------+
| a28b651e-5cb9-481b-9f9b-d5d57e55c6d0 | public  | df0aee67-166c-4ad4-890c-bbf5d02ca3cf             |
| 2637f01e-f41e-4d1b-865f-195253027031 | private | eac6621f-e8cc-4c94-84bf-e73dab610018 10.0.0.0/24 |
+--------------------------------------+---------+--------------------------------------------------+

Since OVN is the configured backend, we can use the ovn-nbctl utility to verify that these networks were created in OVN.

$ ovn-nbctl lswitch-list
480235d0-d1a5-43a9-821b-d32e109445fd (neutron-2637f01e-f41e-4d1b-865f-195253027031)
a60a2c16-cea7-4bdc-8082-b47745d016b3 (neutron-a28b651e-5cb9-481b-9f9b-d5d57e55c6d0)
$ ovn-nbctl lswitch-get-external-id 480235d0-d1a5-43a9-821b-d32e109445fd
neutron:network_name=private
$ ovn-nbctl lswitch-get-external-id a60a2c16-cea7-4bdc-8082-b47745d016b3
neutron:network_name=public

We can also create ports using the Neutron API and verify that they get created in OVN. To do that, we first create a port in Neutron:

$ neutron port-create private
Created a new port:
+-----------------------+---------------------------------------------------------------------------------+
| Field                 | Value                                                                           |
+-----------------------+---------------------------------------------------------------------------------+
| admin_state_up        | True                                                                            |
| allowed_address_pairs |                                                                                 |
| binding:vnic_type     | normal                                                                          |
| device_id             |                                                                                 |
| device_owner          |                                                                                 |
| fixed_ips             | {"subnet_id": "eac6621f-e8cc-4c94-84bf-e73dab610018", "ip_address": "10.0.0.3"} |
| id                    | ff07588c-4b11-4ec8-b7c5-1be64fc0ebac                                            |
| mac_address           | fa:16:3e:23:bd:f6                                                               |
| name                  |                                                                                 |
| network_id            | 2637f01e-f41e-4d1b-865f-195253027031                                            |
| security_groups       | ab539a1c-c3d8-49f7-9ad1-3a8b451bce91                                            |
| status                | DOWN                                                                            |
| tenant_id             | 64f29642350d4c978cf03a4917a35999                                                |
+-----------------------+---------------------------------------------------------------------------------+

Then we can list the logical ports in OVN for the logical switch associated with the Neutron network named private.  The output is the OVN UUID for the port followed by the port name in parentheses.  Neutron sets the port name equal to the UUID of the Neutron port.

$ ovn-nbctl lswitch-get-external-id 480235d0-d1a5-43a9-821b-d32e109445fd
neutron:network_name=private
$ ovn-nbctl lport-list 480235d0-d1a5-43a9-821b-d32e109445fd
...
fe959cfa-fd20-4129-9669-67af1fa6bbf7 (ff07588c-4b11-4ec8-b7c5-1be64fc0ebac)

We can also see that the port is down since it has not yet been plugged in to the local ovs switch on a hypervisor:

$ ovn-nbctl lport-get-up fe959cfa-fd20-4129-9669-67af1fa6bbf7
down

Ongoing Work

All OVN development discussion, patch submission, and patch review happens on the ovs-dev mailing list.  Development is currently happening in the ovn branch until things are further along.  Discussion about the OpenStack integration happens on the openstack-dev mailing list, while patch submission and review happens in OpenStack’s gerrit.

As mentioned earlier, the ovn-controller daemon is not yet running in this development environment.  That will change shortly as Justin Pettit posted it for review earlier this week.

As you might have noticed, there’s a lot of infrastructure in place, but the actual flows and tunnels necessary to implement these virtual networks are not yet in place.  There’s been a lot of work in preparation for that, though.  Ben Pfaff has had a patch series up for review for expression matching needed for OVN.  It probably should have been merged by now, but the reviews have been a little slow.  (That’s my guilt talking.)  Ben has also started working on making ovn-nbd populate the Pipeline table of the OVN database.

Finally, the proposed OVN design introduces some new demands on ovsdb-server.  In particular, there will easily be hundreds of instances of ovn-controller connected to ovsdb-server.  Andy Zhou has been doing some very nice work around increasing performance in anticipation of these new demands.

Implementation of Pacemaker Managed OpenStack VM Recovery

I’ve discussed the use of Pacemaker as a method to detect compute node failures and recover the VMs that were running there.  The implementation of this is ready for testing.  Details can be found in this post to rdo-list.

The post mentions one pending enhancement to Nova that would improve things further:

Currently fence_compute loops, waiting for nova to recognise that the failed host is down, before we make a host-evacuate call which triggers nova to restart the VMs on another host. The discussed nova API extensions will speed up recovery times by allowing fence_compute to proactively push that information into nova instead.

The issue here is that the default backend for Nova’s servicegroup API relies on the nova-compute service to periodically check in to the Nova database to indicate that it is still running.  The delay in the recovery process is caused by Nova waiting on a configured timeout since the last time the service checked in.  Pacemaker is going to know about the failure much sooner, so it would be helpful if there was an API to tell Nova “trust me, this node is gone”.  This proposed spec intends to provide such an API.