About russellbryant

I'm an open source software engineer working for Red Hat on the OpenStack project.

Metal³ – Metal Kubed, Bare Metal Provisioning for Kubernetes

Project Introduction

There are a number of great open source tools for bare metal host provisioning, including Ironic.  Metal³ aims to build on these technologies to provide a Kubernetes native API for managing bare metal hosts via a provisioning stack that is also running on Kubernetes.  We believe that Kubernetes Native Infrastructure, or managing your infrastructure just like your applications, is a powerful next step in the evolution of infrastructure management.

The Metal³ project is also building integration with the Kubernetes cluster-api project, allowing Metal³ to be used as an infrastructure backend for Machine objects from the Cluster API.

Metal³ Repository Overview

There is a Metal³ overview and some more detailed design documents in the metal3-docs repository.

The baremetal-operator is the component that manages bare metal hosts.  It exposes a new BareMetalHost custom resource in the Kubernetes API that lets you manage hosts in a declarative way.

Finally, the cluster-api-provider-baremetal repository includes integration with the cluster-api project.  This provider currently includes a Machine actuator that acts as a client of the BareMetalHost custom resources.

Demo

The project has been going for a few months, and there’s now enough working code to show.

For this demonstration, I’ve started with a 3-node Kubernetes cluster installed using OpenShift.

$ kubectl get nodes
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   24h   v1.13.4+d4ce02c1d
master-1   Ready    master   24h   v1.13.4+d4ce02c1d
master-2   Ready    master   24h   v1.13.4+d4ce02c1d

Machine objects were created to reflect these 3 masters, as well.

$ kubectl get machines
NAME              INSTANCE   STATE   TYPE   REGION   ZONE   AGE
ostest-master-0                                             24h
ostest-master-1                                             24h
ostest-master-2                                             24h

For this cluster-api provider, a Machine has a corresponding BareMetalHost object, which corresponds to the piece of hardware we are managing.  There is a design document that covers the relationship between Nodes, Machines, and BareMetalHosts.

Since these hosts were provisioned earlier, they are in a special “externally provisioned” state, indicating that we enrolled them in management while they were already running in a desired state.  If changes are needed going forward, the baremetal-operator will be able to automate them.

$ kubectl get baremetalhosts
NAME                 STATUS   PROVISIONING STATUS      MACHINE           BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ostest-master-0   ipmi://192.168.111.1:6230                      true     
openshift-master-1   OK       externally provisioned   ostest-master-1   ipmi://192.168.111.1:6231                      true     
openshift-master-2   OK       externally provisioned   ostest-master-2   ipmi://192.168.111.1:6232                      true

Now suppose we’d like to expand this cluster by adding another bare metal host to serve as a worker node.  First we need to create a new BareMetalHost object that adds this new host to the inventory of hosts managed by the baremetal-operator.  Here’s the YAML for the new BareMetalHost:

---
apiVersion: v1
kind: Secret
metadata:
  name: openshift-worker-0-bmc-secret
type: Opaque
data:
  username: YWRtaW4=
  password: cGFzc3dvcmQ=

---
apiVersion: metalkube.org/v1alpha1
kind: BareMetalHost
metadata:
  name: openshift-worker-0
spec:
  online: true
  bmc:
    address: ipmi://192.168.111.1:6233
    credentialsName: openshift-worker-0-bmc-secret
  bootMACAddress: 00:ab:4f:d8:9e:fa

Now to add the BareMetalHost and its IPMI credentials Secret to the cluster:

$ kubectl create -f worker_crs.yaml 
secret/openshift-worker-0-bmc-secret created
baremetalhost.metalkube.org/openshift-worker-0 created
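
Note that the values under `data` in a Kubernetes Secret are base64-encoded, not encrypted.  As a minimal sketch, the Python below decodes the two values copied from the manifest above; the `decode_secret` helper is a hypothetical name used only for illustration.

```python
import base64

# Values copied from the Secret manifest above.
secret_data = {
    "username": "YWRtaW4=",
    "password": "cGFzc3dvcmQ=",
}


def decode_secret(data):
    """Return the Secret's data fields decoded from base64 to text."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


decoded = decode_secret(secret_data)
print(decoded)
```

These demo credentials decode to `admin` and `password`, which is a reminder that a Secret manifest should be treated as sensitive even though its contents are not human-readable at a glance.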

The list of BareMetalHosts now reflects a new host in the inventory that is ready to be provisioned.  It will remain in this “ready” state until it is claimed by a new Machine object.

$ kubectl get baremetalhosts
NAME                 STATUS   PROVISIONING STATUS      MACHINE           BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ostest-master-0   ipmi://192.168.111.1:6230                      true     
openshift-master-1   OK       externally provisioned   ostest-master-1   ipmi://192.168.111.1:6231                      true     
openshift-master-2   OK       externally provisioned   ostest-master-2   ipmi://192.168.111.1:6232                      true     
openshift-worker-0   OK       ready                                      ipmi://192.168.111.1:6233   unknown            true

We have a MachineSet already created for workers, but it is scaled down to 0.

$ kubectl get machinesets
NAME              DESIRED   CURRENT   READY   AVAILABLE   AGE
ostest-worker-0   0         0                             24h

We can scale this MachineSet to 1 to indicate that we’d like a worker provisioned.  The baremetal cluster-api provider will then look for an available BareMetalHost, claim it, and trigger provisioning of that host.

$ kubectl scale machineset ostest-worker-0 --replicas=1

After the new Machine was created, our cluster-api provider claimed the available host and triggered it to be provisioned.

$ kubectl get baremetalhosts
NAME                 STATUS   PROVISIONING STATUS      MACHINE                 BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ostest-master-0         ipmi://192.168.111.1:6230                      true     
openshift-master-1   OK       externally provisioned   ostest-master-1         ipmi://192.168.111.1:6231                      true     
openshift-master-2   OK       externally provisioned   ostest-master-2         ipmi://192.168.111.1:6232                      true     
openshift-worker-0   OK       provisioning             ostest-worker-0-jmhtc   ipmi://192.168.111.1:6233   unknown            true
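
The claim step can be pictured with a short Python sketch.  This only illustrates the behavior described above and is not the provider's actual code: the `choose_host` and `claim` helpers and the dictionary layout are hypothetical, with values taken from the `kubectl get baremetalhosts` output.

```python
# Hypothetical in-memory view of the BareMetalHost inventory before scaling up.
hosts = [
    {"name": "openshift-master-0", "state": "externally provisioned", "machine": "ostest-master-0"},
    {"name": "openshift-master-1", "state": "externally provisioned", "machine": "ostest-master-1"},
    {"name": "openshift-master-2", "state": "externally provisioned", "machine": "ostest-master-2"},
    {"name": "openshift-worker-0", "state": "ready", "machine": None},
]


def choose_host(inventory):
    """Return the first host that is ready and not claimed by a Machine, or None."""
    for host in inventory:
        if host["state"] == "ready" and host["machine"] is None:
            return host
    return None


def claim(inventory, machine_name):
    """Claim an available host for machine_name and mark it as provisioning."""
    host = choose_host(inventory)
    if host is None:
        return None  # nothing available; the Machine has to wait
    host["machine"] = machine_name
    host["state"] = "provisioning"
    return host


claimed = claim(hosts, "ostest-worker-0-jmhtc")
print(claimed["name"], claimed["state"])
```

A second call to `claim` on the same inventory returns `None`, because no unclaimed host in the "ready" state remains.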

This process takes some time.  Under the hood, the baremetal-operator is driving Ironic through a provisioning process.  This begins with wiping disks to ensure the host comes up in a clean state.  It will eventually write the desired OS image to disk and then reboot into that OS.  When complete, a new Kubernetes Node will register with the cluster.

$ kubectl get baremetalhosts
NAME                 STATUS   PROVISIONING STATUS      MACHINE                 BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ostest-master-0         ipmi://192.168.111.1:6230                      true     
openshift-master-1   OK       externally provisioned   ostest-master-1         ipmi://192.168.111.1:6231                      true     
openshift-master-2   OK       externally provisioned   ostest-master-2         ipmi://192.168.111.1:6232                      true     
openshift-worker-0   OK       provisioned              ostest-worker-0-jmhtc   ipmi://192.168.111.1:6233   unknown            true     

$ kubectl get nodes
NAME       STATUS   ROLES    AGE   VERSION
master-0   Ready    master   24h   v1.13.4+d4ce02c1d
master-1   Ready    master   24h   v1.13.4+d4ce02c1d
master-2   Ready    master   24h   v1.13.4+d4ce02c1d
worker-0   Ready    worker   68s   v1.13.4+d4ce02c1d

A screencast accompanying this post demonstrates this process, as well.

Removing a bare metal host from the cluster is very similar.  We just have to scale this MachineSet back down to 0.

$ kubectl scale machineset ostest-worker-0 --replicas=0

Once the Machine has been deleted, the baremetal-operator will deprovision the bare metal host.

$ kubectl get baremetalhosts
NAME                 STATUS   PROVISIONING STATUS      MACHINE           BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ostest-master-0   ipmi://192.168.111.1:6230                      true     
openshift-master-1   OK       externally provisioned   ostest-master-1   ipmi://192.168.111.1:6231                      true     
openshift-master-2   OK       externally provisioned   ostest-master-2   ipmi://192.168.111.1:6232                      true     
openshift-worker-0   OK       deprovisioning                             ipmi://192.168.111.1:6233   unknown            false

Once the deprovisioning process is complete, the bare metal host will be back to its “ready” state, available in the host inventory to be claimed by a future Machine object.

$ kubectl get baremetalhosts
NAME                 STATUS   PROVISIONING STATUS      MACHINE           BMC                         HARDWARE PROFILE   ONLINE   ERROR
openshift-master-0   OK       externally provisioned   ostest-master-0   ipmi://192.168.111.1:6230                      true     
openshift-master-1   OK       externally provisioned   ostest-master-1   ipmi://192.168.111.1:6231                      true     
openshift-master-2   OK       externally provisioned   ostest-master-2   ipmi://192.168.111.1:6232                      true     
openshift-worker-0   OK       ready                                      ipmi://192.168.111.1:6233   unknown            false

Getting Involved

All development is happening on GitHub.  We have a metal3-dev mailing list and use #cluster-api-baremetal on Kubernetes Slack to chat.  Occasional project updates are posted to @metal3_io on Twitter.

OVN – Geneve vs VXLAN, Does it Matter?

One of the early design decisions made in OVN was to only support tunnel encapsulation protocols that provided the ability to include additional metadata beyond what fits in the VNI field of a VXLAN header.  OVN mostly uses the Geneve protocol and only uses VXLAN for integration with top-of-rack (ToR) switches that support the hardware_vtep OVSDB schema, which act as L2 gateways between logical and physical networks.

Many people wonder when they first learn of this design decision, “why not VXLAN?”  In particular, what about performance?  Some hardware has VXLAN offload capabilities.  Are we going to suffer a performance hit when using Geneve?

These are very good questions, so I set off to come up with a good answer.

Why Geneve?

One of the key implementation details of OVN is Logical Flows.  Instead of programming new features using OpenFlow, we primarily use Logical Flows.  This makes feature development easier because we don’t have to worry about the physical location of resources on the network when writing flows.  We are able to write flows as if the entire deployment were one giant switch instead of 10s, 100s, or 1000s of switches.

Part of the implementation of this is that in addition to passing a network ID over a tunnel, we also pass IDs for the logical source and destination ports.  With Geneve, OVN will identify the network using the VNI field and will use an additional 32-bit TLV to specify both the source and destination logical ports.

Of course, by using an extensible protocol, we also have the capability to add more metadata for advanced features in the future.

More detail about OVN’s use of Geneve TLVs can be found in the “Tunnel Encapsulations” sub-section of “Design Decisions” in the OVN Architecture document.
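
As a rough illustration of how two port IDs can share one 32-bit value, here is a minimal Python sketch.  The bit layout assumed here (1 reserved bit, a 15-bit ingress port, then a 16-bit egress port, from most to least significant bit) follows the description in the OVN Architecture document and should be verified against it; the function names are hypothetical.

```python
def pack_ports(ingress_port, egress_port):
    """Pack logical ingress/egress port IDs into one 32-bit value.

    Assumed layout, MSB to LSB: 1 reserved bit (0), 15-bit ingress, 16-bit egress.
    """
    if not 0 <= ingress_port < (1 << 15):
        raise ValueError("ingress port must fit in 15 bits")
    if not 0 <= egress_port < (1 << 16):
        raise ValueError("egress port must fit in 16 bits")
    return (ingress_port << 16) | egress_port


def unpack_ports(value):
    """Reverse pack_ports, returning (ingress_port, egress_port)."""
    return (value >> 16) & 0x7FFF, value & 0xFFFF


packed = pack_ports(3, 7)
print(hex(packed), unpack_ports(packed))
```

The VNI field of a VXLAN header has no room for this extra value alongside the network ID, which is the limitation the design decision above works around.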

Hardware Offload

Imagine a single UDP packet being sent between two VMs.  The headers might look something like:

  • Ethernet header
  • IP header
  • UDP header
  • Application payload

When we encapsulate this packet in a tunnel, what gets sent over the physical network ends up looking like this:

  • Outer Ethernet header
  • Outer IP header
  • Outer UDP header
  • Geneve or VXLAN Header
  • Application payload: (Inner packet from VM 1 to VM 2)
    • Inner Ethernet header
    • Inner IP header
    • Inner UDP header
    • Application payload
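
To put rough numbers on the extra headers listed above, here is a small Python sketch that adds up the per-packet encapsulation overhead.  It assumes untagged Ethernet and an outer IPv4 header without options; the single 4-byte Geneve option mirrors the 32-bit TLV that OVN uses for logical ports.  The function names are hypothetical.

```python
# Header sizes in bytes, assuming untagged Ethernet and IPv4 without options.
OUTER_ETHERNET = 14
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
GENEVE_BASE_HEADER = 8
GENEVE_OPTION_HEADER = 4  # option class, type, and length fields of one TLV


def vxlan_overhead():
    """Bytes added in front of the inner packet by VXLAN encapsulation."""
    return OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER


def geneve_overhead(option_value_sizes):
    """Bytes added by Geneve, given the value size in bytes of each TLV option."""
    options = sum(GENEVE_OPTION_HEADER + size for size in option_value_sizes)
    return OUTER_ETHERNET + OUTER_IPV4 + OUTER_UDP + GENEVE_BASE_HEADER + options


print("VXLAN:", vxlan_overhead())
print("Geneve with one 32-bit TLV:", geneve_overhead([4]))
```

Under these assumptions the difference between the two encapsulations is the 8 bytes taken up by the one TLV.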

There are many more NIC capabilities than what’s discussed here, but I’ll focus on some key features related to tunnel performance.

Some offload capabilities are not actually VXLAN specific.  For example, the commonly referred to “tx-udp_tnl-segmentation” offload applies to both VXLAN and Geneve.  This is where the kernel is able to send a large amount of data to the NIC at once and the NIC breaks it up into TCP segments and then adds both the inner and outer headers. The performance boost comes from not having to do the same thing in software.  This offload helps significantly with TCP throughput over a tunnel.

You can check to see if a NIC has support for “tx-udp_tnl-segmentation” with ethtool.  For example, on a host that doesn’t support it:

$ ethtool -k eth0 | grep tnl-segmentation
tx-udp_tnl-segmentation: off [fixed]

or on a host that does support it and has it enabled:

$ ethtool -k eth0 | grep tnl-segmentation
tx-udp_tnl-segmentation: on

There is a type of offload that is often VXLAN specific: RSS (Receive Side Scaling) for tunneled traffic.  This is when the NIC is able to look inside a tunnel to identify the inner flows and efficiently distribute them among multiple receive queues (to be processed across multiple CPUs).  Without this capability, a VXLAN tunnel looks like a single stream and will go into a single receive queue.

You may wonder, “does my NIC support VXLAN or Geneve RSS?”  Unfortunately, there does not appear to be an easy way to check this with a command.  The best method I’ve seen is to read the driver source code or dig through vendor documentation.

Since the VXLAN specific offload capability is on the receive side, it’s important to look at what other techniques can be used to improve receive side performance.  One such option is RPS (Receive Packet Steering).  RPS is the same concept as RSS, but done in software.  Packets are distributed among CPUs in software before fully processing them.
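
On Linux, RPS is configured per receive queue by writing a hexadecimal CPU bitmask to that queue's `rps_cpus` file in sysfs.  The Python sketch below only computes the path and the mask and does not write anything; the device name `eth0`, queue 0, and the choice of CPUs 0 through 3 are assumptions for illustration, and the helper names are hypothetical.

```python
def rps_cpu_mask(cpus):
    """Return the hex bitmask string that selects the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def rps_path(device, rx_queue):
    """Return the sysfs file that controls RPS for one receive queue."""
    return f"/sys/class/net/{device}/queues/rx-{rx_queue}/rps_cpus"


# Example: spread packets from eth0's first receive queue across CPUs 0-3.
print(rps_path("eth0", 0), rps_cpu_mask([0, 1, 2, 3]))
```

Writing the printed mask to the printed path (as root) would enable RPS for that queue; a mask of 0 disables it.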

Another optimization is that OVN enables UDP checksums on Geneve tunnels by default.  Adding this checksum actually improves performance on the receive side.  This is because of some more recent optimizations implemented in the kernel.  When a Geneve packet is received, this outer UDP checksum will be verified by the NIC.  This checksum verification will be reported to the kernel.  Since the outer UDP checksum has been verified, the kernel uses this fact to skip having to calculate and verify any checksums of the inner packet.  Without enabling the outer UDP checksum and letting the NIC verify it, the kernel is doing more checksum calculation in software.  It’s expected that this regains significant performance on the receive side.

Performance Testing

In the last section, we identified that there is an offload capability (RSS) that is VXLAN specific.  Some NICs support RSS for VXLAN and Geneve, some for VXLAN only, and others don’t support it at all.

This raises an important question: On systems with NICs that do RSS for VXLAN only, can we match performance with Geneve?

On the surface, we expect Geneve performance to be worse.  However, because of other optimizations, we really need to check to see how much RSS helps.

After some investigation of driver source code (Thanks, Lance Richardson!), we found that the following drivers had RSS support for VXLAN, but not Geneve.

  • mlx4_en (Mellanox)
  • mlx5_core (Mellanox)
  • qlcnic (QLogic)
  • be2net (HPE Emulex)

To help answer our question above, we did some testing on machines with one of these NICs.

Hardware

The testing was done between two servers.  Both had a Mellanox NIC using the mlx4_en driver.  The NICs were connected back-to-back.

The servers had the following specs:

  • HP Z220
  • Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz (1 socket, 4 cores)
  • Memory: 4096 MB

Software

  • Operating System: RHEL 7.3
  • Kernel: 4.10.2-1.el7.elrepo.x86_64
  • OVS: openvswitch-2.6.1-4.1.git20161206.el7.x86_64
  • tuned profile: throughput-performance

Test Overview

  • Create two tunnels between the hosts: one VXLAN and one Geneve.
    • With Geneve, add 1 TLV field to match the amount of additional metadata sent across the tunnel with OVN.
  • Use pbench-uperf to run tests
  • Traffic
    • TCP
    • UDP (with different packet sizes, 64 and 1024 byte)
    • Multiple concurrent streams (8 and 64)
  • All tests are run 3 times.  Results must be within 5% stddev; otherwise the 3 runs are discarded and the test is run again.  This ensures reasonably consistent and reliable results.

Summary of Results

TCP Throughput

  • We reach line rate with both VXLAN and Geneve.  Differences are observed in CPU consumption where we see Geneve consistently using less CPU.

Average CPU Utilization Across Both Hosts (Geneve w/ UDP checksums and 1 TLV field)

    Scenario                VXLAN CPU (%)  Geneve CPU (%)  Increase (%)
    ----------------------  -------------  --------------  ------------
    tcp_bidirec-1024B-64i   40.17          36.45           -3.72
    tcp_bidirec-1024B-8i    36.62          31.26           -5.35
    tcp_bidirec-16384B-64i  27.02          24.52           -2.50
    tcp_bidirec-16384B-8i   24.34          20.06           -4.28
    tcp_stream-1024B-64i    39.75          36.53           -3.22
    tcp_stream-1024B-8i     37.36          31.50           -5.87
    tcp_stream-16384B-64i   26.92          22.83           -4.09
    tcp_stream-16384B-8i    24.01          21.32           -2.69

Average CPU Utilization Increase (Percent) Across All Scenarios: -3.96

TCP and UDP Request/Response Rate (RR)

  • We see higher CPU usage in these scenarios with Geneve, but a proportionally larger increase in requests per second processed, leading us to conclude that Geneve is performing better overall in this case, as well.

Request / Response Performance (Geneve w/ UDP checksums and 1 TLV field)

    Scenario           VXLAN Requests/s  Geneve Requests/s  Increase with Geneve
    -----------------  ----------------  -----------------  --------------------
    tcp_rr-1024B-64i   221400            241900             9.26%
    tcp_rr-1024B-8i    109000            135000             23.85%
    tcp_rr-16384B-64i  63400             63060              -0.54%
    tcp_rr-16384B-8i   34330             37950              10.54%
    udp_rr-1024B-64i   280300            283600             1.18%
    udp_rr-1024B-8i    113600            145200             27.82%
    udp_rr-64B-64i     282300            293100             3.83%
    udp_rr-64B-8i      121600            154000             26.64%

Average Percentage Increase with Geneve: 12.82%

Average CPU Utilization Across Both Hosts (Geneve w/ UDP checksums)

    Scenario           VXLAN CPU (%)  Geneve CPU (%)  Increase (%)
    -----------------  -------------  --------------  ------------
    tcp_rr-1024B-64i   85.39          86.49           1.11
    tcp_rr-1024B-8i    50.94          51.06           0.13
    tcp_rr-16384B-64i  81.02          84.04           3.02
    tcp_rr-16384B-8i   48.84          60.58           11.74
    udp_rr-1024B-64i   85.10          85.05           -0.05
    udp_rr-1024B-8i    45.95          46.38           0.43
    udp_rr-64B-64i     85.65          85.71           0.06
    udp_rr-64B-8i      47.66          49.43           1.77

Average CPU Utilization Increase (Percent) Across All Scenarios: 2.28
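
One way to weigh the extra CPU against the extra throughput is to divide requests per second by CPU utilization.  The Python sketch below does this with the numbers from the two request/response tables.  The "requests per second per CPU point" metric is an assumption introduced here for illustration, not something the test runs reported.

```python
# (scenario, VXLAN req/s, Geneve req/s, VXLAN CPU %, Geneve CPU %) from the tables above.
RESULTS = [
    ("tcp_rr-1024B-64i", 221400, 241900, 85.39, 86.49),
    ("tcp_rr-1024B-8i", 109000, 135000, 50.94, 51.06),
    ("tcp_rr-16384B-64i", 63400, 63060, 81.02, 84.04),
    ("tcp_rr-16384B-8i", 34330, 37950, 48.84, 60.58),
    ("udp_rr-1024B-64i", 280300, 283600, 85.10, 85.05),
    ("udp_rr-1024B-8i", 113600, 145200, 45.95, 46.38),
    ("udp_rr-64B-64i", 282300, 293100, 85.65, 85.71),
    ("udp_rr-64B-8i", 121600, 154000, 47.66, 49.43),
]


def percent_increase(baseline, value):
    """Relative change from baseline to value, in percent."""
    return (value - baseline) / baseline * 100.0


def requests_per_cpu_point(requests_per_second, cpu_percent):
    """Derived metric (not in the original results): req/s per percent of CPU."""
    return requests_per_second / cpu_percent


geneve_more_efficient = []
for name, vx_rps, gn_rps, vx_cpu, gn_cpu in RESULTS:
    vx_eff = requests_per_cpu_point(vx_rps, vx_cpu)
    gn_eff = requests_per_cpu_point(gn_rps, gn_cpu)
    if gn_eff > vx_eff:
        geneve_more_efficient.append(name)
    print(f"{name}: {percent_increase(vx_rps, gn_rps):+.2f}% req/s, "
          f"{vx_eff:.0f} -> {gn_eff:.0f} req/s per CPU point")
```

By this crude measure Geneve comes out ahead in six of the eight scenarios; the two tcp_rr-16384B scenarios are the exceptions, where CPU usage rises by more than throughput does.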

Conclusion

Using optimizations available in newer versions of the Linux kernel, we are seeing better performance with Geneve than VXLAN, despite this hardware having some VXLAN specific offload capabilities.

Based on these results, I feel that OVN’s reliance on Geneve as its standard tunneling protocol is acceptable.  It provides additional capabilities while maintaining good performance, even on hardware that has VXLAN specific RSS support.

Adding general VXLAN support to OVN would not be trivial and would introduce a significant ongoing maintenance burden.  Testing done so far does not justify that cost.

Comparing OpenStack Neutron ML2+OVS and OVN – Control Plane

We have done a lot of performance testing of OVN over time, but one major thing missing has been an apples-to-apples comparison with the current OVS-based OpenStack Neutron backend (ML2+OVS).  I’ve been working with a group of people to compare the two OpenStack Neutron backends.  This is the first piece of those results: the control plane.  Later posts will discuss data plane performance.

Control Plane Differences

The ML2+OVS control plane is based on a pattern seen throughout OpenStack.  There is a series of agents written in Python.  The Neutron server communicates with these agents using an RPC mechanism built on top of AMQP (RabbitMQ in most deployments, including our tests).

OVN takes a distributed database-driven approach.  Configuration and state are managed through two databases: the OVN northbound and southbound databases.  These databases are currently based on OVSDB.  Instead of receiving updates via RPC, components watch relevant portions of the database for changes and apply them locally.  More detail about these components can be found in my post about the first release of OVN, or even more detail is in the ovn-architecture document.

OVN does not make use of any of the Neutron agents.  Instead, all required functionality is implemented by ovn-controller and OVS flows.  This includes things like security groups, DHCP, L3 routing, and NAT.

Hardware and Software

Our testing was done in a lab using 13 machines which were allocated to the following functions:

  • 1 OpenStack TripleO Undercloud for provisioning
  • 3 Controllers (OpenStack and OVN control plane services)
  • 9 Compute Nodes (Hypervisors)

The hardware had the following specs:

  • 2x E5-2620 v2 (12 total cores, 24 total threads)
  • 64GB RAM
  • 4 x 1TB SATA
  • 1 x Intel X520 Dual Port 10G

Software:

  • CentOS 7.2
  • OpenStack, OVS, and OVN from their master branches (early December, 2016)
  • Neutron configuration notes
    • (OVN) 6 API workers, 1 RPC worker (since RPC is not used and Neutron requires at least 1) for neutron-server on each controller (x3)
    • (ML2+OVS) 6 API workers, 6 RPC workers for neutron-server on each controller (x3)
    • (ML2+OVS) DVR was enabled

Test Configuration

The tests were run using OpenStack Rally.  We used the Browbeat project to easily set up, configure, and run the tests, as well as store, analyze, and compare results.  The Rally portion of the Browbeat configuration was:

rerun: 3
...
rally:
  enabled: true
  sleep_before: 5
  sleep_after: 5
  venv: /home/stack/rally-venv/bin/activate
  plugins:
    - netcreate-boot: rally/rally-plugins/netcreate-boot
    - subnet-router-create: rally/rally-plugins/subnet-router-create
    - neutron-securitygroup-port: rally/rally-plugins/neutron-securitygroup-port
  benchmarks:
    - name: neutron
      enabled: true
      concurrency:
        - 8
        - 16
        - 32 
      times: 500
      scenarios:
        - name: create-list-network
          enabled: true
          file: rally/neutron/neutron-create-list-network-cc.yml
        - name: create-list-port
          enabled: true
          file: rally/neutron/neutron-create-list-port-cc.yml
        - name: create-list-router
          enabled: true
          file: rally/neutron/neutron-create-list-router-cc.yml
        - name: create-list-security-group
          enabled: true
          file: rally/neutron/neutron-create-list-security-group-cc.yml
        - name: create-list-subnet
          enabled: true
          file: rally/neutron/neutron-create-list-subnet-cc.yml
    - name: plugins
      enabled: true
      concurrency:
        - 8
        - 16
        - 32 
      times: 500
      scenarios:
        - name: netcreate-boot
          enabled: true
          image_name: cirros
          flavor_name: m1.xtiny
          file: rally/rally-plugins/netcreate-boot/netcreate_boot.yml
        - name: subnet-router-create
          enabled: true
          num_networks:  10
          file: rally/rally-plugins/subnet-router-create/subnet-router-create.yml
        - name: neutron-securitygroup-port
          enabled: true
          file: rally/rally-plugins/neutron-securitygroup-port/neutron-securitygroup-port.yml

This configuration defines several scenarios to run.  Each one is set to run 500 times, at three different concurrency levels.  Finally, “rerun: 3” at the beginning says we run the entire configuration 3 times.  This is a bit confusing, so let’s look at one example.

The “netcreate-boot” scenario is to create a network and boot a VM on that network.  The configuration results in the following execution:

  • Run 1
    • Create 500 VMs, each on their own network, 8 at a time, and then clean up
    • Create 500 VMs, each on their own network, 16 at a time, and then clean up
    • Create 500 VMs, each on their own network, 32 at a time, and then clean up
  • Run 2
    • Create 500 VMs, each on their own network, 8 at a time, and then clean up
    • Create 500 VMs, each on their own network, 16 at a time, and then clean up
    • Create 500 VMs, each on their own network, 32 at a time, and then clean up
  • Run 3
    • Create 500 VMs, each on their own network, 8 at a time, and then clean up
    • Create 500 VMs, each on their own network, 16 at a time, and then clean up
    • Create 500 VMs, each on their own network, 32 at a time, and then clean up

In total, we will have created 4500 VMs.
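
The arithmetic behind that total can be sketched in a few lines of Python, using the `rerun`, `concurrency`, and `times` values from the configuration above.  The `total_iterations` helper is a hypothetical name.

```python
reruns = 3                        # "rerun: 3"
concurrency_levels = [8, 16, 32]  # "concurrency" list for each benchmark
times = 500                       # "times: 500", iterations per concurrency level


def total_iterations(reruns, concurrency_levels, times):
    """Total iterations of one scenario across all reruns and concurrency levels."""
    return reruns * len(concurrency_levels) * times


print(total_iterations(reruns, concurrency_levels, times))
```

Note that the concurrency value changes how many iterations run in parallel, not how many run in total.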

Results

Browbeat includes the ability to store all Rally test results in Elasticsearch and then display them using Kibana.  A live dashboard of these results is on elk.browbeatproject.org.

The following tables show the average, 95th percentile, maximum, and minimum times for all APIs executed throughout the test scenarios.

    API                              ML2+OVS Average  OVN Average  % improvement
    -------------------------------  ---------------  -----------  -------------
    nova.boot_server                 80.672           23.45        70.93%
    neutron.list_ports               6.296            6.478        -2.89%
    neutron.list_subnets             5.129            3.826        25.40%
    neutron.add_interface_router     4.156            3.509        15.57%
    neutron.list_routers             4.292            3.089        28.03%
    neutron.list_networks            2.596            2.628        -1.23%
    neutron.list_security_groups     2.518            2.518        0.00%
    neutron.remove_interface_router  3.679            2.353        36.04%
    neutron.create_port              2.096            2.136        -1.91%
    neutron.create_subnet            1.775            1.543        13.07%
    neutron.delete_port              1.592            1.517        4.71%
    neutron.create_security_group    1.287            1.372        -6.60%
    neutron.create_network           1.352            1.285        4.96%
    neutron.create_router            1.181            0.845        28.45%
    neutron.delete_security_group    0.763            0.793        -3.93%

    API                              ML2+OVS 95%      OVN 95%      % improvement
    -------------------------------  ---------------  -----------  -------------
    nova.boot_server                 163.2            35.336       78.35%
    neutron.list_ports               11.038           11.401       -3.29%
    neutron.list_subnets             10.064           6.886        31.58%
    neutron.add_interface_router     7.908            6.367        19.49%
    neutron.list_routers             8.374            5.321        36.46%
    neutron.list_networks            5.343            5.171        3.22%
    neutron.list_security_groups     5.648            5.556        1.63%
    neutron.remove_interface_router  6.917            4.078        41.04%
    neutron.create_port              5.521            4.968        10.02%
    neutron.create_subnet            4.041            3.091        23.51%
    neutron.delete_port              2.865            2.598        9.32%
    neutron.create_security_group    3.245            3.547        -9.31%
    neutron.create_network           3.089            2.917        5.57%
    neutron.create_router            2.893            1.92         33.63%
    neutron.delete_security_group    1.776            1.72         3.15%

    API                              ML2+OVS Maximum  OVN Maximum  % improvement
    -------------------------------  ---------------  -----------  -------------
    nova.boot_server                 221.877          47.827       78.44%
    neutron.list_ports               29.233           32.279       -10.42%
    neutron.list_subnets             35.996           17.54        51.27%
    neutron.add_interface_router     29.591           22.951       22.44%
    neutron.list_routers             19.332           13.975       27.71%
    neutron.list_networks            12.516           13.765       -9.98%
    neutron.list_security_groups     14.577           13.092       10.19%
    neutron.remove_interface_router  35.546           9.391        73.58%
    neutron.create_port              53.663           40.059       25.35%
    neutron.create_subnet            46.058           26.472       42.52%
    neutron.delete_port              5.121            5.149        -0.55%
    neutron.create_security_group    14.243           13.206       7.28%
    neutron.create_network           32.804           32.566       0.73%
    neutron.create_router            14.594           6.452        55.79%
    neutron.delete_security_group    4.249            3.746        11.84%

    API                              ML2+OVS Minimum  OVN Minimum  % improvement
    -------------------------------  ---------------  -----------  -------------
    nova.boot_server                 18.665           3.761        79.85%
    neutron.list_ports               0.195            0.22         -12.82%
    neutron.list_subnets             0.252            0.187        25.79%
    neutron.add_interface_router     1.698            1.556        8.36%
    neutron.list_routers             0.185            0.147        20.54%
    neutron.list_networks            0.21             0.174        17.14%
    neutron.list_security_groups     0.132            0.184        -39.39%
    neutron.remove_interface_router  1.557            1.057        32.11%
    neutron.create_port              0.58             0.614        -5.86%
    neutron.create_subnet            0.42             0.416        0.95%
    neutron.delete_port              0.464            0.46         0.86%
    neutron.create_security_group    0.081            0.094        -16.05%
    neutron.create_network           0.113            0.179        -58.41%
    neutron.create_router            0.077            0.053        31.17%
    neutron.delete_security_group    0.092            0.104        -13.04%
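
The "% improvement" column is consistent with the reduction in time relative to the ML2+OVS result, so a positive value means OVN was faster.  A minimal Python sketch of that calculation, using three rows from the averages table (the `percent_improvement` helper is a hypothetical name):

```python
def percent_improvement(ml2_ovs, ovn):
    """Reduction in time from ML2+OVS to OVN, as a percentage of the ML2+OVS time."""
    return (ml2_ovs - ovn) / ml2_ovs * 100.0


# (ML2+OVS average, OVN average) copied from the averages table.
averages = {
    "nova.boot_server": (80.672, 23.45),
    "neutron.list_ports": (6.296, 6.478),
    "neutron.create_router": (1.181, 0.845),
}

for api, (ml2_ovs, ovn) in averages.items():
    print(f"{api}: {percent_improvement(ml2_ovs, ovn):.2f}%")
```

A negative result, as for `neutron.list_ports`, means the OVN time was higher than the ML2+OVS time.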

Analysis

The most drastic difference in results is for “nova.boot_server”.  This is also the one piece of these tests that actually measures the time it takes to provision the network, and not just loading Neutron with configuration.

When Nova boots a server, it blocks waiting for an event from Neutron indicating that a port is ready before it sets the server state to ACTIVE and powers on the VM.  Both ML2+OVS and OVN implement this mechanism.  Our test scenario measured the time it took for servers to become ACTIVE.

Further tests were done on ML2+OVS and we were able to confirm that disabling this synchronization between Nova and Neutron brought the results back to being on par with the OVN results.  This confirmed that the extra time was indeed spent waiting for Neutron to report that ports were ready.

To be clear, you should not disable this synchronization.  The only reason you can disable it is that not all Neutron backends support it (ML2+OVS and OVN both do).  It was put in place to avoid a race condition.  It ensures that the network is actually ready for use before booting a VM.  The issue is how long it’s taking Neutron to provision the network for use.  Further analysis is needed to break down where Neutron (ML2+OVS) is spending most of its time in the provisioning process.

OVN Logical Flows and ovn-trace

One of the most satisfying feelings when working on new software is when you settle on a really great abstraction. When this goes well, things just fall into place. The design is easy to understand and modifying the system is an easy, pleasant experience.

This is how I’ve felt as I learned about the original proposed design for OVN and then contributed to OVN development over the last year and a half. In particular, I’ve been incredibly happy with the Logical Flows abstraction.

In this post I’ll explain what Logical Flows are and how to use ovn-trace to understand them. I will also provide some examples where this abstraction made adding features far easier than I would have expected.

But First, OpenFlow Basics

Before getting to Logical Flows, it is helpful to have a general understanding of OpenFlow. OpenFlow is the protocol used to program the packet processing pipeline of Open vSwitch. It lets you define a series of tables with rules (flows) that contain a priority, match, and a set of actions. For each table, the highest priority (larger number is higher priority) flow that matches is executed.

Let’s imagine a trivial virtual switch with two ports, port1 and port2.

                        +--------+
            (1)         |        |          (2)
           port1 -------| br-int |-------- port2
                        |        |
                        +--------+

We can create a bridge with 2 ports using the following commands:

$ ovs-vsctl add-br br-int
$ ovs-vsctl add-port br-int port1
$ ovs-vsctl add-port br-int port2

$ ovs-vsctl show
3b1995d8-9683-45db-8929-36c62abdbd31
    Bridge br-int
        Port "port1"
            Interface "port1"
        Port br-int
            Interface br-int
                type: internal
        Port "port2"
            Interface "port2"

A trivial example is to define a single table where we forward all packets from port1 to port2, and all packets from port2 to port1.

    Table  Priority  Match      Actions
    -----  --------  ---------- -------
    0      0         in_port=1  output:2
    0      0         in_port=2  output:1

We can program this pipeline in OVS using the ovs-ofctl command:

$ ovs-ofctl del-flows br-int
$ ovs-ofctl add-flow br-int "table=0, priority=0, in_port=1,actions=output:2"
$ ovs-ofctl add-flow br-int "table=0, priority=0, in_port=2,actions=output:1"

$ ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
  cookie=0x0, duration=9.679s, table=0, n_packets=0, n_bytes=0, idle_age=9, priority=0,in_port=1 actions=output:2
  cookie=0x0, duration=2.287s, table=0, n_packets=0, n_bytes=0, idle_age=2, priority=0,in_port=2 actions=output:1

We can extend this example to add a second table and demonstrate the use of different priorities. Let’s only allow packets from port1 if the source MAC address is 00:00:00:00:00:01 and only allow packets from port2 with a source MAC address of 00:00:00:00:00:02 (basic source port security). We’ll use table 0 to implement port security and then use table 1 to decide the packet’s destination.

(Yes, this could be done in a single table, but then it wouldn’t be demonstrating tables and priorities, which is the main point here.)

    Table  Priority  Match                               Actions
    -----  --------  ----------------------------------- ------------
    0      10        in_port=1,dl_src=00:00:00:00:00:01  resubmit(,1)
    0      10        in_port=2,dl_src=00:00:00:00:00:02  resubmit(,1)
    0      0                                             drop
    1      0         in_port=1                           output:2
    1      0         in_port=2                           output:1

Again, we can program this pipeline using the ovs-ofctl command line utility.

$ ovs-ofctl del-flows br-int
$ ovs-ofctl add-flow br-int "table=0, priority=10, in_port=1,dl_src=00:00:00:00:00:01,actions=resubmit(,1)"
$ ovs-ofctl add-flow br-int "table=0, priority=10, in_port=2,dl_src=00:00:00:00:00:02,actions=resubmit(,1)"
$ ovs-ofctl add-flow br-int "table=0, priority=0, actions=drop"
$ ovs-ofctl add-flow br-int "table=1, priority=0, in_port=1,actions=output:2"
$ ovs-ofctl add-flow br-int "table=1, priority=0, in_port=2,actions=output:1"

$ ovs-ofctl dump-flows br-int
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=72.132s, table=0, n_packets=0, n_bytes=0, idle_age=72, priority=10,in_port=1,dl_src=00:00:00:00:00:01 actions=resubmit(,1)
 cookie=0x0, duration=60.565s, table=0, n_packets=0, n_bytes=0, idle_age=60, priority=10,in_port=2,dl_src=00:00:00:00:00:02 actions=resubmit(,1)
 cookie=0x0, duration=28.127s, table=0, n_packets=0, n_bytes=0, idle_age=28, priority=0 actions=drop
 cookie=0x0, duration=13.887s, table=1, n_packets=0, n_bytes=0, idle_age=13, priority=0,in_port=1 actions=output:2
 cookie=0x0, duration=4.023s, table=1, n_packets=0, n_bytes=0, idle_age=4, priority=0,in_port=2 actions=output:1
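
To make the table and priority rules concrete, here is a minimal Python sketch of the two-table pipeline above. It is only a toy model, not how Open vSwitch is implemented: the Flow class and the process function are names made up for this illustration, and a table miss is simply treated as a drop.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    table: int
    priority: int
    match: dict    # field -> required value; an empty dict matches everything
    action: tuple  # ("resubmit", table), ("output", port) or ("drop",)


# The five flows from the two-table example above.
FLOWS = [
    Flow(0, 10, {"in_port": 1, "dl_src": "00:00:00:00:00:01"}, ("resubmit", 1)),
    Flow(0, 10, {"in_port": 2, "dl_src": "00:00:00:00:00:02"}, ("resubmit", 1)),
    Flow(0, 0, {}, ("drop",)),
    Flow(1, 0, {"in_port": 1}, ("output", 2)),
    Flow(1, 0, {"in_port": 2}, ("output", 1)),
]


def process(packet, flows=FLOWS, table=0):
    """Return the output port for packet, or None if it is dropped."""
    candidates = [
        f for f in flows
        if f.table == table
        and all(packet.get(k) == v for k, v in f.match.items())
    ]
    if not candidates:
        return None  # table miss: this toy model treats it as a drop
    # Within a table, the matching flow with the largest priority wins.
    best = max(candidates, key=lambda f: f.priority)
    if best.action[0] == "resubmit":
        return process(packet, flows, best.action[1])
    if best.action[0] == "output":
        return best.action[1]
    return None  # explicit drop
```

A packet from port 1 with the expected source MAC passes table 0 and is output to port 2 by table 1, while a packet with any other source MAC only matches the priority 0 drop flow in table 0.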

Open vSwitch also provides a mechanism to trace a sample packet through a configured pipeline. Here we will trace a packet from port1 with an expected source MAC address. The output of this trace shows that the packet is resubmitted to table 1 and is then output to port 2.   The output is a bit verbose.  Just look at the “Rule” and “OpenFlow actions” lines to see which flows were executed.

$ ovs-appctl ofproto/trace br-int in_port=1,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02 -generate
Bridge: br-int
Flow: in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,dl_type=0x0000

Rule: table=0 cookie=0 priority=10,in_port=1,dl_src=00:00:00:00:00:01
OpenFlow actions=resubmit(,1)

    Resubmitted flow: in_port=1,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,dl_type=0x0000
    Resubmitted regs: reg0=0x0 reg1=0x0 reg2=0x0 reg3=0x0 reg4=0x0 reg5=0x0 reg6=0x0 reg7=0x0 reg8=0x0 reg9=0x0 reg10=0x0 reg11=0x0 reg12=0x0 reg13=0x0 reg14=0x0 reg15=0x0
    Resubmitted  odp: drop
    Resubmitted megaflow: recirc_id=0,in_port=1,dl_src=00:00:00:00:00:01,dl_type=0x0000
    Rule: table=1 cookie=0 priority=0,in_port=1
    OpenFlow actions=output:2

Final flow: unchanged
Megaflow: recirc_id=0,in_port=1,dl_src=00:00:00:00:00:01,dl_type=0x0000
Datapath actions: 3

OpenFlow can be used to build up much more complex pipelines, as well. See the ovs-ofctl(8) man page for a lot more detail.

OVN Logical Flows

The previous section recapped the basics of OpenFlow. It showed how OpenFlow can be used to build packet processing pipelines for a single switch. Manually programming these pipelines on one host, much less hundreds or thousands of hosts, can be tedious. That’s where an SDN controller that programs flows across many switches to accomplish a task is helpful. That’s the role that OVN plays for the Open vSwitch project. OVN does all of the OpenFlow programming necessary to implement the network topologies and security policies you define using its high level configuration interface.

How does OVN determine the flows required on each host? The central abstraction for solving this problem in OVN is Logical Flows. Logical Flows are conceptually similar to OpenFlow in that they are made up of tables of flows with a priority, match, and actions. The major difference is that logical flows describe the detailed behavior of an entire network that can span any number of hosts. This gives us a separation between defining detailed network behavior and having to worry about the actual physical layout of the environment (how many hosts exist and which hosts the ports reside on).

OVN centrally programs networks in logical flows. These logical flows are distributed throughout the whole environment to ovn-controller running on each host. ovn-controller then knows how to compile logical flows into OpenFlow using the current state of the physical environment (what ports reside locally, and how to reach other hosts).

Let’s create an example OVN configuration similar to the one in the section on OpenFlow basics. We will create a single OVN logical switch with two logical ports.

$ ovn-nbctl ls-add sw0

$ ovn-nbctl lsp-add sw0 sw0-port1
$ ovn-nbctl lsp-set-addresses sw0-port1 00:00:00:00:00:01
$ ovn-nbctl lsp-set-port-security sw0-port1 00:00:00:00:00:01

$ ovn-nbctl lsp-add sw0 sw0-port2
$ ovn-nbctl lsp-set-addresses sw0-port2 00:00:00:00:00:02
$ ovn-nbctl lsp-set-port-security sw0-port2 00:00:00:00:00:02

$ ovn-nbctl show sw0
    switch 48d5f699-7ffe-4627-a369-2fc905e44b32 (sw0)
        port sw0-port1
            addresses: ["00:00:00:00:00:01"]
        port sw0-port2
            addresses: ["00:00:00:00:00:02"]

OVN defines the logical switch, sw0, using two pipelines: an ingress pipeline and an egress pipeline. When a packet enters the network, the ingress pipeline is executed on the host where the packet originated. If the destination is on the same host, the egress pipeline will be executed as well.

sw0-port1 and sw0-port2 on the same host:

    +--------------------------------------------------------------------------+
    |                                                                          |
    |                               Host A                                     |
    |                                                                          |
    |   +---------+                                              +---------+   |
    |   |sw0-port1| --> ingress pipeline --> egress pipeline --> |sw0-port2|   |
    |   +---------+                                              +---------+   |
    |                                                                          |
    +--------------------------------------------------------------------------+

If the destination is remote, the packet will be sent over a tunnel before executing the egress pipeline on the remote host.

sw0-port1 and sw0-port2 on separate hosts:

    +--------------------------------------+
    |                                      |
    |             Host A                   |
    |                                      |
    |   +---------+                        |
    |   |sw0-port1| --> ingress pipeline   |
    |   +---------+           ||           |
    |                         ||           |
    +-------------------------||-----------+
                              ||
                              \/
                         geneve tunnel
                              ||
                              ||
    +-------------------------||-----------+
    |                         ||           |
    |             Host B      ||           |
    |                         ||           |
    |   +---------+           \/           |
    |   |sw0-port2| <-- egress pipeline    |
    |   +---------+                        |
    |                                      |
    +--------------------------------------+
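
The two cases above can be summarized in a small Python sketch. It assumes a hypothetical port_bindings dictionary that maps each logical port to the host where it resides. plan_delivery is an illustrative helper, not part of OVN, and it only models which host runs which pipeline.

```python
def plan_delivery(port_bindings, inport, outport):
    """List the steps needed to deliver a packet between two logical ports.

    port_bindings maps a logical port name to the host where it resides.
    """
    src_host = port_bindings[inport]
    dst_host = port_bindings[outport]
    # The ingress pipeline always runs on the host where the packet originated.
    steps = [(src_host, "ingress pipeline")]
    if src_host != dst_host:
        # Remote destination: cross the tunnel before running egress.
        steps.append((src_host, "geneve tunnel to " + dst_host))
    steps.append((dst_host, "egress pipeline"))
    return steps


same_host = {"sw0-port1": "Host A", "sw0-port2": "Host A"}
two_hosts = {"sw0-port1": "Host A", "sw0-port2": "Host B"}
for bindings in (same_host, two_hosts):
    print(plan_delivery(bindings, "sw0-port1", "sw0-port2"))
```

The logical ports and the logical flows are the same in both cases. Only the port-to-host mapping changes, which is the information ovn-controller uses when it compiles logical flows into OpenFlow.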

You can use the “ovn-sbctl lflow-list” command to view the full set of logical flows. The structure will feel familiar if you know OpenFlow, but there are some key differences:

  1. Ports are logical entities that reside somewhere on a network, not physical ports on a single switch.
  2. Each table in the pipeline is given a name in addition to its number. The name describes the purpose of that stage in the pipeline.
  3. The match syntax is far more flexible. It supports complex boolean expressions and will feel very familiar to programmers.
  4. The actions supported in OVN logical flows extend beyond what you would expect from OpenFlow. We are able to implement higher level features, such as DHCP, in the logical flow syntax. See the documentation for the Logical_Flow table in ovn-sb(5) for details on match and action syntax.

There are several additional stages in the pipeline reserved for features not being used in this example, so the flows in many of the tables are not doing anything interesting.

    $ ovn-sbctl lflow-list
    Datapath: "sw0" (d7bf4a7b-e915-4502-8f9d-5995d33f5d10)  Pipeline: ingress
      table=0 (ls_in_port_sec_l2  ), priority=100  , match=(eth.src[40]), action=(drop;)
      table=0 (ls_in_port_sec_l2  ), priority=100  , match=(vlan.present), action=(drop;)
      table=0 (ls_in_port_sec_l2  ), priority=50   , match=(inport == "sw0-port1" && eth.src == {00:00:00:00:00:01}), action=(next;)
      table=0 (ls_in_port_sec_l2  ), priority=50   , match=(inport == "sw0-port2" && eth.src == {00:00:00:00:00:02}), action=(next;)
      table=1 (ls_in_port_sec_ip  ), priority=0    , match=(1), action=(next;)
      table=2 (ls_in_port_sec_nd  ), priority=90   , match=(inport == "sw0-port1" && eth.src == 00:00:00:00:00:01 && arp.sha == 00:00:00:00:00:01), action=(next;)
      table=2 (ls_in_port_sec_nd  ), priority=90   , match=(inport == "sw0-port1" && eth.src == 00:00:00:00:00:01 && ip6 && nd && ((nd.sll == 00:00:00:00:00:00 || nd.sll == 00:00:00:00:00:01) || ((nd.tll == 00:00:00:00:00:00 || nd.tll == 00:00:00:00:00:01)))), action=(next;)
      table=2 (ls_in_port_sec_nd  ), priority=90   , match=(inport == "sw0-port2" && eth.src == 00:00:00:00:00:02 && arp.sha == 00:00:00:00:00:02), action=(next;)
      table=2 (ls_in_port_sec_nd  ), priority=90   , match=(inport == "sw0-port2" && eth.src == 00:00:00:00:00:02 && ip6 && nd && ((nd.sll == 00:00:00:00:00:00 || nd.sll == 00:00:00:00:00:02) || ((nd.tll == 00:00:00:00:00:00 || nd.tll == 00:00:00:00:00:02)))), action=(next;)
      table=2 (ls_in_port_sec_nd  ), priority=80   , match=(inport == "sw0-port1" && (arp || nd)), action=(drop;)
      table=2 (ls_in_port_sec_nd  ), priority=80   , match=(inport == "sw0-port2" && (arp || nd)), action=(drop;)
      table=2 (ls_in_port_sec_nd  ), priority=0    , match=(1), action=(next;)
      table=3 (ls_in_pre_acl      ), priority=0    , match=(1), action=(next;)
      table=4 (ls_in_pre_lb       ), priority=0    , match=(1), action=(next;)
      table=5 (ls_in_pre_stateful ), priority=100  , match=(reg0[0] == 1), action=(ct_next;)
      table=5 (ls_in_pre_stateful ), priority=0    , match=(1), action=(next;)
      table=6 (ls_in_acl          ), priority=0    , match=(1), action=(next;)
      table=7 (ls_in_qos_mark     ), priority=0    , match=(1), action=(next;)
      table=8 (ls_in_lb           ), priority=0    , match=(1), action=(next;)
      table=9 (ls_in_stateful     ), priority=100  , match=(reg0[1] == 1), action=(ct_commit(ct_label=0/1); next;)
      table=9 (ls_in_stateful     ), priority=100  , match=(reg0[2] == 1), action=(ct_lb;)
      table=9 (ls_in_stateful     ), priority=0    , match=(1), action=(next;)
      table=10(ls_in_arp_rsp      ), priority=0    , match=(1), action=(next;)
      table=11(ls_in_dhcp_options ), priority=0    , match=(1), action=(next;)
      table=12(ls_in_dhcp_response), priority=0    , match=(1), action=(next;)
      table=13(ls_in_l2_lkup      ), priority=100  , match=(eth.mcast), action=(outport = "_MC_flood"; output;)
      table=13(ls_in_l2_lkup      ), priority=50   , match=(eth.dst == 00:00:00:00:00:01), action=(outport = "sw0-port1"; output;)
      table=13(ls_in_l2_lkup      ), priority=50   , match=(eth.dst == 00:00:00:00:00:02), action=(outport = "sw0-port2"; output;)
    Datapath: "sw0" (d7bf4a7b-e915-4502-8f9d-5995d33f5d10)  Pipeline: egress
      table=0 (ls_out_pre_lb      ), priority=0    , match=(1), action=(next;)
      table=1 (ls_out_pre_acl     ), priority=0    , match=(1), action=(next;)
      table=2 (ls_out_pre_stateful), priority=100  , match=(reg0[0] == 1), action=(ct_next;)
      table=2 (ls_out_pre_stateful), priority=0    , match=(1), action=(next;)
      table=3 (ls_out_lb          ), priority=0    , match=(1), action=(next;)
      table=4 (ls_out_acl         ), priority=0    , match=(1), action=(next;)
      table=5 (ls_out_qos_mark    ), priority=0    , match=(1), action=(next;)
      table=6 (ls_out_stateful    ), priority=100  , match=(reg0[1] == 1), action=(ct_commit(ct_label=0/1); next;)
      table=6 (ls_out_stateful    ), priority=100  , match=(reg0[2] == 1), action=(ct_lb;)
      table=6 (ls_out_stateful    ), priority=0    , match=(1), action=(next;)
      table=7 (ls_out_port_sec_ip ), priority=0    , match=(1), action=(next;)
      table=8 (ls_out_port_sec_l2 ), priority=100  , match=(eth.mcast), action=(output;)
      table=8 (ls_out_port_sec_l2 ), priority=50   , match=(outport == "sw0-port1" && eth.dst == {00:00:00:00:00:01}), action=(output;)
      table=8 (ls_out_port_sec_l2 ), priority=50   , match=(outport == "sw0-port2" && eth.dst == {00:00:00:00:00:02}), action=(output;)

The easiest way to understand logical flows is to use the ovn-trace command. ovn-trace allows you to see how OVN would process a sample packet.

ovn-trace has two required arguments:

    $ ovn-trace DATAPATH MICROFLOW

DATAPATH identifies the logical datapath (a logical switch or a logical router) where the sample packet will begin. MICROFLOW describes the sample packet to be simulated. Much more detail can be found in the ovn-trace(8) man page.

Given our sample OVN configuration, let’s see how OVN would process a packet from sw0-port1 that is intended for sw0-port2. ovn-trace has a few different levels of detail to choose from. The first is --minimal, which tells you what happens to a packet, but omits a lot of unnecessary detail. In this case, we see that the final result is that the packet will be delivered to sw0-port2, as expected.

    $ ovn-trace --minimal sw0 'inport == "sw0-port1" && eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
    # reg14=0x1,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,dl_type=0x0000
    output("sw0-port2");
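
If you run many traces, it can be handy to build the MICROFLOW expression programmatically. The following Python sketch only assembles the command line as a string and does not run anything. The helper names microflow and trace_command are made up for this example, and the underscore-to-dot convention for field names is an assumption of the sketch, so it only handles simple equality tests.

```python
import shlex


def microflow(inport, **fields):
    """Build an ovn-trace MICROFLOW expression from keyword arguments.

    Underscores in keyword names stand in for dots, so eth_src becomes eth.src.
    """
    parts = ['inport == "%s"' % inport]
    for name, value in fields.items():
        parts.append("%s == %s" % (name.replace("_", "."), value))
    return " && ".join(parts)


def trace_command(datapath, flow, detail="--minimal"):
    """Return the ovn-trace command line as a shell-quoted string."""
    args = ["ovn-trace", detail, datapath, flow]
    return " ".join(shlex.quote(arg) for arg in args)


flow = microflow("sw0-port1",
                 eth_src="00:00:00:00:00:01",
                 eth_dst="00:00:00:00:00:02")
print(trace_command("sw0", flow))
```

The printed command is the same invocation used in the --minimal example above, so the result can be pasted into a shell on a host where ovn-trace is available.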

The next level of detail is given if you use the --summary option. In this mode, we get more detail about packet processing, including which pipeline is being executed. If we run ovn-trace with the same sample packet, we get a better idea of how the packet is processed. We see that:

  1. The packet enters the network (sw0) from port sw0-port1 and runs the ingress pipeline.
  2. We can see the value “sw0-port2” set to the “outport” variable, indicating that the intended destination for this packet is “sw0-port2”.
  3. The packet is output from the ingress pipeline, which brings it to the egress pipeline for “sw0” with the outport variable set to “sw0-port2”.
  4. The output action is executed in the egress pipeline, which outputs the packet to the current value of the “outport” variable, which is “sw0-port2”.

    $ ovn-trace --summary sw0 'inport == "sw0-port1" && eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
    # reg14=0x1,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,dl_type=0x0000
    ingress(dp="sw0", inport="sw0-port1") {
        outport = "sw0-port2";
        output;
        egress(dp="sw0", inport="sw0-port1", outport="sw0-port2") {
            output;
            /* output to "sw0-port2", type "" */;
        };
    };

While debugging a problem or modifying the code, you may want even more detailed output. ovn-trace has a --detailed option. In this case you get more details about each meaningful logical flow encountered. You see the table number, pipeline stage name, full match, and priority number from the flow. You also get a reference to the location in the OVN source code that is responsible for the creation of that logical flow.

    $ ovn-trace --detailed sw0 'inport == "sw0-port1" && eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:02'
    # reg14=0x1,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:02,dl_type=0x0000

    ingress(dp="sw0", inport="sw0-port1")
    -------------------------------------
     0. ls_in_port_sec_l2 (ovn-northd.c:2827): inport == "sw0-port1" && eth.src == {00:00:00:00:00:01}, priority 50
        next(1);
    13. ls_in_l2_lkup (ovn-northd.c:3095): eth.dst == 00:00:00:00:00:02, priority 50
        outport = "sw0-port2";
        output;

    egress(dp="sw0", inport="sw0-port1", outport="sw0-port2")
    ---------------------------------------------------------
     8. ls_out_port_sec_l2 (ovn-northd.c:3170): outport == "sw0-port2" && eth.dst == {00:00:00:00:00:02}, priority 50
        output;
        /* output to "sw0-port2", type "" */

Another good example of using ovn-trace would be to see why a packet is getting dropped.  We’ve enabled port security, so let’s get a detailed trace of what would happen to a packet sent from sw0-port1 that contained an unexpected source MAC address.  The output will show us that the packet entered sw0 and failed to match any flow in table 0, meaning the packet is dropped.  We also see that table 0 is named “ls_in_port_sec_l2”, short for “Logical Switch ingress L2 port security”.

    $ ovn-trace --detailed sw0 'inport == "sw0-port1" && eth.src == 00:00:00:00:00:ff && eth.dst == 00:00:00:00:00:02'
    # reg14=0x1,vlan_tci=0x0000,dl_src=00:00:00:00:00:ff,dl_dst=00:00:00:00:00:02,dl_type=0x0000

    ingress(dp="sw0", inport="sw0-port1")
    -------------------------------------
     0. ls_in_port_sec_l2: no match (implicit drop)

A similar example would be if a packet contained an unknown destination MAC address.  In this case, we’ll see that the packet successfully passed table 0, but failed to match in table 13, “ls_in_l2_lkup”, short for “Logical Switch ingress L2 lookup”.

    $ ovn-trace --detailed sw0 'inport == "sw0-port1" && eth.src == 00:00:00:00:00:01 && eth.dst == 00:00:00:00:00:ff'
    # reg14=0x1,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:ff,dl_type=0x0000

    ingress(dp="sw0", inport="sw0-port1")
    -------------------------------------
     0. ls_in_port_sec_l2 (ovn-northd.c:2827): inport == "sw0-port1" && eth.src == {00:00:00:00:00:01}, priority 50
        next(1);
    13. ls_in_l2_lkup: no match (implicit drop)
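
The three traces above (delivered, dropped by port security, dropped by L2 lookup) can be reproduced with a toy Python model of the ingress pipeline. This is a sketch under heavy simplification, not OVN’s implementation: it only models tables 0 and 13, it skips the intermediate stages that simply execute “next;” for this configuration, and it uses Python lambdas in place of OVN’s match expressions. The names INGRESS, STAGES, and trace_ingress are made up for this example.

```python
# Each logical flow: (table, stage name, priority, match predicate, action).
INGRESS = [
    (0, "ls_in_port_sec_l2", 50,
     lambda p: p["inport"] == "sw0-port1" and p["eth.src"] == "00:00:00:00:00:01",
     ("next",)),
    (0, "ls_in_port_sec_l2", 50,
     lambda p: p["inport"] == "sw0-port2" and p["eth.src"] == "00:00:00:00:00:02",
     ("next",)),
    (13, "ls_in_l2_lkup", 50,
     lambda p: p["eth.dst"] == "00:00:00:00:00:01", ("output", "sw0-port1")),
    (13, "ls_in_l2_lkup", 50,
     lambda p: p["eth.dst"] == "00:00:00:00:00:02", ("output", "sw0-port2")),
]
STAGES = {0: "ls_in_port_sec_l2", 13: "ls_in_l2_lkup"}


def trace_ingress(packet, flows=INGRESS):
    """Return a one-line description of what the ingress pipeline does."""
    for table in sorted(STAGES):
        hits = [f for f in flows if f[0] == table and f[3](packet)]
        if not hits:
            # No flow matched in this table, so the packet is dropped.
            return "%d. %s: no match (implicit drop)" % (table, STAGES[table])
        action = max(hits, key=lambda f: f[2])[4]
        if action[0] == "output":
            return 'outport = "%s"; output;' % action[1]
        # The action is "next": continue with the following table.
    return "end of pipeline"


good = {"inport": "sw0-port1", "eth.src": "00:00:00:00:00:01",
        "eth.dst": "00:00:00:00:00:02"}
print(trace_ingress(good))
print(trace_ingress(dict(good, **{"eth.src": "00:00:00:00:00:ff"})))
print(trace_ingress(dict(good, **{"eth.dst": "00:00:00:00:00:ff"})))
```

Note how the ports are referenced by logical name rather than by a port number on a particular switch, which is the first of the differences from OpenFlow listed earlier.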

So far, we have only looked at examples of a single L2 logical network.  Let’s create a new environment that shows how ovn-trace works across multiple networks.  We will create 2 networks, each with 2 ports, and connect them with a logical router.

#!/bin/bash

# Create the first logical switch and its two ports.
ovn-nbctl ls-add sw0

ovn-nbctl lsp-add sw0 sw0-port1
ovn-nbctl lsp-set-addresses sw0-port1 "00:00:00:00:00:01 10.0.0.51"
ovn-nbctl lsp-set-port-security sw0-port1 "00:00:00:00:00:01 10.0.0.51"

ovn-nbctl lsp-add sw0 sw0-port2
ovn-nbctl lsp-set-addresses sw0-port2 "00:00:00:00:00:02 10.0.0.52"
ovn-nbctl lsp-set-port-security sw0-port2 "00:00:00:00:00:02 10.0.0.52"

# Create the second logical switch and its two ports.
ovn-nbctl ls-add sw1

ovn-nbctl lsp-add sw1 sw1-port1
ovn-nbctl lsp-set-addresses sw1-port1 "00:00:00:00:00:03 192.168.1.51"
ovn-nbctl lsp-set-port-security sw1-port1 "00:00:00:00:00:03 192.168.1.51"

ovn-nbctl lsp-add sw1 sw1-port2
ovn-nbctl lsp-set-addresses sw1-port2 "00:00:00:00:00:04 192.168.1.52"
ovn-nbctl lsp-set-port-security sw1-port2 "00:00:00:00:00:04 192.168.1.52"

# Create a logical router between sw0 and sw1.
ovn-nbctl create Logical_Router name=lr0

ovn-nbctl lrp-add lr0 lrp0 00:00:00:00:ff:01 10.0.0.1/24
ovn-nbctl lsp-add sw0 sw0-lrp0 \
    -- set Logical_Switch_Port sw0-lrp0 type=router \
    options:router-port=lrp0 addresses='"00:00:00:00:ff:01"'

ovn-nbctl lrp-add lr0 lrp1 00:00:00:00:ff:02 192.168.1.1/24
ovn-nbctl lsp-add sw1 sw1-lrp1 \
    -- set Logical_Switch_Port sw1-lrp1 type=router \
    options:router-port=lrp1 addresses='"00:00:00:00:ff:02"'

We can then use “ovn-nbctl show” to view the resulting logical network configuration.

$ ovn-nbctl show
    switch bf4ba6c6-91c5-4f56-9981-72643816f923 (sw1)
        port sw1-lrp1
            addresses: ["00:00:00:00:ff:02"]
        port sw1-port2
            addresses: ["00:00:00:00:00:04 192.168.1.52"]
        port sw1-port1
            addresses: ["00:00:00:00:00:03 192.168.1.51"]
    switch 13b80127-4b36-46ea-816a-1ba4ffd6ac57 (sw0)
        port sw0-port1
            addresses: ["00:00:00:00:00:01 10.0.0.51"]
        port sw0-lrp0
            addresses: ["00:00:00:00:ff:01"]
        port sw0-port2
            addresses: ["00:00:00:00:00:02 10.0.0.52"]
    router 68935017-967a-4c4a-9dad-5d325a9f203a (lr0)
        port lrp0
            mac: "00:00:00:00:ff:01"
            networks: ["10.0.0.1/24"]
        port lrp1
            mac: "00:00:00:00:ff:02"
            networks: ["192.168.1.1/24"]

We should be able to trace a packet sent from sw0-port1 destined for sw1-port2, which requires going through the router. The minimal output will confirm that the end result is that the packet should be output to sw1-port2. We will also see what modifications the packet received along the way. As the packet traversed the logical router, the TTL was decremented and then the source and destination MAC addresses were updated for the next hop.

$ ovn-trace --minimal sw0 'inport == "sw0-port1" && \
                           eth.src == 00:00:00:00:00:01 && \
                           ip4.src == 10.0.0.51 && \
                           eth.dst == 00:00:00:00:ff:01 && \
                           ip4.dst == 192.168.1.52 && \
                           ip.ttl == 32'
# ip,reg14=0x1,vlan_tci=0x0000,dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:ff:01,nw_src=10.0.0.51,nw_dst=192.168.1.52,nw_proto=0,nw_tos=0,nw_ecn=0,nw_ttl=32
ip.ttl--;
eth.src = 00:00:00:00:ff:02;
eth.dst = 00:00:00:00:00:04;
output("sw1-port2");
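
The packet modifications shown in this trace can be mimicked with a short Python sketch. It is a toy model only: ROUTER_PORTS and NEIGHBORS are hypothetical tables filled in by hand from the example configuration, and route is an illustrative helper rather than anything OVN provides.

```python
import ipaddress

# Router port name -> (MAC address, directly connected network).
ROUTER_PORTS = {
    "lrp0": ("00:00:00:00:ff:01", ipaddress.ip_network("10.0.0.0/24")),
    "lrp1": ("00:00:00:00:ff:02", ipaddress.ip_network("192.168.1.0/24")),
}
# Known IP-to-MAC bindings of the logical switch ports.
NEIGHBORS = {
    "10.0.0.51": "00:00:00:00:00:01",
    "10.0.0.52": "00:00:00:00:00:02",
    "192.168.1.51": "00:00:00:00:00:03",
    "192.168.1.52": "00:00:00:00:00:04",
}


def route(packet):
    """Return a routed copy of packet, or None if it cannot be forwarded."""
    dst = packet["ip4.dst"]
    if packet["ip.ttl"] <= 1 or dst not in NEIGHBORS:
        return None
    for mac, network in ROUTER_PORTS.values():
        if ipaddress.ip_address(dst) in network:
            routed = dict(packet)
            routed["ip.ttl"] -= 1               # ip.ttl--;
            routed["eth.src"] = mac             # MAC of the outgoing router port
            routed["eth.dst"] = NEIGHBORS[dst]  # MAC of the next hop
            return routed
    return None


packet = {"eth.src": "00:00:00:00:00:01", "eth.dst": "00:00:00:00:ff:01",
          "ip4.src": "10.0.0.51", "ip4.dst": "192.168.1.52", "ip.ttl": 32}
print(route(packet))
```

For the sample packet, the sketch ends up with the same source and destination MAC addresses that ovn-trace reported, and a TTL of 31.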

If you’d like to take an even closer look, you can experiment with the previous ovn-trace command by changing the verbosity to --summary or --detailed. You could also start to make changes to the sample packet to see what would happen.

A Powerful Abstraction

I mentioned at the very beginning of this post that I find OVN Logical Flows to be a powerful abstraction that has made adding features to OVN much easier than I anticipated. Now that we’ve gone through logical flows in some detail, I’d like to point to a couple of recent feature developments that help demonstrate how easy logical flows make adding features to OVN.

Source Based Routing

OVN has support for L3 gateways that can be used to provide connectivity between OVN logical networks and physical networks. A typical OVN network might have an L3 gateway that resides on a single host. The downside to using a single L3 gateway is that all traffic destined for that physical network must go through the single host where the L3 gateway resides.

Recently, Gurucharan Shetty added support for multiple L3 gateways on an OVN logical network. The method supported for distributing traffic among the gateways is based on source IP address.

It’s not important to understand all of the details in this change. I mainly want to draw attention to how little of a code change was required. Let’s take a look at the diffstat, organized by the type of change.

Documentation:
 NEWS                          |    1 
 ovn/ovn-nb.xml                |   28 +++++
 ovn/utilities/ovn-nbctl.8.xml |    8 +

Database schema update:
 ovn/ovn-nb.ovsschema          |    8 +

Command line utility support for new db schema additions:
 ovn/utilities/ovn-nbctl.c     |   43 ++++++--

Changes to how OVN builds logical flows to add support for this feature:
 ovn/northd/ovn-northd.c       |   24 +++-

Test code:
 tests/ovn-nbctl.at            |   42 ++++----
 tests/ovn.at                  |  219 ++++++++++++++++++++++++++++++++++++++++++

 8 files changed, 334 insertions(+), 39 deletions(-)

At the very core of this feature are the 24 lines of code changed in ovn-northd.c. This is where the code that generates logical flows was updated for this feature. That is amazing to me. This feature has a significant impact on network behavior, yet was accomplished in very few lines of C code.

DSCP

Another example of adding a feature using logical flows is this patch that adds support for setting the DSCP field of IP packets based on an arbitrary traffic classifier.

The patch added a new QoS table to the OVN northbound database. In this table you define a match (or traffic classifier) and the corresponding DSCP value to set on packets that match this classifier.

The key changes to the code are the 80 lines changed in ovn-northd.c. The patch looks a bit bigger than it really is because it created a new pipeline stage for QoS and had to renumber the stages that followed it.

Implementing this feature using logical flows just requires inserting flows matching the configured match (traffic classifier), and then using the OVN logical flow actions “ip.dscp = VALUE; next;”.
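
As a rough illustration of how little is involved, the following Python sketch renders one QoS rule as a logical flow. It is not the code from the patch, which is written in C in ovn-northd.c. The function name qos_logical_flow and the dictionary layout are assumptions made for this example; only the action string comes from the text above.

```python
def qos_logical_flow(match, dscp, priority):
    """Render one logical flow that marks matching packets with a DSCP value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit field, so it must be in 0..63")
    return {
        "priority": priority,
        "match": match,  # the traffic classifier, copied through unchanged
        "actions": "ip.dscp = %d; next;" % dscp,
    }


flow = qos_logical_flow('inport == "sw0-port1" && ip4', dscp=48, priority=100)
print(flow["match"], "->", flow["actions"])
```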

DHCP

OVN supports DHCPv4 and DHCPv6, but all of the details are controlled through logical flows. In most cases, if behavior needs to be changed, it’s only a change to the code that generates the DHCP related logical flows.

A recent example was this patch which added stateless DHCPv6 support. With stateless DHCPv6, we want to provide some configuration entries (such as a DNS address), but not assign an IPv6 address. Implementing this was just a small tweak to the DHCPv6 logical flows to optionally not include the IPv6 address option in the response generated by OVN.
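
The idea behind that tweak can be sketched in a few lines of Python. This is a conceptual model only and not the patch itself: the function name, the stateless flag, and the option names used as dictionary keys are all illustrative assumptions.

```python
def dhcpv6_reply_options(dns_server, address=None, stateless=False):
    """Return the options a reply would carry in this toy model."""
    options = {}
    if not stateless and address is not None:
        # Stateful mode: the reply assigns an IPv6 address to the client.
        options["ia_addr"] = address
    # Configuration entries such as DNS are provided in both modes.
    options["dns_server"] = dns_server
    return options


print(dhcpv6_reply_options("2001:db8::53", address="2001:db8::10"))
print(dhcpv6_reply_options("2001:db8::53", address="2001:db8::10", stateless=True))
```

The only difference between the two modes is whether the address option is included in the reply, which mirrors the description above.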

Conclusion

I hope you now have a better understanding of OVN Logical Flows, a match-action pipeline for defining the behavior of logical networks that can span many hosts.

Thanks again to Ben Pfaff for more recently writing ovn-trace, which makes it easier to read, understand, and modify OVN logical flows. Ben talked about ovn-trace as a part of our OVN talk at the OpenStack Summit in Barcelona, and a video of that talk is available online.

OVS 2.6 and The First Release of OVN

In January of 2015, the Open vSwitch team announced that they planned to start a new project within OVS called OVN (Open Virtual Network).  The timing could not have been better for me as I was looking around for a new project.  I dove in with a goal of figuring out whether OVN could be a promising next generation of Open vSwitch integration for OpenStack and have been contributing to it ever since.

OVS 2.6.0 has now been released which includes the first non-experimental version of OVN.  As a community we have also built integration with OpenStack, Docker, and Kubernetes.

OVN is a system to support virtual network abstraction. OVN complements the existing capabilities of OVS to add native support for virtual network abstractions, such as virtual L2 and L3 overlays and security groups.

Some high level features of OVN include:

  • Provides virtual networking abstraction for OVS, implemented using L2 and L3 overlays, but can also manage connectivity to physical networks
  • Supports flexible ACLs (security policies) implemented using flows that use OVS connection tracking
  • Native support for distributed L3 routing using OVS flows, with support for both IPv4 and IPv6
  • ARP and IPv6 Neighbor Discovery suppression for known IP-MAC bindings
  • Native support for NAT and load balancing using OVS connection tracking
  • Native fully distributed support for DHCP
  • Works with any OVS datapath (such as the default Linux kernel datapath, DPDK, or Hyper-V) that supports all required features, namely Geneve tunnels and OVS connection tracking (see the datapath feature list in the FAQ for details)
  • Supports L3 gateways from logical to physical networks
  • Supports software-based L2 gateways
  • Supports TOR (Top of Rack) based L2 gateways that implement the hardware_vtep schema
  • Can provide networking for both VMs and containers running inside of those VMs, without a second layer of overlay networking

Support for large scale deployments is a key goal of OVN.  So far, we have seen physical deployments of several hundred nodes.  We’ve also done some larger scale testing by simulating deployments of thousands of nodes using the ovn-scale-test project.

OVN Architecture

Components

[Figure: OVN architecture diagram]

OVN is a distributed system.  There is a local SDN controller that runs on every host, called ovn-controller.  All of the controllers are coordinated through the southbound database.  There is also a centralized component, ovn-northd, that processes high level configuration placed in the northbound database. OVN’s architecture is discussed in detail in the ovn-architecture document.

OVN uses databases for its control plane. One benefit is that scaling databases is a well understood problem.  OVN currently makes use of ovsdb-server as its database.  The use of ovsdb-server is particularly convenient within OVN as it introduces no new dependencies, since ovsdb-server is already in use everywhere OVS is used.  However, the project is also currently considering adding support for, or fully migrating to, etcd v3, since v3 includes all of the features we wanted for our system.

We have also found that this database driven architecture is much more reliable than RPC based approaches taken in other systems we have worked with.  In OVN, each instance of ovn-controller is always working with a consistent snapshot of the database.  It maintains a connection to the database and gets a feed of relevant updates as they occur.  If connectivity is interrupted, ovn-controller will always catch back up to the latest consistent snapshot of the relevant database contents and process them.
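
The catch-up behavior described above can be illustrated with a toy Python model. It is a conceptual sketch, not how ovsdb-server or ovn-controller are implemented: the Database and Controller classes and the version counter are assumptions made for this example.

```python
class Database:
    """A tiny stand-in for the southbound database."""

    def __init__(self):
        self.version = 0
        self.contents = {}

    def write(self, key, value):
        self.contents[key] = value
        self.version += 1

    def snapshot(self):
        # Hand out a copy so that a reader always sees one consistent state.
        return self.version, dict(self.contents)


class Controller:
    """A stand-in for ovn-controller that works from database snapshots."""

    def __init__(self, db):
        self.db = db
        self.connected = True
        self.version, self.view = db.snapshot()

    def sync(self):
        """Jump to the latest snapshot; do nothing while disconnected."""
        if self.connected:
            self.version, self.view = self.db.snapshot()
        return self.version


db = Database()
controller = Controller(db)
controller.connected = False          # connectivity is interrupted
db.write("sw0-port1", "host-a")
db.write("sw0-port1", "host-b")       # two updates are missed
controller.connected = True
controller.sync()
print(controller.view)
```

After reconnecting, the controller does not need to replay each missed message. It only needs the latest consistent contents, which is the property that makes this approach more forgiving than RPC.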

Logical Flows

OVN introduces a new intermediary representation of the system’s configuration called logical flows.  A typical centralized model would take the desired high level configuration, calculate the required physical flows for the environment, and program the switches on each node with those physical flows.  OVN breaks this problem up into a couple of steps.  It first calculates logical flows, which are similar to physical OpenFlow flows in their expressiveness, but operate only on logical entities.  The logical flows for a given network are identical across the whole environment.  These logical flows are then distributed to the local controller on each node, ovn-controller, which converts logical flows to physical flows.  This means that some deployment-wide computation is done once and the node-specific computation is fully distributed and done local to the node it applies to.

Logical flows have also proven to be powerful when it comes to implementing features.  As we’ve built up support for new capabilities in the logical flow syntax, most features are now implemented at the logical flow layer, which is much easier to work with than physical flows.

Data Path

OVN implements features natively in OVS wherever possible.  One such example is the implementation of security policies using OVS+conntrack integration.  I wrote about this in more detail previously.  This approach has led to significant data path performance improvements as compared to previous approaches.  The other area this makes a huge impact is how OVN implements distributed L3 routing.  Instead of combining OVS with several other layers of technology, we provide L3 routing purely with OVS flows.  In addition to the performance benefits, we also find this to be much simpler than the alternative approaches that other projects have taken to build routing on top of OVS.  Another benefit is that all of these features work with OVS+DPDK since we don’t rely on Linux kernel-specific features.

Integrations

OpenStack

Integration with OpenStack was developed in parallel with OVN itself.  The OpenStack networking-ovn project contains an ML2 driver for OpenStack Neutron that provides integration with OVN.  It differs from Neutron’s original OVS integration in some significant ways.  It no longer makes use of the Neutron Python agents as all equivalent functionality has been moved into OVN.  As a result, it no longer uses RabbitMQ.  Neutron’s use of RabbitMQ for RPC has been replaced by OVN’s database driven control plane.  The following diagram gives a visual representation of the architecture of Neutron using OVN.  Even more detail can be found in our documented reference architecture.

neutron-ovn-architecture

There are a few different ways to test out OVN integration with OpenStack.  The most popular development environment for OpenStack is called DevStack.  We provide integration with DevStack, including some instructions on how to do simple testing with DevStack.

If you’re a Vagrant user, networking-ovn includes a vagrant setup for doing multi-node testing of OVN using DevStack.

The OpenStack TripleO deployment project includes support for OVN as of the OpenStack Newton release.

Finally, we also have manual installation instructions to help with integrating OVN into your own OpenStack environment.

Kubernetes

There is active development on a CNI plugin for OVN to be used with Kubernetes.  One of the key goals for OVN was to have containers in mind from the beginning, and not just VMs.  Some important features were added to OVN to help support this integration.  For example, ovn-kubernetes makes use of OVN’s load balancing support, which is built on native load balancing support in OVS.

The README in that repository contains an overview, as well as instructions on how to use it.  There is also support for running an ovn-kubernetes environment using vagrant.

Docker

There is OVN integration with Docker networking, as well.  This currently resides in the main OVS repo, though it could be split out into its own repository in the future, similar to ovn-kubernetes.

Getting Involved

We would love feedback on your experience trying out OVN.  Here are some ways to get involved and provide feedback:

  • OVS and OVN are discussed on the OVS discuss mailing list.
  • OVN development occurs on the OVS development mailing list.
  • OVS and OVN are discussed in #openvswitch on the Freenode IRC network.
  • Development of the OVN Kubernetes integration occurs on Github but can be discussed on either the Open vSwitch IRC channel or discuss mailing list.
  • Integration of OVN with OpenStack is discussed in #openstack-neutron-ovn on Freenode, as well as the OpenStack development mailing list.

OpenStack Security Groups using OVN ACLs

OpenStack Security Groups give you a way to define packet filtering policy that is implemented by the cloud infrastructure.  OVN and its OpenStack Neutron integration now include support for security groups, and this post discusses how it works.

Existing OVS Support in OpenStack

It’s worth looking at how this has been implemented with OVS in the past for OpenStack.  OpenStack’s existing OVS integration (ML2+OVS) makes use of iptables to implement security groups.  Unfortunately, to make that work, we have to connect the VM to a tap device, put that on a linux bridge, and then connect the linux bridge to the OVS bridge using a veth pair so that we have a place to implement the iptables rules.  It’s great that this works, but the extra layers are not ideal.

old-security-group-impl

To get rid of all of the extra layers between the VM and OVS, we need to be able to build stateful firewall services in OVS directly.

Enter OVS with Conntrack Integration

OVS integration with the kernel’s connection tracker has been a hotly anticipated feature for OVS, and for good reason.  At the last OpenStack Summit in May, 2015, in Vancouver, there was a presentation that covered the benefits of this integration and how it will benefit security groups once available.  They were able to demonstrate significant performance benefit over the current approach of implementing security groups using iptables.  You can watch the presentation here:

The talk goes into some good detail about how this works.  However, at that time the conntrack integration was not yet finished and available for use.  Since then there has been fantastic progress!  The upstream kernel changes have been accepted and the userspace changes have all merged into the OVS project.  This will all be available in the next OVS release after 2.4.

The major piece left is completing a backport of the kernel changes.  Even though the openvswitch module is included in the upstream kernel, the OVS project maintains a version of the code that is backported to older kernels.  Backports of the conntrack integration are available as of writing in this branch.

This functionality can now be used to build stateful services in OVS.  Without having to get into what this looks like in terms of detailed flows, here is an idea of what it lets you do in your packet processing pipeline.

  1. In one stage, you can match all IP traffic and send it through the connection tracker.
  2. In the next stage, you now have the connection tracker’s state associated with this packet.
    1. For packets representing a new connection, you can use custom policy to decide if you’d like to accept the connection or not.  If you do accept it, you can tell the connection tracker to remember this connection.
    2. You know when packets are associated with existing connections and can allow them through.  This also applies to associated return traffic.
    3. You know if a packet is invalid because it’s not the right type of packet for a new connection and doesn’t match any existing known connection.
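The stages above can be sketched as a toy model in Python. This is only an illustration of the decision logic, not OVS or kernel conntrack code: the `Packet` and `ConnTracker` types are hypothetical, and a `syn` flag stands in for "this packet is allowed to open a new connection".

```python
from typing import NamedTuple


class Packet(NamedTuple):
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    syn: bool = False  # True if this packet may start a new connection


class ConnTracker:
    """Toy connection tracker keyed on both endpoints of a connection."""

    def __init__(self):
        self._connections = set()

    @staticmethod
    def _key(pkt):
        # Direction-independent key so return traffic hits the same entry.
        return (pkt.proto, frozenset({(pkt.src, pkt.sport), (pkt.dst, pkt.dport)}))

    def state(self, pkt):
        if self._key(pkt) in self._connections:
            return "est"
        return "new" if pkt.syn else "inv"

    def commit(self, pkt):
        self._connections.add(self._key(pkt))


def process(pkt, tracker, policy):
    """Stage 1: look up connection state. Stage 2: act on it."""
    state = tracker.state(pkt)
    if state == "est":
        return "allow"          # existing connection, including return traffic
    if state == "inv":
        return "drop"           # not a valid start and no known connection
    if policy(pkt):             # new connection: consult custom policy
        tracker.commit(pkt)     # remember it for later packets
        return "allow"
    return "drop"


def allow_ssh(pkt):
    return pkt.proto == "tcp" and pkt.dport == 22
```

With this model, an SSH SYN is accepted and committed, the reply from port 22 is then allowed as established traffic, and a stray non-SYN packet from an unknown peer is treated as invalid.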

Now let’s take a closer look at some real usage.

OVN Stateful ACLs

An example use of OVS+conntrack is the implementation of ACLs in OVN.  ACLs provide a way to do distributed packet filtering for OVN networks. OVN ACLs are used to implement security groups for OpenStack Neutron.

I always find ovs-sandbox incredibly useful for exploring OVN features.  In fact, I’ve been writing an OVN tutorial that uses ovs-sandbox. Let’s use ovs-sandbox to look at how OVN uses OVS+conntrack to implement ACLs.

I always run ovs-sandbox straight from the ovs git tree.  If you’re starting from scratch, you’ll first need to clone the ovs git repository. Note that you may also need to install some dependencies, including: autoconf, automake, libtool, gcc, patch, and make.

$ git clone https://github.com/openvswitch/ovs.git
$ cd ovs
$ ./configure
$ make

Now that we have ovs compiled from git, we can run ovs-sandbox with OVN enabled from the git tree.

$ make sandbox SANDBOXFLAGS="--ovn"

Next, we need to create a simple OVN logical topology. We’ll reuse a script from the OVN tutorial that creates a single logical switch with two logical ports. It then binds the two logical ports to the local ovs bridge in our sandbox. This script outputs all of the commands it executes.

$ ovn/env1/setup.sh 
+ ovn-nbctl lswitch-add sw0
+ ovn-nbctl lport-add sw0 sw0-port1
+ ovn-nbctl lport-add sw0 sw0-port2
+ ovn-nbctl lport-set-addresses sw0-port1 00:00:00:00:00:01
+ ovn-nbctl lport-set-addresses sw0-port2 00:00:00:00:00:02
+ ovn-nbctl lport-set-port-security sw0-port1 00:00:00:00:00:01
+ ovn-nbctl lport-set-port-security sw0-port2 00:00:00:00:00:02
+ ovs-vsctl add-port br-int lport1 -- set Interface lport1 external_ids:iface-id=sw0-port1
+ ovs-vsctl add-port br-int lport2 -- set Interface lport2 external_ids:iface-id=sw0-port2

We can view the logical topology using ovn-nbctl.

$ ovn-nbctl show
    lswitch caef7a2c-71fb-4af3-9cbc-589889606a2b (sw0)
        lport sw0-port1
            addresses: 00:00:00:00:00:01
        lport sw0-port2
            addresses: 00:00:00:00:00:02

We can also look at the physical topology to see that the two logical ports are bound to our single local chassis (hypervisor).

$ ovn-sbctl show
Chassis "56b18105-5706-46ef-80c4-ff20979ab068"
    Encap geneve
        ip: "127.0.0.1"
    Port_Binding "sw0-port1"
    Port_Binding "sw0-port2"

Now let’s create some ACLs! A common use case would be creating a policy for a given port that looks something like this:

  • Allow incoming ICMP requests and associated return traffic.
  • Allow incoming SSH connections and associated return traffic.
  • Drop other incoming IP traffic.

Here’s how we’d create that policy for sw0-port1 using ACLs.

$ ovn-nbctl acl-add sw0 to-lport 1002 'outport == "sw0-port1" && ip && icmp' allow-related
$ ovn-nbctl acl-add sw0 to-lport 1002 'outport == "sw0-port1" && ip && tcp && tcp.dst == 22' allow-related
$ ovn-nbctl acl-add sw0 to-lport 1001 'outport == "sw0-port1" && ip' drop

To verify what we’ve done, we can list the ACLs configured on the logical switch sw0.

$ ovn-nbctl acl-list sw0
  to-lport  1002 (outport == "sw0-port1" && ip && icmp) allow-related
  to-lport  1002 (outport == "sw0-port1" && ip && tcp && tcp.dst == 22) allow-related
  to-lport  1001 (outport == "sw0-port1" && ip) drop

Next we can look at how OVN integrates these ACLs into its Logical Flows.

As an aside, the more I work on and with OVN, the more convinced I am that Logical Flows are an incredibly powerful abstraction used in the OVN implementation. OVN first describes the packet processing pipeline in a structure that seems similar to OpenFlow, but only talks about logical network elements. This single logical packet processing pipeline is sent down to all hypervisors. A local controller on each hypervisor converts the logical flows into OpenFlow flows that reflect the local view of the world. The end result of all of this is that we’re able to implement more and more complex features in logical flows without having to worry about the current physical topology.

Now that we have ACLs configured, there are new entries in the logical flow table in the stages switch_in_pre_acl, switch_in_acl, switch_out_pre_acl, and switch_out_acl. The full logical flow table at this point can be seen with ovn-sbctl.

$ ovn-sbctl lflow-list

Let’s take a closer look at the switch_out_pre_acl and switch_out_acl stages of the egress logical flows for sw0.

In switch_out_pre_acl, we match IP traffic and put it through the connection tracker. This populates the connection state fields so that we can apply policy as appropriate.

    table=0(switch_out_pre_acl), priority=  100, match=(ip), action=(ct_next;)
    table=0(switch_out_pre_acl), priority=    0, match=(1), action=(next;)

In switch_out_acl, we allow packets associated with existing connections. We drop packets that are deemed to be invalid (such as a non-SYN TCP packet not associated with an existing connection).

    table=1(switch_out_acl), priority=65535, match=(!ct.est && ct.rel && !ct.new && !ct.inv), action=(next;)
    table=1(switch_out_acl), priority=65535, match=(ct.est && !ct.rel && !ct.new && !ct.inv), action=(next;)
    table=1(switch_out_acl), priority=65535, match=(ct.inv), action=(drop;)

For new connections, we apply our configured ACL policy to decide whether to allow the connection or not. In this case, we’ll allow ICMP or SSH. Otherwise, we’ll drop the packet.

    table=1(switch_out_acl), priority= 2002, match=(ct.new && (outport == "sw0-port1" && ip && icmp)), action=(ct_commit; next;)
    table=1(switch_out_acl), priority= 2002, match=(ct.new && (outport == "sw0-port1" && ip && tcp && tcp.dst == 22)), action=(ct_commit; next;)
    table=1(switch_out_acl), priority= 2001, match=(outport == "sw0-port1" && ip), action=(drop;)

When using ACLs, the default policy is to allow and track IP connections. Based on our above policy, IP traffic directed at sw0-port1 will never hit this flow at priority 1.

    table=1(switch_out_acl), priority=    1, match=(ip), action=(ct_commit; next;)
    table=1(switch_out_acl), priority=    0, match=(1), action=(next;)
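To see why IP traffic for sw0-port1 never reaches the priority 1 flow, here is a small Python sketch of priority-ordered matching over the switch_out_acl flows listed above. It is a simplified model, not OVN code: packets are plain dicts with made-up keys, and the connection state is collapsed into a single string instead of separate ct.* bits.

```python
# Each entry: (priority, match predicate, action). Mirrors the flows above.
FLOWS = [
    (65535, lambda p: p["ct"] in ("est", "rel"), "next"),
    (65535, lambda p: p["ct"] == "inv", "drop"),
    (2002, lambda p: p["ct"] == "new" and p["outport"] == "sw0-port1"
        and p["proto"] == "icmp", "ct_commit; next"),
    (2002, lambda p: p["ct"] == "new" and p["outport"] == "sw0-port1"
        and p["proto"] == "tcp" and p.get("dport") == 22, "ct_commit; next"),
    (2001, lambda p: p["outport"] == "sw0-port1" and p["ip"], "drop"),
    (1, lambda p: p["ip"], "ct_commit; next"),
    (0, lambda p: True, "next"),
]


def evaluate(pkt):
    """Return (priority, action) of the highest-priority matching flow."""
    for priority, match, action in sorted(FLOWS, key=lambda f: f[0], reverse=True):
        if match(pkt):
            return priority, action
    return None


ssh = {"ct": "new", "outport": "sw0-port1", "ip": True, "proto": "tcp", "dport": 22}
http = {"ct": "new", "outport": "sw0-port1", "ip": True, "proto": "tcp", "dport": 80}
other_port = {"ct": "new", "outport": "sw0-port2", "ip": True, "proto": "tcp", "dport": 80}
```

In this model, `evaluate(http)` stops at the priority 2001 drop, while the same packet headed to sw0-port2 falls through to the priority 1 default and is allowed and tracked.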

Currently, ovs-sandbox’s fake datapath doesn’t support conntrack integration so looking at OpenFlow at this point won’t show the flows you’d expect. Let’s jump over to a real OpenStack environment that implements security groups using OVN ACLs to dig deeper.

Security Groups using OVN ACLs

The original OVS support in OpenStack could, and most likely will, be updated to use conntrack integration to implement security groups.  In this example, we’re using Neutron integration with OVN, which just merged support for implementing security groups using OVN ACLs. This example uses a single node devstack environment as described in this document.

Let’s start with a security group that implements a policy similar to the example we started with in ovs-sandbox. OpenStack security groups drop all traffic by default. The default security group shown here has been set up to allow all outbound IP traffic and associated return traffic. It also allows inbound ICMP requests and SSH connections.

$ neutron security-group-list
+--------------------------------------+---------+-----------------------+
| id                                   | name    | security_group_rules  |
+--------------------------------------+---------+-----------------------+
| a5e41dd4-4b15-4e68-a81d-45466bda3949 | default | egress, IPv4          |
|                                      |         | egress, IPv6          |
|                                      |         | ingress, IPv4, 22/tcp |
|                                      |         | ingress, IPv4, icmp   |
+--------------------------------------+---------+-----------------------+

The OVN Neutron driver translates this to the following OVN ACLs:

$ ovn-nbctl acl-list neutron-a920d5ef-eca8-4c4f-9c24-55e29e1c03d6
from-lport  1002 (inport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4) allow-related
from-lport  1002 (inport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip6) allow-related
from-lport  1001 (inport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip) drop
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && icmp4) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && tcp && tcp.dst >= 22 && tcp.dst <= 22) allow-related
  to-lport  1001 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip) drop
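The shape of this translation can be sketched in a few lines of Python. This is not the networking-ovn driver code; the rule dictionaries and the `acls_for_port` helper are hypothetical, and the sketch only covers the rule types that appear in this example (protocol-less egress, ingress ICMP, ingress TCP port ranges).

```python
def acls_for_port(port_id, rules):
    """Build (direction, priority, match, action) tuples for one port."""
    acls = []
    for rule in rules:
        ip = "ip4" if rule["ethertype"] == "IPv4" else "ip6"
        if rule["direction"] == "egress":
            # Traffic leaving the VM enters the logical switch: from-lport.
            direction = "from-lport"
            match = f'inport == "{port_id}" && {ip}'
        else:
            # Traffic headed to the VM leaves the logical switch: to-lport.
            direction = "to-lport"
            match = f'outport == "{port_id}" && {ip}'
            if rule.get("protocol") == "icmp":
                match += " && icmp4" if ip == "ip4" else " && icmp6"
            elif rule.get("protocol") == "tcp":
                low, high = rule["port_range"]
                match += f" && tcp && tcp.dst >= {low} && tcp.dst <= {high}"
        acls.append((direction, 1002, match, "allow-related"))
    # Lower-priority drops implement the default-deny behavior.
    acls.append(("from-lport", 1001, f'inport == "{port_id}" && ip', "drop"))
    acls.append(("to-lport", 1001, f'outport == "{port_id}" && ip', "drop"))
    return acls


DEFAULT_RULES = [
    {"direction": "egress", "ethertype": "IPv4"},
    {"direction": "egress", "ethertype": "IPv6"},
    {"direction": "ingress", "ethertype": "IPv4", "protocol": "tcp", "port_range": (22, 22)},
    {"direction": "ingress", "ethertype": "IPv4", "protocol": "icmp"},
]

PORT = "a4a81c09-4e93-41e2-be83-cfe1f8b39f77"
acls = acls_for_port(PORT, DEFAULT_RULES)
```

Each allow rule lands at priority 1002 with allow-related, and the two drops at priority 1001 catch any IP traffic that no rule matched.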

In the ovs-sandbox example, we looked at the egress logical flows. Let’s do that again to see the ACL stages which correspond to the to-lport direction ACLs.

$ ovn-sbctl lflow-list
...
  table=0(switch_out_pre_acl), priority=  100, match=(ip), action=(ct_next;)
  table=0(switch_out_pre_acl), priority=    0, match=(1), action=(next;)
...

We send all IP traffic through the connection tracker to initialize the ct state fields.

...
  table=1(switch_out_acl), priority=65534, match=(!ct.est && ct.rel && !ct.new && !ct.inv), action=(next;)
  table=1(switch_out_acl), priority=65534, match=(ct.est && !ct.rel && !ct.new && !ct.inv), action=(next;)

Traffic associated with existing connections is let through.

  table=1(switch_out_acl), priority=65534, match=(ct.inv), action=(drop;)

Invalid traffic is dropped.

  table=1(switch_out_acl), priority= 2002, match=(ct.new && (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && icmp4)), action=(ct_commit; next;)
  table=1(switch_out_acl), priority= 2002, match=(ct.new && (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && tcp && tcp.dst >= 22 && tcp.dst <= 22)), action=(ct_commit; next;)

These logical flows correspond to our ACLs. If the packet represents a new connection and that connection is IPv4 ICMP or SSH, we store info about the connection for later and allow it through.

  table=1(switch_out_acl), priority= 2001, match=(outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip), action=(drop;)

This is our flow to drop traffic directed at our VM by default if it didn’t match one of the rules above for ICMP or SSH.

  table=1(switch_out_acl), priority=    1, match=(ip), action=(ct_commit; next;)
  table=1(switch_out_acl), priority=    0, match=(1), action=(next;)
...

Otherwise, OVN defaults to allowing traffic through.

All of that is logical flows. Now let’s look at how this is implemented in OpenFlow. The OpenFlow flows associated with ACLs in the egress logical flows are in OpenFlow tables 48 and 49.

$ sudo ovs-ofctl -O OpenFlow13 dump-flows br-int | cut -f4- -d' '
...
table=48, n_packets=22, n_bytes=2000, priority=100,ip,metadata=0x1 actions=ct(table=49,zone=NXM_NX_REG5[0..15])
table=48, n_packets=0, n_bytes=0, priority=100,ipv6,metadata=0x1 actions=ct(table=49,zone=NXM_NX_REG5[0..15])
table=48, n_packets=31490, n_bytes=3460940, priority=0,metadata=0x1 actions=resubmit(,49)
...
table=49, n_packets=0, n_bytes=0, priority=65534,ct_state=-new-est+rel-inv+trk,metadata=0x1 actions=resubmit(,50)
table=49, n_packets=14, n_bytes=1294, priority=65534,ct_state=-new+est-rel-inv+trk,metadata=0x1 actions=resubmit(,50)
table=49, n_packets=0, n_bytes=0, priority=65534,ct_state=+inv+trk,metadata=0x1 actions=drop
table=49, n_packets=0, n_bytes=0, priority=2002,ct_state=+new+trk,tcp,reg7=0x4,metadata=0x1,tp_dst=22 actions=ct(commit,zone=NXM_NX_REG5[0..15]),resubmit(,50)
table=49, n_packets=1, n_bytes=98, priority=2002,ct_state=+new+trk,icmp,reg7=0x4,metadata=0x1 actions=ct(commit,zone=NXM_NX_REG5[0..15]),resubmit(,50)
table=49, n_packets=0, n_bytes=0, priority=2001,ip,reg7=0x4,metadata=0x1 actions=drop
table=49, n_packets=0, n_bytes=0, priority=2001,ipv6,reg7=0x4,metadata=0x1 actions=drop
table=49, n_packets=7, n_bytes=608, priority=1,ip,metadata=0x1 actions=ct(commit,zone=NXM_NX_REG5[0..15]),resubmit(,50)
table=49, n_packets=0, n_bytes=0, priority=1,ipv6,metadata=0x1 actions=ct(commit,zone=NXM_NX_REG5[0..15]),resubmit(,50)
table=49, n_packets=31490, n_bytes=3460940, priority=0,metadata=0x1 actions=resubmit(,50)
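The ct_state fields in table 49 line up with the ct.new, ct.est, ct.rel, and ct.inv terms in the logical flows: a leading + means the flag must be set and a leading - means it must be clear. As a reading aid, here is a small Python sketch that parses that notation. The helper names are hypothetical and this is not part of any OVS tooling.

```python
import re


def parse_ct_state(spec):
    """Turn '+new-est' style text into {flag_name: required_value}."""
    return {name: sign == "+" for sign, name in re.findall(r"([+-])([a-z]+)", spec)}


def ct_state_matches(spec, packet_flags):
    """True if every flag named in spec has the required value.

    packet_flags is the set of flags that are set on the packet. Flags
    that spec does not mention are not checked.
    """
    return all((name in packet_flags) == wanted
               for name, wanted in parse_ct_state(spec).items())


established = {"est", "trk"}
new_conn = {"new", "trk"}
```

For example, `ct_state_matches("-new+est-rel-inv+trk", established)` is true, which is the OpenFlow form of the logical match on established connections.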

This showed a pretty simple security group. Let’s make the security group a bit more complicated, add a couple more VMs, and then see what the ACLs look like. Imagine we have some sort of web app running on these three VMs. We want to allow TCP ports 80 and 443 from the outside to these VMs. Imagine also that these apps present an internal only API for the VMs to talk to each other on port 8080. So, we want any VM using this security group to be able to access other VMs on this security group on port 8080, but no access from outside. While we’re at it, we want everything to work on both IPv4 and IPv6. Here’s what the resulting security group looks like.

$ neutron security-group-list
+--------------------------------------+---------+--------------------------------------------------------------------------------+
| id                                   | name    | security_group_rules                                                           |
+--------------------------------------+---------+--------------------------------------------------------------------------------+
| a5e41dd4-4b15-4e68-a81d-45466bda3949 | default | egress, IPv4                                                                   |
|                                      |         | egress, IPv6                                                                   |
|                                      |         | ingress, IPv4, 22/tcp                                                          |
|                                      |         | ingress, IPv4, 443/tcp                                                         |
|                                      |         | ingress, IPv4, 80/tcp                                                          |
|                                      |         | ingress, IPv4, 8080/tcp, remote_group_id: a5e41dd4-4b15-4e68-a81d-45466bda3949 |
|                                      |         | ingress, IPv4, icmp                                                            |
|                                      |         | ingress, IPv6, 22/tcp                                                          |
|                                      |         | ingress, IPv6, 443/tcp                                                         |
|                                      |         | ingress, IPv6, 80/tcp                                                          |
|                                      |         | ingress, IPv6, 8080/tcp, remote_group_id: a5e41dd4-4b15-4e68-a81d-45466bda3949 |
|                                      |         | ingress, IPv6, icmp                                                            |
+--------------------------------------+---------+--------------------------------------------------------------------------------+

Now, after booting a couple more VMs (for a total of 3), Neutron’s OVN plugin has created the following ACLs. All of these will get automatically translated into logical flows, and then translated into OpenFlow flows by the local ovn-controller on each hypervisor as appropriate.

$ ovn-nbctl acl-list neutron-a920d5ef-eca8-4c4f-9c24-55e29e1c03d6
from-lport  1002 (inport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip4) allow-related
from-lport  1002 (inport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip6) allow-related
from-lport  1002 (inport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip4) allow-related
from-lport  1002 (inport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip6) allow-related
from-lport  1002 (inport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4) allow-related
from-lport  1002 (inport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip6) allow-related
from-lport  1001 (inport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip) drop
from-lport  1001 (inport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip) drop
from-lport  1001 (inport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip) drop
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip4 && icmp4) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip4 && inport == {"6353ad55-f6e7-4bc5-9e5d-55e975b6736e","a4a81c09-4e93-41e2-be83-cfe1f8b39f77"} && tcp && tcp.dst >= 8080 && tcp.dst <= 8080) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip4 && tcp && tcp.dst >= 22 && tcp.dst <= 22) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip4 && tcp && tcp.dst >= 443 && tcp.dst <= 443) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip4 && tcp && tcp.dst >= 80 && tcp.dst <= 80) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip6 && icmp6) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip6 && inport == {"6353ad55-f6e7-4bc5-9e5d-55e975b6736e","a4a81c09-4e93-41e2-be83-cfe1f8b39f77"} && tcp && tcp.dst >= 8080 && tcp.dst <= 8080) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip6 && tcp && tcp.dst >= 22 && tcp.dst <= 22) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip6 && tcp && tcp.dst >= 443 && tcp.dst <= 443) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip6 && tcp && tcp.dst >= 80 && tcp.dst <= 80) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip4 && icmp4) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip4 && inport == {"62848020-ba3b-445c-a8a9-c13094648b34","a4a81c09-4e93-41e2-be83-cfe1f8b39f77"} && tcp && tcp.dst >= 8080 && tcp.dst <= 8080) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip4 && tcp && tcp.dst >= 22 && tcp.dst <= 22) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip4 && tcp && tcp.dst >= 443 && tcp.dst <= 443) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip4 && tcp && tcp.dst >= 80 && tcp.dst <= 80) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip6 && icmp6) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip6 && inport == {"62848020-ba3b-445c-a8a9-c13094648b34","a4a81c09-4e93-41e2-be83-cfe1f8b39f77"} && tcp && tcp.dst >= 8080 && tcp.dst <= 8080) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip6 && tcp && tcp.dst >= 22 && tcp.dst <= 22) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip6 && tcp && tcp.dst >= 443 && tcp.dst <= 443) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip6 && tcp && tcp.dst >= 80 && tcp.dst <= 80) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && icmp4) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && inport == {"62848020-ba3b-445c-a8a9-c13094648b34","6353ad55-f6e7-4bc5-9e5d-55e975b6736e"} && tcp && tcp.dst >= 8080 && tcp.dst <= 8080) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && tcp && tcp.dst >= 22 && tcp.dst <= 22) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && tcp && tcp.dst >= 443 && tcp.dst <= 443) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && tcp && tcp.dst >= 80 && tcp.dst <= 80) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip6 && icmp6) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip6 && inport == {"62848020-ba3b-445c-a8a9-c13094648b34","6353ad55-f6e7-4bc5-9e5d-55e975b6736e"} && tcp && tcp.dst >= 8080 && tcp.dst <= 8080) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip6 && tcp && tcp.dst >= 22 && tcp.dst <= 22) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip6 && tcp && tcp.dst >= 443 && tcp.dst <= 443) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip6 && tcp && tcp.dst >= 80 && tcp.dst <= 80) allow-related
  to-lport  1001 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip) drop
  to-lport  1001 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip) drop
  to-lport  1001 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip) drop
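The rules with a remote_group_id are the interesting ones here: for each port, the match lists every other port in the group as an allowed inport. A rough Python sketch of that expansion follows. The helper names are hypothetical and this is not the driver's code; it just reproduces the match strings and the ACL count seen in the listing above.

```python
GROUP_PORTS = [
    "62848020-ba3b-445c-a8a9-c13094648b34",
    "6353ad55-f6e7-4bc5-9e5d-55e975b6736e",
    "a4a81c09-4e93-41e2-be83-cfe1f8b39f77",
]


def remote_group_match(port_id, group_ports, ip, tcp_port):
    """Match for 'allow tcp_port from other members of the same group'."""
    peers = sorted(p for p in group_ports if p != port_id)
    peer_set = ",".join(f'"{p}"' for p in peers)
    return (f'outport == "{port_id}" && {ip} && inport == {{{peer_set}}} '
            f'&& tcp && tcp.dst >= {tcp_port} && tcp.dst <= {tcp_port}')


def acl_count(num_ports, egress_rules, ingress_rules):
    """One ACL per rule per port, plus one default drop per direction."""
    return num_ports * (egress_rules + ingress_rules + 2)


example = remote_group_match(GROUP_PORTS[0], GROUP_PORTS, "ip4", 8080)
total = acl_count(num_ports=3, egress_rules=2, ingress_rules=10)
```

With 3 ports, 2 egress rules, and 10 ingress rules, `total` comes to 42, which matches the number of ACLs in the listing. The count grows with every port added to the group, and each remote group match grows with it.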

Possible Future Work

The biggest issue we have with this is just how new it is. It requires compiling and loading a custom version of the openvswitch kernel module from a custom branch of ovs. All of that is handled automatically by our devstack plugin, but it’s not exactly what you’d want for production usage. As the kernel backport is finalized, we expect it to be backported into distro kernels as well, which will make this much more consumable. It will certainly be backported for RHEL 7 and its derivatives.

I’m looking forward to seeing what other features get implemented using OVS+conntrack, both for OVN and beyond!

Bridging Asterisk RTP streams with OVS

I’m at the AstriCon conference this week, which is a conference built around the Asterisk open source project.  I worked on the Asterisk project for about 7 years before joining Red Hat to hack on cloud infrastructure.  I also helped write a book about it.  While I’m not working on Asterisk directly anymore, I still find it a very interesting project.  The community is full of great people.  Another reason I still pay attention is that communications infrastructure in general is an incredibly important use case for cloud infrastructure. The telco world is going through a rapid transformation with SDN and NFV.

I did a keynote at AstriCon last year about open cloud infrastructure and its importance to Asterisk and communications infrastructure more broadly.  This year I did a talk more focused on networking and how some of the SDN trends apply to this project.  One of the things this conference has started doing is holding a session called “dangerous demos”.  The idea is for people to come up on stage and attempt a short (3-5 minute) live demo.  They give awards for various categories, including the most amusing case of a demo crashing and burning, as is often the case with live demos, especially using conference wifi.  Sounds fun, doesn’t it?  I thought so.

Last Friday I set off to see what kind of demo I could whip up in an afternoon.  Here’s what I came up with.

Asterisk Call Bridging

Before getting to the demo, it’s important to have some background on how Asterisk and some related technologies work.  Asterisk supports many different communications technologies.  It supports many different methods of traditional telephone network (PSTN) connectivity.  It also supports several Voice over IP (VoIP) protocols.  Any connection to the system via any of these technologies is represented as an Asterisk channel.

[A Single Call Leg, Represented by a Single Channel]

In some cases, there is only one channel.  This is when Asterisk itself is the endpoint of the call.  Some traditional examples would be something like voicemail or a system that implements an IVR, such as an automated system to make payments on an account.

It’s also common to have two channels bridged together.  Imagine two phones on a call talking to each other.

[Two Call Legs Represented by Two Channels]

Architecturally, there are some layers involved here.  There is channel technology abstraction so that two channels using different technologies can still be bridged together.

[Channel Technology and Abstract Channel Layers]

This is an incredibly powerful part of Asterisk’s architecture.  It lets you bridge new technologies like WebRTC to traditional telephony protocols.  However, bridging media streams through the abstract channel layer is not the most efficient way to do it if the two channels bridged together are actually the same technology.  So, Asterisk also has a concept of “native bridging”.  This lets channel technology implementations implement more efficient ways of bridging.

SIP is the most commonly used VoIP protocol.  SIP is actually just a signaling (control) protocol.  The actual media streams are independent streams using the RTP protocol.  In some cases, the media streams can be sent directly between endpoints, but not always.  Asterisk sometimes has to transcode the media streams between two different codecs.  Another common case is that the streams may be fully compatible, but the system is used to put all streams through a controlled point (or set of points) at the edge of a company’s network. This use case is sometimes referred to as a Session Border Controller (SBC).

An RTP stream is a good example of a painful scenario for packet processing performance.  It’s a stream of small packets.  A typical RTP stream would be 50 UDP packets per second in each direction.  Each packet would hold 20 milliseconds of audio.  These numbers can vary.  You can put more audio in each packet, but it comes at the cost of added latency in the call. 20 ms of audio using G.711 is 160 bytes of audio payload. Other codecs produce larger or smaller payloads. For example, 20 ms using G.729 would be only 20 bytes of audio payload. Every packet also includes ethernet, IP, UDP, and RTP headers.
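The arithmetic behind those numbers is easy to check. The Python sketch below assumes the nominal codec bit rates (64 kbit/s for G.711, 8 kbit/s for G.729) and minimal header sizes: a 14-byte Ethernet header without VLAN tag or frame check sequence, a 20-byte IPv4 header without options, an 8-byte UDP header, and a 12-byte RTP header without CSRC entries. Those header assumptions are mine, not something measured in this setup.

```python
PTIME_MS = 20  # milliseconds of audio per packet

CODEC_BITRATE_BPS = {"G.711": 64_000, "G.729": 8_000}

# Assumed minimal header sizes, in bytes.
HEADER_BYTES = {"ethernet": 14, "ipv4": 20, "udp": 8, "rtp": 12}


def packets_per_second(ptime_ms=PTIME_MS):
    """Packets sent per second in one direction."""
    return 1000 // ptime_ms


def payload_bytes(codec, ptime_ms=PTIME_MS):
    """Audio payload carried by a single packet."""
    return CODEC_BITRATE_BPS[codec] * ptime_ms // 1000 // 8


def frame_bytes(codec, ptime_ms=PTIME_MS):
    """Payload plus the assumed header overhead."""
    return payload_bytes(codec, ptime_ms) + sum(HEADER_BYTES.values())


for codec in CODEC_BITRATE_BPS:
    print(codec, packets_per_second(), payload_bytes(codec), frame_bytes(codec))
```

Under these assumptions the headers add 54 bytes to every packet, so for G.729 the headers are considerably larger than the 20 bytes of audio they carry.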

When two of these RTP streams are bridged in Asterisk, there is a thread handling the call that’s polling on two UDP sockets.  When a packet comes in on one socket, it’s processed if necessary and then written out to the other socket.
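As a rough illustration of that loop, here is a minimal Python sketch of a relay that polls two UDP sockets and copies each packet to the other side. Asterisk itself is written in C and does far more than this; the function names, the loopback addresses, and the single 160-byte test packet are all assumptions made for the example.

```python
import selectors
import socket


def udp_socket():
    """Create a UDP socket bound to an ephemeral loopback port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))
    return sock


def bridge_once(selector, timeout=1.0):
    """Wait for readable sockets and copy each packet to the other leg.

    Returns the number of packets forwarded.
    """
    forwarded = 0
    for key, _events in selector.select(timeout):
        out_sock, peer_addr = key.data
        packet, _sender = key.fileobj.recvfrom(2048)
        # A real bridge would inspect or rewrite the packet here if needed.
        out_sock.sendto(packet, peer_addr)
        forwarded += 1
    return forwarded


# Two endpoints ("phones") and the two sockets the bridge holds, one per leg.
phone_a, phone_b = udp_socket(), udp_socket()
leg_a, leg_b = udp_socket(), udp_socket()

selector = selectors.DefaultSelector()
# Packets arriving on leg_a go out leg_b toward phone_b, and vice versa.
selector.register(leg_a, selectors.EVENT_READ, (leg_b, phone_b.getsockname()))
selector.register(leg_b, selectors.EVENT_READ, (leg_a, phone_a.getsockname()))

# Send one 160-byte payload (20 ms of G.711) from phone_a through the bridge.
phone_a.sendto(b"\x00" * 160, leg_a.getsockname())
forwarded = bridge_once(selector)
phone_b.settimeout(1.0)
received, _ = phone_b.recvfrom(2048)

selector.close()
for sock in (phone_a, phone_b, leg_a, leg_b):
    sock.close()
```

Every packet in this design crosses from the kernel into the process and back again, which is the per-packet cost to keep in mind for streams of many small packets.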

You can find a somewhat dated chapter that I wrote several years ago about Asterisk in the book “Architecture of Open Source Applications”. I re-used some of the diagrams from that chapter for this post.

The Demo

This demo is targeted at the case of Asterisk bridging two RTP streams that are fully compatible (same codec, same payload sizes, among other things).  During my talk about “SDN and Asterisk” yesterday, I covered several things. One of them was how the Linux networking datapath is becoming more programmable, with Open vSwitch (OVS) as a specific example of that.

My demo consists of two VMs on my laptop (asterisk1 and asterisk2).  They both have a single vCPU and 1 GB of RAM.

asterisk1 serves as both endpoints of calls passing through asterisk2, so asterisk2 is doing bridging of compatible RTP streams.  Both ends of the call on asterisk1 are executing the Milliwatt() application, which just generates a tone. Each call looks like this:

call-topology

I also customized the networking configuration on asterisk2. Instead of just having eth0, I have an OVS bridge named breth0 and eth0 is attached to that bridge.

[rbryant@asterisk2 ~]$ sudo ovs-vsctl show
e00ae5a3-5f81-476e-b40c-ff0c03817dea
    Bridge "breth0"
        fail_mode: standalone
        Port "eth0"
            Interface "eth0"
        Port "breth0"
            Interface "breth0"
                type: internal
    ovs_version: "2.4.0"

[rbryant@asterisk2 ~]$ ip addr list breth0
4: breth0@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default 
    link/ether 52:54:00:31:cf:ce brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.149/24 brd 192.168.122.255 scope global dynamic breth0
       valid_lft 2258sec preferred_lft 2258sec
    inet6 fe80::5054:ff:fe31:cfce/64 scope link 
       valid_lft forever preferred_lft forever

With this setup in place, I generated 100 calls, which means both asterisk1 and asterisk2 have 200 active channels.  On asterisk1:

asterisk1*CLI> core show channels
Channel              Location             State   Application(Data)             
SIP/asterisk2-000000 555@public:2         Up      Milliwatt()                   
SIP/asterisk2-000000 555@public:2         Up      Milliwatt()                   
SIP/asterisk2-000000 555@public:2         Up      Milliwatt()                   
...
200 active channels
200 active calls
200 calls processed

and on asterisk2:

asterisk2*CLI> core show channels
Channel              Location             State   Application(Data)             
SIP/asterisk1-000000 (None)               Up      AppDial((Outgoing Line))      
SIP/asterisk1-000000 555@public:1         Up      Dial(SIP/555@asterisk1)       
SIP/asterisk1-000000 (None)               Up      AppDial((Outgoing Line))   
SIP/asterisk1-000000 555@public:1         Up      Dial(SIP/555@asterisk1)          
...
200 active channels
100 active calls
100 calls processed

Why 200 channels? It’s a nice round number. It also generates enough load on asterisk2 for the demo without making my laptop melt.

I mentioned earlier that in this case, Asterisk does the bridging of two RTP streams in a thread that’s polling on two UDP sockets, reading packets from one, doing any necessary processing, and then writing them back out to the other socket. In this scenario, Asterisk is using roughly 25% of the vCPU on asterisk2.

What if in the simple forwarding case we could push this forwarding down into the kernel?

To pull this off, first I needed to know about all of the RTP streams active on asterisk2. I actually need to know about pairs of RTP streams. When a packet arrives on one stream, I need to know what other stream it’s associated with for sending it back out. Asterisk honestly does not make it very easy to get this information. You can get it using the CHANNEL() function. I probably could have written an AMI script to get the info I needed. I’m not sure if I could have done it with ARI.  All of that sounded like too much work for my Friday afternoon hack.  The easiest way for me was to write a custom Asterisk C module that provided a CLI command to dump all of the info I wanted.  Here’s the relevant code minus all of the module and CLI command boilerplate code:

	struct ast_channel *chan;
	struct ast_channel_iterator *chan_iter;
	/* Walk every active channel; the loop drops the reference the iterator took. */
	chan_iter = ast_channel_iterator_all_new();
	for (; (chan = ast_channel_iterator_next(chan_iter)); ast_channel_unref(chan)) {
		char src[1024] = "";
		char dest[1024] = "";
		char src2[1024] = "";
		char dest2[1024] = "";
		struct ast_channel *chan2;
		/* RTP source and destination (address:port) as reported by CHANNEL(). */
		ast_func_read(chan, "CHANNEL(rtpsource)", src, sizeof(src));
		ast_func_read(chan, "CHANNEL(rtpdest)", dest, sizeof(dest));
		/* Same for the bridged peer; a channel that is not bridged has none. */
		chan2 = ast_bridged_channel(chan);
		if (chan2) {
			ast_func_read(chan2, "CHANNEL(rtpsource)", src2, sizeof(src2));
			ast_func_read(chan2, "CHANNEL(rtpdest)", dest2, sizeof(dest2));
		}
		ast_cli(a->fd, "%s %s %s %s\n", src, dest, src2, dest2);
	}
	ast_channel_iterator_destroy(chan_iter);

This code is a terrible hack that you’d never use on anything but this controlled environment, but it got me the info I wanted quickly.  The output looks something like this:

asterisk2*CLI> rtpstreams 
0.0.0.0:12164 192.168.122.130:10322 0.0.0.0:18364 192.168.122.130:19818
0.0.0.0:10364 192.168.122.130:15394 0.0.0.0:10110 192.168.122.130:17640
0.0.0.0:10110 192.168.122.130:17640 0.0.0.0:10364 192.168.122.130:15394
...
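
Each line holds the two halves of one bridged call, and every call shows up twice: once from each channel's point of view, with the halves swapped. Here is a minimal Python sketch of how that output can be reduced to one entry per call. The function name is hypothetical, and treating the first field of each half as the local side and the second as the remote side is my reading of the sample output above.

```python
def parse_rtpstreams(output):
    """Return one ((local_port, remote_port), (local_port, remote_port))
    tuple per bridged call from the text printed by 'rtpstreams'."""
    pairs = []
    for line in output.splitlines():
        parts = line.split()
        # Skip blank lines and anything that is not four address:port fields.
        if len(parts) != 4 or not all(':' in p for p in parts):
            continue
        # Each field is "address:port"; keep only the port.
        ports = [p.rsplit(':', 1)[1] for p in parts]
        pair = ((ports[0], ports[1]), (ports[2], ports[3]))
        # Both channels of a call report the same pair, mirrored.
        if pair not in pairs and (pair[1], pair[0]) not in pairs:
            pairs.append(pair)
    return pairs


sample = """\
0.0.0.0:12164 192.168.122.130:10322 0.0.0.0:18364 192.168.122.130:19818
0.0.0.0:10364 192.168.122.130:15394 0.0.0.0:10110 192.168.122.130:17640
0.0.0.0:10110 192.168.122.130:17640 0.0.0.0:10364 192.168.122.130:15394
"""
for pair in parse_rtpstreams(sample):
    print(pair)
```

The three sample lines collapse to two calls, because the second and third lines describe the same call from opposite ends.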

Now that we have the info we need about RTP stream pairs, we want to program the OVS bridge to do the RTP forwarding for us. We do that using the OpenFlow protocol. In this case, we’ll use the ovs-ofctl command line utility to create and delete flows as needed.

I don’t intend to go into any great detail about OpenFlow or how OVS works, but I think a really high level overview of flows is needed to be able to understand what happens next. OpenFlow lets you define a multi-stage packet processing pipeline. Each stage is a table. Processing starts in table 0. Processing may continue in other tables based on what actions are executed. Each flow in a table has a priority. The flow that gets executed in a table is the one with the highest priority that matches the packet. If multiple flows at the same priority match, which one gets executed is undefined.
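
The lookup rule within a single table can be modeled in a few lines of Python. This is only a toy model of the selection rule described above, not how OVS is implemented. The field names and the catch-all flow are assumptions chosen for the illustration.

```python
def select_flow(table, packet):
    """Return the highest-priority flow in 'table' whose match fields all
    equal the corresponding packet fields, or None if nothing matches."""
    best = None
    for flow in table:
        if all(packet.get(k) == v for k, v in flow["match"].items()):
            # Ties keep whichever flow was seen first; OpenFlow leaves
            # the choice between equal-priority matches undefined.
            if best is None or flow["priority"] > best["priority"]:
                best = flow
    return best


table0 = [
    # Low-priority catch-all: an empty match matches every packet.
    {"priority": 0, "match": {}, "actions": ["NORMAL"]},
    # Higher-priority flow for one hypothetical RTP stream.
    {"priority": 100,
     "match": {"in_port": 1, "ip_proto": "udp", "udp_dst": 12164},
     "actions": ["rewrite headers", "IN_PORT"]},
]

rtp = {"in_port": 1, "ip_proto": "udp", "udp_dst": 12164}
sip = {"in_port": 1, "ip_proto": "udp", "udp_dst": 5060}
print(select_flow(table0, rtp)["actions"])
print(select_flow(table0, sip)["actions"])
```

The RTP packet hits the priority 100 flow, while the SIP packet falls through to the catch-all and is handled as it would be without any special flows.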

What we want are flows that match an incoming RTP stream. In this demo we create flows with the following match conditions: the packet arrived on eth0, it’s a UDP packet, and the UDP destination port number is N. When a packet matches one of our flows, we execute these actions: change the source and destination MAC addresses, change the source and destination IP addresses, change the source and destination UDP port numbers, and send the packet back out where it came from (eth0).

An example command to install a flow like this would be:

sudo ovs-ofctl -O OpenFlow13 add-flow breth0 priority=100,in_port=1,udp,udp_dst=10758,actions=mod_dl_src:52:54:00:31:cf:ce,mod_dl_dst:52:54:00:88:75:61,mod_nw_src:192.168.122.148,mod_nw_dst:192.168.122.130,mod_tp_src:14508,mod_tp_dst:10060,in_port

Of course, typing up 200 of those would be pretty tiring, so I just scripted it. Here is a simple Python script to generate all of the flows we need:

#!/usr/bin/env python

import os
import subprocess

asterisk1_mac = '52:54:00:88:75:61'
asterisk2_mac = '52:54:00:31:cf:ce'
asterisk1_ip = '192.168.122.130'
asterisk2_ip = '192.168.122.148'

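# universal_newlines=True makes check_output return text on Python 2 and 3.
output = subprocess.check_output(['sudo', 'asterisk', '-rx', 'rtpstreams'],
                                 universal_newlines=True)
pairs = []
for l in output.splitlines():
    parts = l.split()
    # Skip blank lines and lines starting with 'Setting'.
    if not parts or parts[0] == 'Setting':
        continue
    try:
        # Keep only the port from each "address:port" field.
        pair = ((parts[0].split(':')[1], parts[1].split(':')[1]),
                (parts[2].split(':')[1], parts[3].split(':')[1]))
    except IndexError:
        print("Failed to parse parts: %s" % parts)
        continue
    # Each call is listed twice, once per channel, with the halves swapped.
    reverse_pair = (pair[1], pair[0])
    if reverse_pair not in pairs:
        pairs.append(pair)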

for p in pairs:
    os.system('sudo ovs-ofctl -O OpenFlow13 add-flow breth0 '
            'priority=100,in_port=1,udp,'
            'udp_dst=%s,actions=mod_dl_src:%s,mod_dl_dst:%s,'
            'mod_nw_src:%s,mod_nw_dst:%s,'
            'mod_tp_src:%s,mod_tp_dst:%s,in_port'
            % (p[0][0],
               asterisk2_mac, asterisk1_mac,
               asterisk2_ip, asterisk1_ip,
               p[1][0], p[1][1]))
    os.system('sudo ovs-ofctl -O OpenFlow13 add-flow breth0 '
            'priority=100,in_port=1,udp,'
            'udp_dst=%s,actions=mod_dl_src:%s,mod_dl_dst:%s,'
            'mod_nw_src:%s,mod_nw_dst:%s,'
            'mod_tp_src:%s,mod_tp_dst:%s,in_port'
            % (p[1][0],
               asterisk2_mac, asterisk1_mac,
               asterisk2_ip, asterisk1_ip,
               p[0][0], p[0][1]))

After running the above script, we can view the flows on breth0 using the following command:

[rbryant@asterisk2 ~]$ sudo ovs-ofctl -O OpenFlow13 dump-flows breth0 | grep table | cut -f4- -d' '
table=0, n_packets=591, n_bytes=126474, priority=100,udp,in_port=1,tp_dst=12164 actions=set_field:52:54:00:31:cf:ce->eth_src,set_field:52:54:00:88:75:61->eth_dst,set_field:192.168.122.148->ip_src,set_field:192.168.122.130->ip_dst,set_field:18364->udp_src,set_field:19818->udp_dst,IN_PORT
table=0, n_packets=588, n_bytes=125832, priority=100,udp,in_port=1,tp_dst=18364 actions=set_field:52:54:00:31:cf:ce->eth_src,set_field:52:54:00:88:75:61->eth_dst,set_field:192.168.122.148->ip_src,set_field:192.168.122.130->ip_dst,set_field:12164->udp_src,set_field:10322->udp_dst,IN_PORT
table=0, n_packets=588, n_bytes=125832, priority=100,udp,in_port=1,tp_dst=10364 actions=set_field:52:54:00:31:cf:ce->eth_src,set_field:52:54:00:88:75:61->eth_dst,set_field:192.168.122.148->ip_src,set_field:192.168.122.130->ip_dst,set_field:10110->udp_src,set_field:17640->udp_dst,IN_PORT
...

We can see in the n_packets field of each flow that packets are matching all of our flows for forwarding RTP streams.
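
The byte counters also make for a quick sanity check. Here is a small Python sketch that pulls the counters out of a dump-flows line and computes the average packet size. The sample line is the first flow above with its actions shortened, and the function name is hypothetical.

```python
import re


def flow_stats(line):
    """Extract (n_packets, n_bytes) from one line of dump-flows output."""
    packets = int(re.search(r"n_packets=(\d+)", line).group(1))
    octets = int(re.search(r"n_bytes=(\d+)", line).group(1))
    return packets, octets


line = ("table=0, n_packets=591, n_bytes=126474, "
        "priority=100,udp,in_port=1,tp_dst=12164 actions=IN_PORT")
packets, octets = flow_stats(line)
print(packets, octets, octets // packets)
```

That works out to 214 bytes per packet, which is what you would expect for 20 ms of G.711: 160 bytes of audio plus 54 bytes of headers, assuming a 14-byte Ethernet header, a 20-byte IPv4 header without options, an 8-byte UDP header, and a 12-byte RTP header.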

Here’s what’s really cool about this. After these flows are configured, Asterisk takes up less than 1% of the vCPU and the vCPU is 96-97% idle.

If we want to clear all of these flows and let RTP go back through Asterisk in userspace, we can run this script:

#!/bin/bash

for n in $(sudo ovs-ofctl -O OpenFlow13 dump-flows breth0 | grep "priority=100" | cut -f7 -d' ') ; do
    sudo ovs-ofctl -O OpenFlow13 del-flows --strict breth0 $n
done

At this point, the CPU usage jumps back up to where it was before.

Future Work

This was just the result of an afternoon hack.  My primary goal was to spur some interest in how the cool things happening in the SDN space could open up new ways of doing things.

If someone wanted to explore doing this in Asterisk more seriously, you could write some code in Asterisk that could speak OpenFlow to the local OVS bridge to create and delete flows as needed.  You could also imagine the possibility of speaking OpenFlow to a top-of-rack switch to push the forwarding out of the host completely, yet still through a controlled point in your network.

Another major caveat in this demo is that OVS and OpenFlow don’t know what RTP is. There’s no way (that I know of) to do any sort of validation on the packets before forwarding them along.  If one end started sending garbage, this setup would happily forward it along.  It’s up to you how much that matters.  RTP devices are supposed to be built for the possibility of media streaming directly between endpoints, and in that case, there’s nothing in the middle doing any checking of things.

If you were at AstriCon, thank you for coming to my talk and/or demo.  To everyone, I hope you found this interesting and that it inspires you to go off and learn more about this cool technology!

An EZ Bake OVN for OpenStack

When Ben Pfaff pushed the last of the changes needed to make OVN functional to the ovn branch, he dubbed it the “EZ Bake milestone”.  The analogy is both humorous and somewhat accurate.  We’ve reached the first functional milestone, which is quite exciting.

In previous posts I have gone through and shown components of the system as it has been built.  Now that it’s functional, I will go through a working demonstration of OpenStack using OVN.

DevStack

For this test environment we’ll stand up two hosts using DevStack.  Both hosts will be VMs running Fedora 21 that have 2 vCPUs and 4 GB of RAM.  We will refer to them as ovn-devstack-1 and ovn-devstack-2.

Each VM needs to have git installed and a user created that has sudo access.  This user will be used to run DevStack.

Setting up ovn-devstack-1

The first DevStack host will look like a typical single node DevStack install that runs all of OpenStack.  It will be using OVN to provide L2 network connectivity instead of the default OVS ML2 driver and the neutron OVS agent.  It will still make use of the L3 and DHCP agents from Neutron as the equivalent functionality has not yet been implemented in OVN.

Start by cloning DevStack and networking-ovn:

(ovn-devstack-1)$ git clone http://git.openstack.org/openstack-dev/devstack.git
(ovn-devstack-1)$ git clone http://git.openstack.org/openstack/networking-ovn.git

networking-ovn comes with some sample configuration files for DevStack.  We can use the main sample for this host without any modifications needed.

(ovn-devstack-1)$ cd devstack
(ovn-devstack-1)$ cp ../networking-ovn/devstack/local.conf.sample local.conf

After the DevStack configuration is in place, run DevStack to set up the environment.

(ovn-devstack-1)$ ./stack.sh

This takes several minutes to complete.  Once it has completed successfully, you should see some output that looks like this:

This is your host ip: 172.16.189.6
Horizon is now available at http://172.16.189.6/
Keystone is serving at http://172.16.189.6:5000/
The default users are: admin and demo
The password: password
2015-05-13 18:59:48.169 | stack.sh completed in 989 seconds.

Setting up ovn-devstack-2

The second DevStack host runs a minimal set of services needed to add an additional compute node (or hypervisor) to the existing DevStack environment.  It needs to run the OpenStack nova-compute service for managing local VMs and ovn-controller to manage the local OVS configuration.

Setting up the second DevStack host is a very similar process.  Start by cloning DevStack and networking-ovn.

(ovn-devstack-2)$ git clone http://git.openstack.org/openstack-dev/devstack.git
(ovn-devstack-2)$ git clone http://git.openstack.org/openstack/networking-ovn.git

networking-ovn provides an additional sample configuration file for DevStack that is intended to be used for adding compute nodes to an existing DevStack environment.  You must set the SERVICE_HOST configuration variable in this file to be the IP address of the main DevStack host.

(ovn-devstack-2)$ cd devstack
(ovn-devstack-2)$ cp ../networking-ovn/devstack/computenode-local.conf.sample local.conf
(ovn-devstack-2)$ vim local.conf
... edit to set SERVICE_HOST=172.16.189.6 in this example ...

Once the DevStack configuration is ready, you can run DevStack to set up the new compute node.  It should take less time to complete than the first DevStack host.

(ovn-devstack-2)$ ./stack.sh

Once it completes, you should see output that looks like this:

This is your host ip: 172.16.189.10
2015-05-13 19:02:30.663 | stack.sh completed in 98 seconds.

The Default Environment

DevStack is now running on two hosts.  Let’s take a look at the default state of this environment before we start creating VMs.  We’ll run various OpenStack command line tools to interact with the OpenStack APIs.  By default, these tools get credentials from environment variables.  DevStack comes with a file called openrc that makes it easy to switch between admin (the cloud administrator) and demo (a regular cloud user) credentials.

We can start by making sure that Nova sees two hypervisors.  This API requires admin credentials.

(ovn-devstack-1)$ cd devstack
(ovn-devstack-1)$ . openrc admin
(ovn-devstack-1)$ nova hypervisor-list
+----+------------------------------------+-------+---------+
| ID | Hypervisor hostname                | State | Status  |
+----+------------------------------------+-------+---------+
| 1  | ovn-devstack-1.os1.phx2.redhat.com | up    | enabled |
| 2  | ovn-devstack-2.os1.phx2.redhat.com | up    | enabled |
+----+------------------------------------+-------+---------+

DevStack also has a default network configuration.  We can use the neutron command line tool to list the default networks.

(ovn-devstack-1)$ . openrc admin
(ovn-devstack-1)$ neutron net-list
+--------------------------------------+---------+----------------------------------------------------------+
| id                                   | name    | subnets                                                  |
+--------------------------------------+---------+----------------------------------------------------------+
| 7e78ba86-2114-47ac-8194-201936e3820a | public  | ebfe46b4-e0ab-4cda-b2ee-5bb1761b5995 172.24.4.0/24       |
|                                      |         | b435f473-bed1-41bf-9110-797424364016 2001:db8::/64       |
| cfc9ff50-4435-4b29-bf2e-c27dd6cf5a5e | private | 74056863-9d45-452a-a431-344a33cf517b fdc1:1919:4bd6::/64 |
|                                      |         | d5ad74d7-7bd9-4646-add2-e816cfee1ec3 10.0.0.0/24         |
+--------------------------------------+---------+----------------------------------------------------------+

The Horizon web interface also provides a visual representation of the network topology:

default-topology

The default environment also creates four Neutron ports.  Three are related to the router and can be seen in the diagram above.  The fourth (not shown) is for the DHCP agent providing DHCP services to the private network.

(ovn-devstack-1)$ neutron port-list
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------------------------------+
| id                                   | name | mac_address       | fixed_ips                                                                                                   |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------------------------------+
| 381a2d96-bc4a-4785-82bc-4f2b48e007e8 |      | fa:16:3e:76:12:96 | {"subnet_id": "d5ad74d7-7bd9-4646-add2-e816cfee1ec3", "ip_address": "10.0.0.2"}                             |
|                                      |      |                   | {"subnet_id": "74056863-9d45-452a-a431-344a33cf517b", "ip_address": "fdc1:1919:4bd6:0:f816:3eff:fe76:1296"} |
| a5b967d4-296e-44dc-98b9-7336d0224e57 |      | fa:16:3e:8c:d0:a8 | {"subnet_id": "ebfe46b4-e0ab-4cda-b2ee-5bb1761b5995", "ip_address": "172.24.4.2"}                           |
|                                      |      |                   | {"subnet_id": "b435f473-bed1-41bf-9110-797424364016", "ip_address": "2001:db8::1"}                          |
| b2e0ae9e-d472-42ed-8776-6b338349d01d |      | fa:16:3e:b7:cd:77 | {"subnet_id": "74056863-9d45-452a-a431-344a33cf517b", "ip_address": "fdc1:1919:4bd6::1"}                    |
| f24756f3-d803-47d3-9fc7-1315f4071ac0 |      | fa:16:3e:b1:34:ed | {"subnet_id": "d5ad74d7-7bd9-4646-add2-e816cfee1ec3", "ip_address": "10.0.0.1"}                             |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------------------------------+
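
One detail worth noticing in this output: the longer IPv6 addresses appear to be built from the subnet prefix and the port's MAC address using the modified EUI-64 scheme. Here is a minimal Python 3 sketch of that derivation, checked against the first port in the table. The function name is hypothetical, and that Neutron derives the address exactly this way is an inference from the output, not something shown here.

```python
import ipaddress


def slaac_address(prefix, mac):
    """Combine a /64 prefix with a modified EUI-64 interface ID built from mac."""
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit of the first octet
    # Insert ff:fe between the two halves of the MAC address.
    eui64 = octets[:3] + [0xFF, 0xFE] + octets[3:]
    interface_id = int.from_bytes(bytes(eui64), "big")
    network = ipaddress.IPv6Network(prefix)
    address = int(network.network_address) | interface_id
    return str(ipaddress.IPv6Address(address))


print(slaac_address("fdc1:1919:4bd6::/64", "fa:16:3e:76:12:96"))
```

The result matches the address listed for the port with MAC address fa:16:3e:76:12:96.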

These default networks and ports can also be seen in OVN. OVN has a northbound database (OVN_Northbound) that serves as the public interface to OVN.  The Neutron driver updates this database to indicate the desired state.  OVN comes with a command line utility, ovn-nbctl, which can be used to view or update the OVN_Northbound database.  The show command gives a summary of the current configuration.

(ovn-devstack-1)$ ovn-nbctl show
    lswitch f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 (neutron-7e78ba86-2114-47ac-8194-201936e3820a)
        lport a5b967d4-296e-44dc-98b9-7336d0224e57
            macs: fa:16:3e:8c:d0:a8
    lswitch bd7dbbf9-1325-491f-b46b-80b4ecfc560b (neutron-cfc9ff50-4435-4b29-bf2e-c27dd6cf5a5e)
        lport f24756f3-d803-47d3-9fc7-1315f4071ac0
            macs: fa:16:3e:b1:34:ed
        lport 381a2d96-bc4a-4785-82bc-4f2b48e007e8
            macs: fa:16:3e:76:12:96
        lport b2e0ae9e-d472-42ed-8776-6b338349d01d
            macs: fa:16:3e:b7:cd:77
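
Comparing this with the Neutron output above, each logical switch is named "neutron-" followed by the Neutron network ID, and each logical port is named after its Neutron port ID. Here is a minimal Python sketch that turns the show output into a dictionary so the two views can be compared. The parser is written against the output format shown here only, and its name is hypothetical.

```python
def parse_nbctl_show(text):
    """Map each logical switch name to a {lport name: [MAC, ...]} dict."""
    switches = {}
    ports = None
    lport = None
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == "lswitch":
            # "lswitch <uuid> (<name>)": keep the name without parentheses.
            ports = switches.setdefault(words[2].strip("()"), {})
            lport = None
        elif words[0] == "lport" and ports is not None:
            lport = words[1]
            ports[lport] = []
        elif words[0] == "macs:" and lport is not None:
            ports[lport] = words[1:]
    return switches


SAMPLE = """\
    lswitch f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 (neutron-7e78ba86-2114-47ac-8194-201936e3820a)
        lport a5b967d4-296e-44dc-98b9-7336d0224e57
            macs: fa:16:3e:8c:d0:a8
    lswitch bd7dbbf9-1325-491f-b46b-80b4ecfc560b (neutron-cfc9ff50-4435-4b29-bf2e-c27dd6cf5a5e)
        lport f24756f3-d803-47d3-9fc7-1315f4071ac0
            macs: fa:16:3e:b1:34:ed
        lport 381a2d96-bc4a-4785-82bc-4f2b48e007e8
            macs: fa:16:3e:76:12:96
        lport b2e0ae9e-d472-42ed-8776-6b338349d01d
            macs: fa:16:3e:b7:cd:77
"""

for name, ports in parse_nbctl_show(SAMPLE).items():
    print(name, sorted(ports))
```

The switch for the public network has one port and the switch for the private network has three, which lines up with the four ports Neutron reported.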

Launching VMs

Now that the environment is ready, we can start launching VMs.  We will launch two VMs so that one will end up on each of our compute nodes.  We’ll verify that the data path is working and then inspect what OVN has done to make it work.

We want our VMs to have a single vNIC attached to the private Neutron network.

(ovn-devstack-1)$ . openrc demo
(ovn-devstack-1)$ neutron net-list
+--------------------------------------+---------+----------------------------------------------------------+
| id                                   | name    | subnets                                                  |
+--------------------------------------+---------+----------------------------------------------------------+
| 7e78ba86-2114-47ac-8194-201936e3820a | public  | ebfe46b4-e0ab-4cda-b2ee-5bb1761b5995                     |
|                                      |         | b435f473-bed1-41bf-9110-797424364016                     |
| cfc9ff50-4435-4b29-bf2e-c27dd6cf5a5e | private | 74056863-9d45-452a-a431-344a33cf517b fdc1:1919:4bd6::/64 |
|                                      |         | d5ad74d7-7bd9-4646-add2-e816cfee1ec3 10.0.0.0/24         |
+--------------------------------------+---------+----------------------------------------------------------+

(ovn-devstack-1)$ PRIVATE_NET_ID=cfc9ff50-4435-4b29-bf2e-c27dd6cf5a5e
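
Copying the ID out of the table by hand is fine for a demo. If you wanted to script it, here is a minimal Python sketch that looks up a network ID by name in this kind of table output. The function names are hypothetical, and the parser assumes the table layout shown above.

```python
def table_rows(text):
    """Yield each data row of a CLI ASCII table as a dict keyed by column header."""
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # border lines start with "+"
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            yield dict(zip(header, cells))


def network_id(text, name):
    """Return the id of the network called 'name'."""
    for row in table_rows(text):
        # Continuation rows for extra subnets have empty id and name cells.
        if row["name"] == name:
            return row["id"]
    raise KeyError(name)


SAMPLE = """\
+--------------------------------------+---------+----------------------------------------------------------+
| id                                   | name    | subnets                                                  |
+--------------------------------------+---------+----------------------------------------------------------+
| 7e78ba86-2114-47ac-8194-201936e3820a | public  | ebfe46b4-e0ab-4cda-b2ee-5bb1761b5995                     |
|                                      |         | b435f473-bed1-41bf-9110-797424364016                     |
| cfc9ff50-4435-4b29-bf2e-c27dd6cf5a5e | private | 74056863-9d45-452a-a431-344a33cf517b fdc1:1919:4bd6::/64 |
|                                      |         | d5ad74d7-7bd9-4646-add2-e816cfee1ec3 10.0.0.0/24         |
+--------------------------------------+---------+----------------------------------------------------------+
"""

print(network_id(SAMPLE, "private"))
```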

DevStack automatically imports a very small test image, CirrOS, which suits our needs.

(ovn-devstack-1)$ glance image-list
+--------------------------------------+---------------------------------+-------------+------------------+----------+--------+
| ID                                   | Name                            | Disk Format | Container Format | Size     | Status |
+--------------------------------------+---------------------------------+-------------+------------------+----------+--------+
| 4d9443ee-1497-4bb3-b917-2e35b0e59eab | cirros-0.3.4-x86_64-uec         | ami         | ami              | 25165824 | active |
| ab38e2d2-8397-4ece-8aa3-9a3058e63029 | cirros-0.3.4-x86_64-uec-kernel  | aki         | aki              | 4979632  | active |
| 38721471-6a19-45f7-8c8d-fd64b7737fd7 | cirros-0.3.4-x86_64-uec-ramdisk | ari         | ari              | 3740163  | active |
+--------------------------------------+---------------------------------+-------------+------------------+----------+--------+

(ovn-devstack-1)$ IMAGE_ID=4d9443ee-1497-4bb3-b917-2e35b0e59eab

We’ll use the m1.nano flavor, as minimal resources are sufficient for our testing with these VMs.

(ovn-devstack-1)$ nova flavor-list
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+
| ID | Name      | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+
| 1  | m1.tiny   | 512       | 1    | 0         |      | 1     | 1.0         | True      |
| 2  | m1.small  | 2048      | 20   | 0         |      | 1     | 1.0         | True      |
| 3  | m1.medium | 4096      | 40   | 0         |      | 2     | 1.0         | True      |
| 4  | m1.large  | 8192      | 80   | 0         |      | 4     | 1.0         | True      |
| 42 | m1.nano   | 64        | 0    | 0         |      | 1     | 1.0         | True      |
| 5  | m1.xlarge | 16384     | 160  | 0         |      | 8     | 1.0         | True      |
| 84 | m1.micro  | 128       | 0    | 0         |      | 1     | 1.0         | True      |
+----+-----------+-----------+------+-----------+------+-------+-------------+-----------+

(ovn-devstack-1)$ FLAVOR_ID=42

We also need to create an SSH keypair for logging in to the VMs we create.

(ovn-devstack-1)$ nova keypair-add demo > id_rsa_demo
(ovn-devstack-1)$ chmod 600 id_rsa_demo

We now have everything needed to boot some VMs. We’ll create two of them, named test1 and test2.

(ovn-devstack-1)$ nova boot --nic net-id=$PRIVATE_NET_ID --image $IMAGE_ID --flavor $FLAVOR_ID --key-name demo test1
+--------------------------------------+----------------------------------------------------------------+
| Property                             | Value                                                          |
+--------------------------------------+----------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                         |
| OS-EXT-AZ:availability_zone          | nova                                                           |
| OS-EXT-STS:power_state               | 0                                                              |
| OS-EXT-STS:task_state                | scheduling                                                     |
| OS-EXT-STS:vm_state                  | building                                                       |
| OS-SRV-USG:launched_at               | -                                                              |
| OS-SRV-USG:terminated_at             | -                                                              |
| accessIPv4                           |                                                                |
| accessIPv6                           |                                                                |
| adminPass                            | 9NMJrLeCDPJv                                                   |
| config_drive                         |                                                                |
| created                              | 2015-05-14T13:33:55Z                                           |
| flavor                               | m1.nano (42)                                                   |
| hostId                               |                                                                |
| id                                   | d91cf422-fe2e-4131-bc49-2f310daa5cf0                           |
| image                                | cirros-0.3.4-x86_64-uec (4d9443ee-1497-4bb3-b917-2e35b0e59eab) |
| key_name                             | demo                                                           |
| metadata                             | {}                                                             |
| name                                 | test1                                                          |
| os-extended-volumes:volumes_attached | []                                                             |
| progress                             | 0                                                              |
| security_groups                      | default                                                        |
| status                               | BUILD                                                          |
| tenant_id                            | 92fbf8554b2246c5bb9b0db0be55529c                               |
| updated                              | 2015-05-14T13:33:56Z                                           |
| user_id                              | 207b4e55a2684f20a2a21e14c28dffed                               |
+--------------------------------------+----------------------------------------------------------------+

(ovn-devstack-1)$ nova boot --nic net-id=$PRIVATE_NET_ID --image $IMAGE_ID --flavor $FLAVOR_ID --key-name demo test2
+--------------------------------------+----------------------------------------------------------------+
| Property                             | Value                                                          |
+--------------------------------------+----------------------------------------------------------------+
| OS-DCF:diskConfig                    | MANUAL                                                         |
| OS-EXT-AZ:availability_zone          | nova                                                           |
| OS-EXT-STS:power_state               | 0                                                              |
| OS-EXT-STS:task_state                | scheduling                                                     |
| OS-EXT-STS:vm_state                  | building                                                       |
| OS-SRV-USG:launched_at               | -                                                              |
| OS-SRV-USG:terminated_at             | -                                                              |
| accessIPv4                           |                                                                |
| accessIPv6                           |                                                                |
| adminPass                            | BgL89P2oKotD                                                   |
| config_drive                         |                                                                |
| created                              | 2015-05-14T13:34:47Z                                           |
| flavor                               | m1.nano (42)                                                   |
| hostId                               |                                                                |
| id                                   | 4da9dd8e-4583-4955-94b1-0f9eaf77663c                           |
| image                                | cirros-0.3.4-x86_64-uec (4d9443ee-1497-4bb3-b917-2e35b0e59eab) |
| key_name                             | demo                                                           |
| metadata                             | {}                                                             |
| name                                 | test2                                                          |
| os-extended-volumes:volumes_attached | []                                                             |
| progress                             | 0                                                              |
| security_groups                      | default                                                        |
| status                               | BUILD                                                          |
| tenant_id                            | 92fbf8554b2246c5bb9b0db0be55529c                               |
| updated                              | 2015-05-14T13:34:48Z                                           |
| user_id                              | 207b4e55a2684f20a2a21e14c28dffed                               |
+--------------------------------------+----------------------------------------------------------------+

We can use admin credentials to see which hypervisor each VM ended up on. This is just to show that we now have an environment with two VMs on the private Neutron virtual network that spans two hypervisors.

(ovn-devstack-1)$ . openrc admin
(ovn-devstack-1)$ nova show test1 | grep hypervisor_hostname
| OS-EXT-SRV-ATTR:hypervisor_hostname  | ovn-devstack-1.os1.phx2.redhat.com

(ovn-devstack-1)$ nova show test2 | grep hypervisor_hostname
| OS-EXT-SRV-ATTR:hypervisor_hostname  | ovn-devstack-2.os1.phx2.redhat.com

When we first issued the boot requests, the status of each VM was BUILD. Once a VM is running on its hypervisor, it switches to the ACTIVE status.

(ovn-devstack-1)$ . openrc demo
(ovn-devstack-1)$ nova list --fields name,status,networks
+--------------------------------------+-------+--------+--------------------------------------------------------+
| ID                                   | Name  | Status | Networks                                               |
+--------------------------------------+-------+--------+--------------------------------------------------------+
| d91cf422-fe2e-4131-bc49-2f310daa5cf0 | test1 | ACTIVE | private=fdc1:1919:4bd6:0:f816:3eff:fe24:463a, 10.0.0.3 |
| 4da9dd8e-4583-4955-94b1-0f9eaf77663c | test2 | ACTIVE | private=fdc1:1919:4bd6:0:f816:3eff:fe50:191, 10.0.0.4  |
+--------------------------------------+-------+--------+--------------------------------------------------------+

Testing and Inspecting the Network

Our two new VMs have resulted in two more Neutron ports being created.  This is shown in Horizon’s visual representation of the network topology:

topology-2-vms

We can also get all of the details from the Neutron API:

(ovn-devstack-1)$ . openrc admin
(ovn-devstack-1)$ neutron port-list
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------------------------------+
| id                                   | name | mac_address       | fixed_ips                                                                                                   |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------------------------------+
| 10964198-b218-417e-a59e-6a6d7096c936 |      | fa:16:3e:50:01:91 | {"subnet_id": "d5ad74d7-7bd9-4646-add2-e816cfee1ec3", "ip_address": "10.0.0.4"}                             |
|                                      |      |                   | {"subnet_id": "74056863-9d45-452a-a431-344a33cf517b", "ip_address": "fdc1:1919:4bd6:0:f816:3eff:fe50:191"}  |
| 381a2d96-bc4a-4785-82bc-4f2b48e007e8 |      | fa:16:3e:76:12:96 | {"subnet_id": "d5ad74d7-7bd9-4646-add2-e816cfee1ec3", "ip_address": "10.0.0.2"}                             |
|                                      |      |                   | {"subnet_id": "74056863-9d45-452a-a431-344a33cf517b", "ip_address": "fdc1:1919:4bd6:0:f816:3eff:fe76:1296"} |
| a5b967d4-296e-44dc-98b9-7336d0224e57 |      | fa:16:3e:8c:d0:a8 | {"subnet_id": "ebfe46b4-e0ab-4cda-b2ee-5bb1761b5995", "ip_address": "172.24.4.2"}                           |
|                                      |      |                   | {"subnet_id": "b435f473-bed1-41bf-9110-797424364016", "ip_address": "2001:db8::1"}                          |
| a7a8ee94-996e-4623-941c-1ef7b7862f6e |      | fa:16:3e:24:46:3a | {"subnet_id": "d5ad74d7-7bd9-4646-add2-e816cfee1ec3", "ip_address": "10.0.0.3"}                             |
|                                      |      |                   | {"subnet_id": "74056863-9d45-452a-a431-344a33cf517b", "ip_address": "fdc1:1919:4bd6:0:f816:3eff:fe24:463a"} |
| b2e0ae9e-d472-42ed-8776-6b338349d01d |      | fa:16:3e:b7:cd:77 | {"subnet_id": "74056863-9d45-452a-a431-344a33cf517b", "ip_address": "fdc1:1919:4bd6::1"}                    |
| f24756f3-d803-47d3-9fc7-1315f4071ac0 |      | fa:16:3e:b1:34:ed | {"subnet_id": "d5ad74d7-7bd9-4646-add2-e816cfee1ec3", "ip_address": "10.0.0.1"}                             |
+--------------------------------------+------+-------------------+-------------------------------------------------------------------------------------------------------------+

The Ping Test

Now let’s verify that the network works as we expect.  In this environment we can connect to the private network from ovn-devstack-1. We can start with a quick check that we can ping both VMs and also that we can ping from one VM to the other.

(ovn-devstack-1)$ ping -c 1 10.0.0.3
PING 10.0.0.3 (10.0.0.3) 56(84) bytes of data.
64 bytes from 10.0.0.3: icmp_seq=1 ttl=63 time=2.90 ms

(ovn-devstack-1)$ ping -c 1 10.0.0.4
PING 10.0.0.4 (10.0.0.4) 56(84) bytes of data.
64 bytes from 10.0.0.4: icmp_seq=1 ttl=63 time=3.87 ms

(ovn-devstack-1)$ ssh -i id_rsa_demo cirros@10.0.0.3
(test1)$ ping -c 1 10.0.0.4
PING 10.0.0.4 (10.0.0.4): 56 data bytes
64 bytes from 10.0.0.4: seq=0 ttl=64 time=2.945 ms

It works!

OVN Northbound Database

Now let’s take a closer look at what OVN has done to make this work. We looked at the OVN_Northbound database earlier. It now includes the two additional ports for the VMs in its configuration for the private virtual network.

(ovn-devstack-1)$ ovn-nbctl show
    lswitch f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 (neutron-7e78ba86-2114-47ac-8194-201936e3820a)
        lport a5b967d4-296e-44dc-98b9-7336d0224e57
            macs: fa:16:3e:8c:d0:a8
    lswitch bd7dbbf9-1325-491f-b46b-80b4ecfc560b (neutron-cfc9ff50-4435-4b29-bf2e-c27dd6cf5a5e)
        lport f24756f3-d803-47d3-9fc7-1315f4071ac0
            macs: fa:16:3e:b1:34:ed
        lport 10964198-b218-417e-a59e-6a6d7096c936
            macs: fa:16:3e:50:01:91
        lport 381a2d96-bc4a-4785-82bc-4f2b48e007e8
            macs: fa:16:3e:76:12:96
        lport b2e0ae9e-d472-42ed-8776-6b338349d01d
            macs: fa:16:3e:b7:cd:77
        lport a7a8ee94-996e-4623-941c-1ef7b7862f6e
            macs: fa:16:3e:24:46:3a

When we requested a new VM from Nova, Nova asked Neutron to create a new port on the network we specified. As the port was created, the Neutron OVN driver added this entry to the OVN_Northbound database. The northbound database is the desired state of the system. As it gets changed, the rest of OVN gets to work to implement the change.
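
To make the shape of that desired state a bit more concrete, here is a minimal Python sketch that turns the text printed by ovn-nbctl show into a nested dictionary of logical switches, logical ports, and MAC addresses. The function name parse_nbctl_show and the dictionary layout are my own assumptions for illustration, and the parser only understands the lswitch / lport / macs lines shown above.

```python
# Output copied from the `ovn-nbctl show` listing above.
SAMPLE = """\
    lswitch f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 (neutron-7e78ba86-2114-47ac-8194-201936e3820a)
        lport a5b967d4-296e-44dc-98b9-7336d0224e57
            macs: fa:16:3e:8c:d0:a8
    lswitch bd7dbbf9-1325-491f-b46b-80b4ecfc560b (neutron-cfc9ff50-4435-4b29-bf2e-c27dd6cf5a5e)
        lport f24756f3-d803-47d3-9fc7-1315f4071ac0
            macs: fa:16:3e:b1:34:ed
        lport 10964198-b218-417e-a59e-6a6d7096c936
            macs: fa:16:3e:50:01:91
        lport 381a2d96-bc4a-4785-82bc-4f2b48e007e8
            macs: fa:16:3e:76:12:96
        lport b2e0ae9e-d472-42ed-8776-6b338349d01d
            macs: fa:16:3e:b7:cd:77
        lport a7a8ee94-996e-4623-941c-1ef7b7862f6e
            macs: fa:16:3e:24:46:3a
"""


def parse_nbctl_show(text):
    """Return {switch_name: {lport_name: [mac, ...]}} (hypothetical helper)."""
    switches = {}
    current_switch = None
    current_port = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("lswitch "):
            # "lswitch <uuid> (<name>)": key the result by the Neutron-derived name.
            _, _uuid, name = line.split(None, 2)
            current_switch = switches.setdefault(name.strip("()"), {})
            current_port = None
        elif line.startswith("lport ") and current_switch is not None:
            current_port = line.split(None, 1)[1]
            current_switch[current_port] = []
        elif line.startswith("macs:") and current_port is not None:
            current_switch[current_port].extend(line.split()[1:])
    return switches


if __name__ == "__main__":
    for name, ports in parse_nbctl_show(SAMPLE).items():
        print(name, len(ports), "logical ports")
```

Run against the sample, this reports one logical port on the first switch and five on the second, matching the listing above.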

OVN Chassis

OVN has a second database, OVN_Southbound, that is used internally to track the current state of the system. The Chassis table of OVN_Southbound is used to keep track of the different hypervisors running ovn-controller and how to connect to them. When ovn-controller starts, it registers itself in this table.

(ovn-devstack-1)$ ovsdb-client dump OVN_Southbound
...
Chassis table
_uuid                                encaps                                 gateway_ports name                                  
------------------------------------ -------------------------------------- ------------- --------------------------------------
4979df20-56a8-4c74-a499-d2409acb05cc [a59cbc44-b998-4a13-98b4-bc02c79d4d1e] {}            "2a33c976-54ec-4f62-878e-863eea3edcf5"
f58f3955-3dc4-4f79-8d4d-e0250a01a850 [d1e19aec-0e8a-4338-b1d7-eb83dfe197e8] {}            "b29ae352-588f-45bc-aefe-ba15bf2f889b"

Encap table
_uuid                                ip              options type  
------------------------------------ --------------- ------- ------
a59cbc44-b998-4a13-98b4-bc02c79d4d1e "172.16.189.10" {}      geneve
d1e19aec-0e8a-4338-b1d7-eb83dfe197e8 "172.16.189.6"  {}      geneve
...
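
The Chassis rows refer to Encap rows by UUID, so finding out how to reach a chassis means following that reference. The following Python sketch does that join on rows hand-copied from the dump above. The list-of-dicts layout and the tunnel_endpoints helper are assumptions made for illustration; this is not an OVN or OVSDB client API.

```python
# Rows hand-copied from the Chassis and Encap tables above.
CHASSIS = [
    {"name": "2a33c976-54ec-4f62-878e-863eea3edcf5",
     "encaps": ["a59cbc44-b998-4a13-98b4-bc02c79d4d1e"]},
    {"name": "b29ae352-588f-45bc-aefe-ba15bf2f889b",
     "encaps": ["d1e19aec-0e8a-4338-b1d7-eb83dfe197e8"]},
]
ENCAPS = [
    {"_uuid": "a59cbc44-b998-4a13-98b4-bc02c79d4d1e",
     "ip": "172.16.189.10", "type": "geneve"},
    {"_uuid": "d1e19aec-0e8a-4338-b1d7-eb83dfe197e8",
     "ip": "172.16.189.6", "type": "geneve"},
]


def tunnel_endpoints(chassis_rows, encap_rows):
    """Map each chassis name to its list of (encap type, IP) pairs."""
    encap_by_uuid = {row["_uuid"]: row for row in encap_rows}
    return {
        chassis["name"]: [
            (encap_by_uuid[uuid]["type"], encap_by_uuid[uuid]["ip"])
            for uuid in chassis["encaps"]
        ]
        for chassis in chassis_rows
    }


if __name__ == "__main__":
    for name, endpoints in tunnel_endpoints(CHASSIS, ENCAPS).items():
        print(name, endpoints)
```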

OVN Bindings

As logical ports get added to OVN_Northbound, the ovn-northd service creates entries in the Binding table of OVN_Southbound. This table is used to keep track of which physical chassis a logical port resides on. At first, the chassis column is empty. Once ovn-controller sees a port plugged into the local br-int with an iface-id that matches a logical port, ovn-controller will update the chassis column of that logical port’s Binding row to reflect that the port resides on that chassis.

(ovn-devstack-1)$ ovsdb-client dump OVN_Southbound
...
Binding table
_uuid                                chassis                                logical_datapath                     logical_port                           mac                   parent_port tag tunnel_key
------------------------------------ -------------------------------------- ------------------------------------ -------------------------------------- --------------------- ----------- --- ----------
977a249e-3ec4-4d7c-a7bb-e751415ee4b1 ""                                     f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "a5b967d4-296e-44dc-98b9-7336d0224e57" ["fa:16:3e:8c:d0:a8"] []          []  3         
2e213a46-e52c-4e46-ac48-2de9bc5c56a4 "2a33c976-54ec-4f62-878e-863eea3edcf5" bd7dbbf9-1325-491f-b46b-80b4ecfc560b "10964198-b218-417e-a59e-6a6d7096c936" ["fa:16:3e:50:01:91"] []          []  6         
4cd9430e-0735-4e63-b5b2-25a666649f4e "b29ae352-588f-45bc-aefe-ba15bf2f889b" bd7dbbf9-1325-491f-b46b-80b4ecfc560b "381a2d96-bc4a-4785-82bc-4f2b48e007e8" ["fa:16:3e:76:12:96"] []          []  1         
23fc5778-5a00-4e3f-b1e7-4c37c999a378 "b29ae352-588f-45bc-aefe-ba15bf2f889b" bd7dbbf9-1325-491f-b46b-80b4ecfc560b "a7a8ee94-996e-4623-941c-1ef7b7862f6e" ["fa:16:3e:24:46:3a"] []          []  5         
744a141e-8f02-4783-8cbe-af13992f74f7 "b29ae352-588f-45bc-aefe-ba15bf2f889b" bd7dbbf9-1325-491f-b46b-80b4ecfc560b "b2e0ae9e-d472-42ed-8776-6b338349d01d" ["fa:16:3e:b7:cd:77"] []          []  4         
225cc721-7c59-4466-872b-02f5e61efe56 "b29ae352-588f-45bc-aefe-ba15bf2f889b" bd7dbbf9-1325-491f-b46b-80b4ecfc560b "f24756f3-d803-47d3-9fc7-1315f4071ac0" ["fa:16:3e:b1:34:ed"] []          []  2         
...
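
To see at a glance where each logical port ended up, the Binding rows can be grouped by their chassis column. This is a small Python sketch over the (logical_port, chassis) pairs copied from the dump above. The ports_by_chassis helper is hypothetical, and an empty chassis column is represented as None to mean "not bound yet".

```python
from collections import defaultdict

# (logical_port, chassis) pairs copied from the Binding table above.
BINDINGS = [
    ("a5b967d4-296e-44dc-98b9-7336d0224e57", ""),
    ("10964198-b218-417e-a59e-6a6d7096c936", "2a33c976-54ec-4f62-878e-863eea3edcf5"),
    ("381a2d96-bc4a-4785-82bc-4f2b48e007e8", "b29ae352-588f-45bc-aefe-ba15bf2f889b"),
    ("a7a8ee94-996e-4623-941c-1ef7b7862f6e", "b29ae352-588f-45bc-aefe-ba15bf2f889b"),
    ("b2e0ae9e-d472-42ed-8776-6b338349d01d", "b29ae352-588f-45bc-aefe-ba15bf2f889b"),
    ("f24756f3-d803-47d3-9fc7-1315f4071ac0", "b29ae352-588f-45bc-aefe-ba15bf2f889b"),
]


def ports_by_chassis(bindings):
    """Group logical ports by chassis; unbound ports are filed under None."""
    grouped = defaultdict(list)
    for logical_port, chassis in bindings:
        grouped[chassis or None].append(logical_port)
    return dict(grouped)


if __name__ == "__main__":
    for chassis, ports in ports_by_chassis(BINDINGS).items():
        print(chassis or "(unbound)", len(ports), "ports")
```

In this dump, one port has no chassis yet, one is bound to the first chassis, and the remaining four are bound to the second.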

OVN Pipeline

Another function of the ovn-northd service is defining the contents of the Pipeline table in the OVN_Southbound database. Each row in the Pipeline table represents a logical flow. ovn-controller on each chassis is responsible for converting the logical flows into OpenFlow flows appropriate for that node. We will go through annotated Pipeline contents for the current configuration. The output has been reordered to make it easier to follow. It’s sorted by datapath (the logical switch the flows are associated with), then table_id, then priority.

The Pipeline table has a similar format to OpenFlow. For each logical datapath (logical switch), processing starts at the highest priority match in table 0. A complete description of the syntax for the Pipeline table can be found in the ovn-sb document.

(ovn-devstack-1)$ ovsdb-client dump OVN_Southbound
...
Pipeline table
_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
...

Table 0 starts by dropping anything with an invalid source MAC address. It also drops anything with a logical VLAN tag, because there’s no concept of logical VLANs.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
9bbe8795-d093-4c9b-a712-1c7b8e953ae7 "drop;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.src[40]"                                                                           100      0       
71af3be1-b0b7-4e5a-aadb-c510907bfabd "drop;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b vlan.present                                                                            100      0       
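
The eth.src[40] match is terse, so here is a short Python sketch of what I understand it to test. In the match syntax, [40] selects a single bit of the 48-bit address, counting from the least significant bit, which makes it the low-order bit of the first octet. That is the Ethernet group (multicast) bit, and it is also set in the broadcast address. A source address should never have it set, which is why such packets count as invalid. The helper names below are made up for illustration.

```python
def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)


def bit(value, n):
    """Return bit n of value, with bit 0 as the least significant bit."""
    return (value >> n) & 1


def is_multicast(mac):
    """Rough equivalent of matching on eth.src[40] or eth.dst[40]."""
    return bit(mac_to_int(mac), 40) == 1


if __name__ == "__main__":
    for mac in ("fa:16:3e:50:01:91", "ff:ff:ff:ff:ff:ff", "01:00:5e:00:00:01"):
        print(mac, is_multicast(mac))
```

The VM address fa:16:3e:50:01:91 does not have the bit set, while the broadcast address and the IPv4 multicast address 01:00:5e:00:00:01 do.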

The next 5 rows correspond to the five logical ports on this logical network. If the packet came in from one of the logical ports and its source MAC address is one that is allowed, processing will continue in table 1.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
914eab89-b327-4a7d-ad88-08d2e4be104c "next;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "inport == \"10964198-b218-417e-a59e-6a6d7096c936\" && eth.src == {fa:16:3e:50:01:91}"  50       0       
2f80e9b3-a2db-4160-9d94-598960160cfb "next;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "inport == \"381a2d96-bc4a-4785-82bc-4f2b48e007e8\" && eth.src == {fa:16:3e:76:12:96}"  50       0       
a761a821-9cc3-49d1-a00b-b2eabd242480 "next;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "inport == \"a7a8ee94-996e-4623-941c-1ef7b7862f6e\" && eth.src == {fa:16:3e:24:46:3a}"  50       0       
6b0675e9-9c09-48c3-8cbd-c0da0fd9f608 "next;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "inport == \"b2e0ae9e-d472-42ed-8776-6b338349d01d\" && eth.src == {fa:16:3e:b7:cd:77}"  50       0       
b6fb9eb2-3047-4231-9131-88f34d56ff77 "next;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "inport == \"f24756f3-d803-47d3-9fc7-1315f4071ac0\" && eth.src == {fa:16:3e:b1:34:ed}"  50       0       

Finally, if the packet did not match any higher priority flows, it just gets dropped.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
14e02175-d6ab-4a7c-bf30-7774ecf8074c "drop;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "1"                                                                                     0        0       

The highest priority flow in table 1 matches packets with a broadcast (or, more generally, multicast) destination MAC address. In that case, processing continues in table 2 several times (once for each logical port on this network) with the outport variable set.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
d4adaf36-f63f-4c04-b37b-63399b5b9459 "outport = \"10964198-b218-417e-a59e-6a6d7096c936\"; next; outport = \"381a2d96-bc4a-4785-82bc-4f2b48e007e8\"; next; outport = \"f24756f3-d803-47d3-9fc7-1315f4071ac0\"; next; outport = \"b2e0ae9e-d472-42ed-8776-6b338349d01d\"; next; outport = \"a7a8ee94-996e-4623-941c-1ef7b7862f6e\"; next;" bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.dst[40]"                                                                           100      1       

The next 5 flows match when the destination MAC address is a MAC address assigned to one of the logical ports. In that case, the outport variable gets set and processing continues in table 2.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
22e05122-a917-4fd5-804f-b8dc6e48d334 "outport = \"10964198-b218-417e-a59e-6a6d7096c936\"; next;"                                                                                                                                                                                                                                         bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.dst == fa:16:3e:50:01:91"                                                          50       1       
b444b9f2-4924-4d4c-bc92-c44b86fb7fc2 "outport = \"381a2d96-bc4a-4785-82bc-4f2b48e007e8\"; next;"                                                                                                                                                                                                                                         bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.dst == fa:16:3e:76:12:96"                                                          50       1       
a2fe2a1a-6b44-4255-bda0-0b665c2bfafc "outport = \"a7a8ee94-996e-4623-941c-1ef7b7862f6e\"; next;"                                                                                                                                                                                                                                         bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.dst == fa:16:3e:24:46:3a"                                                          50       1       
834f9f4b-999b-4445-b7ae-3e912c97cbe7 "outport = \"b2e0ae9e-d472-42ed-8776-6b338349d01d\"; next;"                                                                                                                                                                                                                                         bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.dst == fa:16:3e:b7:cd:77"                                                          50       1       
dcd4f6c2-627a-4441-927a-feedcf2295cb "outport = \"f24756f3-d803-47d3-9fc7-1315f4071ac0\"; next;"                                                                                                                                                                                                                                         bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.dst == fa:16:3e:b1:34:ed"                                                          50       1       

Table 2 does nothing important in this environment. It will eventually be used to implement ACLs. In the context of Neutron, security groups will get translated into OVN ACLs and those ACLs will be reflected by flow entries in this table.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
325b6008-7cfd-480f-8b15-d9d42dfff567 "next;"                                                                                                                                                                                                                                                                                             bd7dbbf9-1325-491f-b46b-80b4ecfc560b "1"                                                                                     0        2       

Table 3 is the final table. The first flow matches a broadcast (or multicast) destination MAC address. The action is output;, which means to output the packet to the logical port identified by the outport variable.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
70c9b568-0e8d-4592-904d-4f5b0c7ca606 "output;"                                                                                                                                                                                                                                                                                           bd7dbbf9-1325-491f-b46b-80b4ecfc560b "eth.dst[40]"                                                                           100      3       

The following 5 flows are associated with the 5 logical ports on this network. They will match if the outport variable matches a logical port and the destination MAC address is in the set of allowed MAC addresses.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
1d6f7f7f-f3ac-409c-9dda-f55b0fd4c6da "output;"                                                                                                                                                                                                                                                                                           bd7dbbf9-1325-491f-b46b-80b4ecfc560b "outport == \"10964198-b218-417e-a59e-6a6d7096c936\" && eth.dst == {fa:16:3e:50:01:91}" 50       3       
730a39c1-93ea-41a1-81e2-a8f08f011981 "output;"                                                                                                                                                                                                                                                                                           bd7dbbf9-1325-491f-b46b-80b4ecfc560b "outport == \"381a2d96-bc4a-4785-82bc-4f2b48e007e8\" && eth.dst == {fa:16:3e:76:12:96}" 50       3       
9b83c397-1d24-4446-a201-c56eec8cb9ba "output;"                                                                                                                                                                                                                                                                                           bd7dbbf9-1325-491f-b46b-80b4ecfc560b "outport == \"a7a8ee94-996e-4623-941c-1ef7b7862f6e\" && eth.dst == {fa:16:3e:24:46:3a}" 50       3       
f37cc0d7-e69a-4f45-b8ef-595c25b5c62b "output;"                                                                                                                                                                                                                                                                                           bd7dbbf9-1325-491f-b46b-80b4ecfc560b "outport == \"b2e0ae9e-d472-42ed-8776-6b338349d01d\" && eth.dst == {fa:16:3e:b7:cd:77}" 50       3       
5ec90c56-fe85-4199-ad30-f0f32ee2b8da "output;"                                                                                                                                                                                                                                                                                           bd7dbbf9-1325-491f-b46b-80b4ecfc560b "outport == \"f24756f3-d803-47d3-9fc7-1315f4071ac0\" && eth.dst == {fa:16:3e:b1:34:ed}" 50       3       
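
The walk through tables 0 to 3 can be sketched as a tiny simulator. This is a hand-written Python approximation of the rows above for just the two VM ports, not how ovn-controller actually evaluates logical flows (it compiles them to OpenFlow). The packet dictionary, the build_flows and process helpers, and the tuple encoding of actions are all assumptions for illustration. It also assumes, as I understand the ovn-sb description of output;, that output to the ingress port is a no-op.

```python
# Logical port -> MAC for the two VMs (test2 and test1), copied from above.
PORTS = {
    "10964198-b218-417e-a59e-6a6d7096c936": "fa:16:3e:50:01:91",
    "a7a8ee94-996e-4623-941c-1ef7b7862f6e": "fa:16:3e:24:46:3a",
}


def is_multicast(mac):
    """Stand-in for eth.src[40] / eth.dst[40]: low bit of the first octet."""
    return int(mac.split(":")[0], 16) & 1 == 1


def build_flows(ports):
    """Return (table_id, priority, match, actions) tuples mirroring the rows above."""
    flows = [
        (0, 100, lambda p: is_multicast(p["eth_src"]), ["drop"]),
        (0, 100, lambda p: p.get("vlan") is not None, ["drop"]),
        (0, 0, lambda p: True, ["drop"]),
        # Broadcast: set outport and continue once per logical port.
        (1, 100, lambda p: is_multicast(p["eth_dst"]),
         [a for port in ports for a in (("outport", port), "next")]),
        (2, 0, lambda p: True, ["next"]),
        (3, 100, lambda p: is_multicast(p["eth_dst"]), ["output"]),
    ]
    for port, mac in ports.items():
        flows.append((0, 50, lambda p, port=port, mac=mac:
                      p["inport"] == port and p["eth_src"] == mac, ["next"]))
        flows.append((1, 50, lambda p, mac=mac: p["eth_dst"] == mac,
                      [("outport", port), "next"]))
        flows.append((3, 50, lambda p, port=port, mac=mac:
                      p.get("outport") == port and p["eth_dst"] == mac, ["output"]))
    return flows


def process(flows, packet, table=0, delivered=None):
    """Run the highest priority matching flow in `table`; return the output ports."""
    delivered = [] if delivered is None else delivered
    candidates = [f for f in flows if f[0] == table and f[2](packet)]
    if not candidates:
        return delivered
    actions = max(candidates, key=lambda f: f[1])[3]
    packet = dict(packet)  # keep outport changes local to this table visit
    for action in actions:
        if action == "drop":
            break
        if action == "next":
            process(flows, packet, table + 1, delivered)
        elif action == "output":
            # Assumption: output to the ingress port is a no-op.
            if packet["outport"] != packet["inport"]:
                delivered.append(packet["outport"])
        else:  # ("outport", <logical port>)
            packet["outport"] = action[1]
    return delivered


if __name__ == "__main__":
    test2, test1 = PORTS  # dict keys, in insertion order
    flows = build_flows(PORTS)
    ping = {"inport": test1, "eth_src": PORTS[test1], "eth_dst": PORTS[test2]}
    print(process(flows, ping))
```

A unicast packet from test1 to test2 ends up on test2's logical port, while a packet from test1's port with a source MAC that does not belong to it falls through to the priority 0 drop in table 0.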

All of the flows above are associated with the private network. These flows follow the same pattern, but are for the public network.

_uuid                                actions                                                                                                                                                                                                                                                                                             logical_datapath                     match                                                                                   priority table_id
------------------------------------ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ------------------------------------ --------------------------------------------------------------------------------------- -------- --------
e9e8854e-cb44-437d-84b1-21fec5ce2929 "drop;"                                                                                                                                                                                                                                                                                             f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "eth.src[40]"                                                                           100      0       
859cf482-2119-43f6-8c41-f91bae89759a "drop;"                                                                                                                                                                                                                                                                                             f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 vlan.present                                                                            100      0       
47cdc942-0274-40d0-985c-5f218801adc7 "next;"                                                                                                                                                                                                                                                                                             f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "inport == \"a5b967d4-296e-44dc-98b9-7336d0224e57\" && eth.src == {fa:16:3e:8c:d0:a8}"  50       0       
5b80edbe-a556-4ae4-abc2-3708706c0c2a "drop;"                                                                                                                                                                                                                                                                                             f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "1"                                                                                     0        0       
bf9659cb-2632-4c37-ac7a-b960b1f168ac "outport = \"a5b967d4-296e-44dc-98b9-7336d0224e57\"; next;"                                                                                                                                                                                                                                         f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "eth.dst[40]"                                                                           100      1       
06684b55-3623-474d-8053-48febf7716f6 "outport = \"a5b967d4-296e-44dc-98b9-7336d0224e57\"; next;"                                                                                                                                                                                                                                         f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "eth.dst == fa:16:3e:8c:d0:a8"                                                          50       1       
dc42b637-e9f3-49ed-b158-cd71026ee021 "next;"                                                                                                                                                                                                                                                                                             f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "1"                                                                                     0        2       
1ba4eb72-9b66-4250-991b-6431b3360fce "output;"                                                                                                                                                                                                                                                                                           f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "eth.dst[40]"                                                                           100      3       
37827ec2-c7c6-4e20-b2b7-77f16db4d3d3 "output;"                                                                                                                                                                                                                                                                                           f8e8c67c-ce4a-4f23-a01b-0eb31b4ab3e2 "outport == \"a5b967d4-296e-44dc-98b9-7336d0224e57\" && eth.dst == {fa:16:3e:8c:d0:a8}" 50       3       

The Integration Bridge

Part of the configuration for ovn-controller is the integration bridge to use for all of its configuration.  By default, this is br-int.  Let’s start by looking at the configuration of br-int on ovn-devstack-2, as it is a bit simpler than ovn-devstack-1.

(ovn-devstack-2)$ ovs-vsctl show
a70d8333-9b36-4765-8eb2-a91a3d5833f8
    Bridge br-int
        fail_mode: secure
        Port br-int
            Interface br-int
                type: internal
        Port "tap10964198-b2"
            Interface "tap10964198-b2"
        Port "ovn-b29ae3-0"
            Interface "ovn-b29ae3-0"
                type: geneve
                options: {key=flow, remote_ip="172.16.189.6"}

The port tap10964198-b2 is associated with the VM running on this compute node (test2, 10.0.0.4). The other port, ovn-b29ae3-0, sends packets over a geneve tunnel to ovn-devstack-1.

Now we can look at the configuration of br-int on the other host, ovn-devstack-1. The setup is very similar, except it has some additional ports that are associated with the default Neutron setup done by DevStack.

(ovn-devstack-1)$ ovs-vsctl show
197d2a0d-c85d-4113-94bb-bd836ef03970
    Bridge br-int
        fail_mode: secure
        Port "qr-f24756f3-d8"
            Interface "qr-f24756f3-d8"
                type: internal
        Port "tapa7a8ee94-99"
            Interface "tapa7a8ee94-99"
        Port "ovn-2a33c9-0"
            Interface "ovn-2a33c9-0"
                type: geneve
                options: {key=flow, remote_ip="172.16.189.10"}
        Port "qr-b2e0ae9e-d4"
            Interface "qr-b2e0ae9e-d4"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port "tap381a2d96-bc"
            Interface "tap381a2d96-bc"
                type: internal
    Bridge br-ex
        Port br-ex
            Interface br-ex
                type: internal
        Port "qg-a5b967d4-29"
            Interface "qg-a5b967d4-29"
                type: internal

OpenFlow

ovn-controller on each compute node converts the logical pipeline into OpenFlow flows. The processing maps conceptually to what we went through for the Pipeline table. Here are the flows for br-int on ovn-devstack-1.

(ovn-devstack-1)$ sudo ovs-ofctl -O OpenFlow13 dump-flows br-int
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x0, duration=15264.413s, table=0, n_packets=28, n_bytes=3302, priority=100,in_port=1 actions=set_field:0x1->metadata,set_field:0x1->reg6,resubmit(,16)
 cookie=0x0, duration=15264.413s, table=0, n_packets=1797, n_bytes=294931, priority=100,in_port=2 actions=set_field:0x1->metadata,set_field:0x2->reg6,resubmit(,16)
 cookie=0x0, duration=15264.413s, table=0, n_packets=12857, n_bytes=1414286, priority=100,in_port=3 actions=set_field:0x1->metadata,set_field:0x4->reg6,resubmit(,16)
 cookie=0x0, duration=15264.413s, table=0, n_packets=1239, n_bytes=143548, priority=100,in_port=5 actions=set_field:0x1->metadata,set_field:0x5->reg6,resubmit(,16)
 cookie=0x0, duration=15264.413s, table=0, n_packets=20, n_bytes=1940, priority=50,tun_id=0x1 actions=output:1
 cookie=0x0, duration=15264.413s, table=0, n_packets=237, n_bytes=23848, priority=50,tun_id=0x2 actions=output:2
 cookie=0x0, duration=15264.413s, table=0, n_packets=14, n_bytes=1430, priority=50,tun_id=0x4 actions=output:3
 cookie=0x0, duration=15264.413s, table=0, n_packets=75, n_bytes=8516, priority=50,tun_id=0x5 actions=output:5
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x1,vlan_tci=0x1000/0x1000 actions=drop
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x2,vlan_tci=0x1000/0x1000 actions=drop
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x1,dl_src=01:00:00:00:00:00/01:00:00:00:00:00 actions=drop
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x2,dl_src=01:00:00:00:00:00/01:00:00:00:00:00 actions=drop
 cookie=0x0, duration=15264.413s, table=16, n_packets=28, n_bytes=3302, priority=50,reg6=0x1,metadata=0x1,dl_src=fa:16:3e:76:12:96 actions=resubmit(,17)
 cookie=0x0, duration=15264.413s, table=16, n_packets=1797, n_bytes=294931, priority=50,reg6=0x2,metadata=0x1,dl_src=fa:16:3e:b1:34:ed actions=resubmit(,17)
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=50,reg6=0x3,metadata=0x2,dl_src=fa:16:3e:8c:d0:a8 actions=resubmit(,17)
 cookie=0x0, duration=15264.413s, table=16, n_packets=12857, n_bytes=1414286, priority=50,reg6=0x4,metadata=0x1,dl_src=fa:16:3e:b7:cd:77 actions=resubmit(,17)
 cookie=0x0, duration=15264.413s, table=16, n_packets=1239, n_bytes=143548, priority=50,reg6=0x5,metadata=0x1,dl_src=fa:16:3e:24:46:3a actions=resubmit(,17)
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=50,reg6=0x6,metadata=0x1,dl_src=fa:16:3e:50:01:91 actions=resubmit(,17)
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=0,metadata=0x1 actions=drop
 cookie=0x0, duration=15264.413s, table=16, n_packets=0, n_bytes=0, priority=0,metadata=0x2 actions=drop
 cookie=0x0, duration=15264.413s, table=17, n_packets=12978, n_bytes=1420946, priority=100,metadata=0x1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=set_field:0x6->reg7,resubmit(,18),set_field:0x1->reg7,resubmit(,18),set_field:0x2->reg7,resubmit(,18),set_field:0x4->reg7,resubmit(,18),set_field:0x5->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=17, n_packets=0, n_bytes=0, priority=100,metadata=0x2,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=set_field:0x3->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=17, n_packets=7, n_bytes=552, priority=50,metadata=0x1,dl_dst=fa:16:3e:76:12:96 actions=set_field:0x1->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=17, n_packets=1064, n_bytes=129938, priority=50,metadata=0x1,dl_dst=fa:16:3e:b1:34:ed actions=set_field:0x2->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=17, n_packets=0, n_bytes=0, priority=50,metadata=0x2,dl_dst=fa:16:3e:8c:d0:a8 actions=set_field:0x3->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=17, n_packets=0, n_bytes=0, priority=50,metadata=0x1,dl_dst=fa:16:3e:b7:cd:77 actions=set_field:0x4->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=17, n_packets=1492, n_bytes=154092, priority=50,metadata=0x1,dl_dst=fa:16:3e:24:46:3a actions=set_field:0x5->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=17, n_packets=380, n_bytes=150539, priority=50,metadata=0x1,dl_dst=fa:16:3e:50:01:91 actions=set_field:0x6->reg7,resubmit(,18)
 cookie=0x0, duration=15264.413s, table=18, n_packets=37895, n_bytes=4255421, priority=0,metadata=0x1 actions=resubmit(,19)
 cookie=0x0, duration=15264.413s, table=18, n_packets=0, n_bytes=0, priority=0,metadata=0x2 actions=resubmit(,19)
 cookie=0x0, duration=15264.413s, table=19, n_packets=34952, n_bytes=3820300, priority=100,metadata=0x1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=19, n_packets=0, n_bytes=0, priority=100,metadata=0x2,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=19, n_packets=7, n_bytes=552, priority=50,reg7=0x1,metadata=0x1,dl_dst=fa:16:3e:76:12:96 actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=19, n_packets=1064, n_bytes=129938, priority=50,reg7=0x2,metadata=0x1,dl_dst=fa:16:3e:b1:34:ed actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=19, n_packets=0, n_bytes=0, priority=50,reg7=0x3,metadata=0x2,dl_dst=fa:16:3e:8c:d0:a8 actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=19, n_packets=0, n_bytes=0, priority=50,reg7=0x4,metadata=0x1,dl_dst=fa:16:3e:b7:cd:77 actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=19, n_packets=1492, n_bytes=154092, priority=50,reg7=0x5,metadata=0x1,dl_dst=fa:16:3e:24:46:3a actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=19, n_packets=380, n_bytes=150539, priority=50,reg7=0x6,metadata=0x1,dl_dst=fa:16:3e:50:01:91 actions=resubmit(,64)
 cookie=0x0, duration=15264.413s, table=64, n_packets=9, n_bytes=726, priority=100,reg6=0x1,reg7=0x1 actions=drop
 cookie=0x0, duration=15264.413s, table=64, n_packets=6, n_bytes=252, priority=100,reg6=0x2,reg7=0x2 actions=drop
 cookie=0x0, duration=15264.413s, table=64, n_packets=3238, n_bytes=356180, priority=100,reg6=0x4,reg7=0x4 actions=drop
 cookie=0x0, duration=15264.413s, table=64, n_packets=96, n_bytes=4818, priority=100,reg6=0x5,reg7=0x5 actions=drop
 cookie=0x0, duration=15264.413s, table=64, n_packets=0, n_bytes=0, priority=100,reg6=0x6,reg7=0x6 actions=drop
 cookie=0x0, duration=15264.413s, table=64, n_packets=12976, n_bytes=1420772, priority=50,reg7=0x1 actions=output:1
 cookie=0x0, duration=15264.413s, table=64, n_packets=14018, n_bytes=1549120, priority=50,reg7=0x2 actions=output:2
 cookie=0x0, duration=15264.413s, table=64, n_packets=96, n_bytes=4818, priority=50,reg7=0x4 actions=output:3
 cookie=0x0, duration=15264.413s, table=64, n_packets=4722, n_bytes=509392, priority=50,reg7=0x5 actions=output:5
 cookie=0x0, duration=15264.413s, table=64, n_packets=2734, n_bytes=409343, priority=50,reg7=0x6 actions=set_field:0x6->tun_id,output:4
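
When reading a dump like this, it can help to break each line into its table, priority, match, and actions. The following is a minimal Python sketch that does so for the line format shown above. The function name parse_flow is made up for this illustration, and the sketch assumes every line carries an explicit priority and an actions list, as the lines in this dump do.

```python
def parse_flow(line):
    """Split one line of `ovs-ofctl dump-flows` output into its main parts."""
    head, actions = line.strip().split(" actions=", 1)
    # Statistics are separated by ", "; the last chunk holds priority and match.
    *stats, match_text = head.split(", ")
    stat_fields = dict(item.split("=", 1) for item in stats)
    match = {}
    for part in match_text.split(","):
        key, _, value = part.partition("=")
        match[key] = value
    return {
        "table": int(stat_fields["table"]),
        "n_packets": int(stat_fields["n_packets"]),
        # Assumes an explicit priority, as in every line of the dump above.
        "priority": int(match.pop("priority")),
        "match": match,
        "actions": actions,
    }


# First flow of table 0 on ovn-devstack-1, copied from the dump above.
line = (" cookie=0x0, duration=15264.413s, table=0, n_packets=28, "
        "n_bytes=3302, priority=100,in_port=1 "
        "actions=set_field:0x1->metadata,set_field:0x1->reg6,resubmit(,16)")
flow = parse_flow(line)
print(flow["table"], flow["priority"], flow["match"], flow["actions"])
```

Sorting the parsed flows by table and then by descending priority gives the order in which the switch considers them.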

And here are the flows for br-int on ovn-devstack-2:

(ovn-devstack-2)$ sudo ovs-ofctl -O OpenFlow13 dump-flows br-int
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x0, duration=20967.205s, table=0, n_packets=304, n_bytes=31444, priority=100,in_port=2 actions=set_field:0x1->metadata,set_field:0x6->reg6,resubmit(,16)
 cookie=0x0, duration=20967.205s, table=0, n_packets=2674, n_bytes=292967, priority=50,tun_id=0x6 actions=output:2
 cookie=0x0, duration=83073.583s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x2,dl_src=01:00:00:00:00:00/01:00:00:00:00:00 actions=drop
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x1,dl_src=01:00:00:00:00:00/01:00:00:00:00:00 actions=drop
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x2,vlan_tci=0x1000/0x1000 actions=drop
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=100,metadata=0x1,vlan_tci=0x1000/0x1000 actions=drop
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=50,reg6=0x4,metadata=0x1,dl_src=fa:16:3e:b7:cd:77 actions=resubmit(,17)
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=50,reg6=0x1,metadata=0x1,dl_src=fa:16:3e:76:12:96 actions=resubmit(,17)
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=50,reg6=0x3,metadata=0x2,dl_src=fa:16:3e:8c:d0:a8 actions=resubmit(,17)
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=50,reg6=0x2,metadata=0x1,dl_src=fa:16:3e:b1:34:ed actions=resubmit(,17)
 cookie=0x0, duration=21021.391s, table=16, n_packets=0, n_bytes=0, priority=50,reg6=0x5,metadata=0x1,dl_src=fa:16:3e:24:46:3a actions=resubmit(,17)
 cookie=0x0, duration=20968.863s, table=16, n_packets=304, n_bytes=31444, priority=50,reg6=0x6,metadata=0x1,dl_src=fa:16:3e:50:01:91 actions=resubmit(,17)
 cookie=0x0, duration=83073.583s, table=16, n_packets=0, n_bytes=0, priority=0,metadata=0x1 actions=drop
 cookie=0x0, duration=83073.582s, table=16, n_packets=0, n_bytes=0, priority=0,metadata=0x2 actions=drop
 cookie=0x0, duration=83073.583s, table=17, n_packets=14, n_bytes=1430, priority=100,metadata=0x1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=set_field:0x1->reg7,resubmit(,18),set_field:0x2->reg7,resubmit(,18),set_field:0x4->reg7,resubmit(,18),set_field:0x5->reg7,resubmit(,18)
 cookie=0x0, duration=83073.582s, table=17, n_packets=0, n_bytes=0, priority=100,metadata=0x2,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=set_field:0x3->reg7,resubmit(,18)
 cookie=0x0, duration=83073.583s, table=17, n_packets=223, n_bytes=22418, priority=50,metadata=0x1,dl_dst=fa:16:3e:b1:34:ed actions=set_field:0x2->reg7,resubmit(,18)
 cookie=0x0, duration=83073.583s, table=17, n_packets=6, n_bytes=510, priority=50,metadata=0x1,dl_dst=fa:16:3e:76:12:96 actions=set_field:0x1->reg7,resubmit(,18)
 cookie=0x0, duration=83073.582s, table=17, n_packets=0, n_bytes=0, priority=50,metadata=0x1,dl_dst=fa:16:3e:b7:cd:77 actions=set_field:0x4->reg7,resubmit(,18)
 cookie=0x0, duration=83073.582s, table=17, n_packets=0, n_bytes=0, priority=50,metadata=0x2,dl_dst=fa:16:3e:8c:d0:a8 actions=set_field:0x3->reg7,resubmit(,18)
 cookie=0x0, duration=21021.390s, table=17, n_packets=61, n_bytes=7086, priority=50,metadata=0x1,dl_dst=fa:16:3e:24:46:3a actions=set_field:0x5->reg7,resubmit(,18)
 cookie=0x0, duration=20968.863s, table=17, n_packets=0, n_bytes=0, priority=50,metadata=0x1,dl_dst=fa:16:3e:50:01:91 actions=set_field:0x6->reg7,resubmit(,18)
 cookie=0x0, duration=83073.583s, table=18, n_packets=0, n_bytes=0, priority=0,metadata=0x2 actions=resubmit(,19)
 cookie=0x0, duration=83073.582s, table=18, n_packets=346, n_bytes=35734, priority=0,metadata=0x1 actions=resubmit(,19)
 cookie=0x0, duration=83073.583s, table=19, n_packets=56, n_bytes=5720, priority=100,metadata=0x1,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,64)
 cookie=0x0, duration=83073.582s, table=19, n_packets=0, n_bytes=0, priority=100,metadata=0x2,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,64)
 cookie=0x0, duration=83073.583s, table=19, n_packets=0, n_bytes=0, priority=50,reg7=0x4,metadata=0x1,dl_dst=fa:16:3e:b7:cd:77 actions=resubmit(,64)
 cookie=0x0, duration=83073.582s, table=19, n_packets=6, n_bytes=510, priority=50,reg7=0x1,metadata=0x1,dl_dst=fa:16:3e:76:12:96 actions=resubmit(,64)
 cookie=0x0, duration=83073.582s, table=19, n_packets=223, n_bytes=22418, priority=50,reg7=0x2,metadata=0x1,dl_dst=fa:16:3e:b1:34:ed actions=resubmit(,64)
 cookie=0x0, duration=83073.582s, table=19, n_packets=0, n_bytes=0, priority=50,reg7=0x3,metadata=0x2,dl_dst=fa:16:3e:8c:d0:a8 actions=resubmit(,64)
 cookie=0x0, duration=21021.390s, table=19, n_packets=61, n_bytes=7086, priority=50,reg7=0x5,metadata=0x1,dl_dst=fa:16:3e:24:46:3a actions=resubmit(,64)
 cookie=0x0, duration=20968.863s, table=19, n_packets=0, n_bytes=0, priority=50,reg7=0x6,metadata=0x1,dl_dst=fa:16:3e:50:01:91 actions=resubmit(,64)
 cookie=0x0, duration=83073.583s, table=64, n_packets=0, n_bytes=0, priority=100,reg6=0x4,reg7=0x4 actions=drop
 cookie=0x0, duration=83073.582s, table=64, n_packets=0, n_bytes=0, priority=100,reg6=0x1,reg7=0x1 actions=drop
 cookie=0x0, duration=83073.582s, table=64, n_packets=0, n_bytes=0, priority=100,reg6=0x2,reg7=0x2 actions=drop
 cookie=0x0, duration=20993.791s, table=64, n_packets=0, n_bytes=0, priority=100,reg6=0x5,reg7=0x5 actions=drop
 cookie=0x0, duration=20967.205s, table=64, n_packets=0, n_bytes=0, priority=100,reg6=0x6,reg7=0x6 actions=drop
 cookie=0x0, duration=83073.583s, table=64, n_packets=237, n_bytes=23848, priority=50,reg7=0x2 actions=set_field:0x2->tun_id,output:1
 cookie=0x0, duration=83073.583s, table=64, n_packets=20, n_bytes=1940, priority=50,reg7=0x1 actions=set_field:0x1->tun_id,output:1
 cookie=0x0, duration=83073.583s, table=64, n_packets=14, n_bytes=1430, priority=50,reg7=0x4 actions=set_field:0x4->tun_id,output:1
 cookie=0x0, duration=20993.791s, table=64, n_packets=75, n_bytes=8516, priority=50,reg7=0x5 actions=set_field:0x5->tun_id,output:1
 cookie=0x0, duration=20967.205s, table=64, n_packets=0, n_bytes=0, priority=50,reg7=0x6 actions=output:2
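
To make the table walk concrete, here is a rough Python sketch that follows a unicast packet from the local VM through the ovn-devstack-2 flows above. It is a simplification, not how OVS evaluates flows: it only handles packets entering on OpenFlow port 2, it skips the multicast and VLAN checks, and it infers from the flows that port 1 is the tunnel and port 2 is the VM's tap device. The names PORT_MACS and trace_unicast are invented for the example.

```python
# Logical port key -> MAC on logical datapath 1 (metadata=0x1), from the dump.
PORT_MACS = {
    1: "fa:16:3e:76:12:96",
    2: "fa:16:3e:b1:34:ed",
    4: "fa:16:3e:b7:cd:77",
    5: "fa:16:3e:24:46:3a",
    6: "fa:16:3e:50:01:91",
}
LOCAL_OFPORTS = {6: 2}  # logical port 6 is the local VM on OpenFlow port 2
TUNNEL_OFPORT = 1       # inferred: every remote port is sent out port 1


def trace_unicast(in_port, eth_src, eth_dst):
    """Return the final action for a unicast packet on ovn-devstack-2."""
    # Table 0: physical port -> logical datapath (metadata) and inport (reg6).
    if in_port != 2:
        return "drop (not modeled)"
    reg6 = 6
    # Table 16: the source MAC must belong to the logical input port.
    if PORT_MACS[reg6] != eth_src:
        return "drop (table 16)"
    # Table 17: destination MAC -> logical output port (reg7).
    reg7 = next((k for k, mac in PORT_MACS.items() if mac == eth_dst), None)
    if reg7 is None:
        return "drop (table 17)"
    # Tables 18 and 19 pass the packet along for these flows.
    # Table 64: never send a packet back out the port it arrived on.
    if reg6 == reg7:
        return "drop (table 64)"
    if reg7 in LOCAL_OFPORTS:
        return f"output:{LOCAL_OFPORTS[reg7]}"
    return f"set_field:{reg7:#x}->tun_id,output:{TUNNEL_OFPORT}"


# The local VM sends to fa:16:3e:b1:34:ed, which lives on ovn-devstack-1.
print(trace_unicast(2, "fa:16:3e:50:01:91", "fa:16:3e:b1:34:ed"))
```

The result for that packet matches the table 64 flow for reg7=0x2: set the tunnel ID to the logical output port and send the packet over the tunnel, where table 0 on ovn-devstack-1 maps tun_id=0x2 straight to its local output port.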

Future work for OVN+OpenStack

The OpenStack integration with OVN still makes use of the L3 and DHCP agents from Neutron. We expect that functionality to be implemented in OVN instead. As it becomes available in OVN, we will expand the Neutron integration to make use of it.

Support for Neutron security groups is not yet fully implemented. The plan is to use some brand new OVS conntrack functionality, which should offer much better performance as compared to how security groups are implemented with Neutron today using iptables. This functionality is targeted at the next OVS release (2.4). My understanding is that the code has been working for a while and is just working its way through the review process for both OVS and the upstream Linux kernel.

We have some initial CI testing in place. It installs OpenStack with OVN and makes sure it can all start up and accept the default configuration done by DevStack. We just turned on a new job that runs the tempest test suite but haven’t started working through making it pass correctly. Once the tempest job is passing successfully, I would like to look into a test configuration that uses 2 nodes so that we can regularly exercise the use of tunnels between hosts running ovn-controller.

OVN and OpenStack Status – 2015-04-21

It has been a couple of weeks since the last OVN status update. Here is a review of what has happened since then.

ovn-nbd is now ovn-northd

Someone pointed out that the acronym “nbd” is used for “Network Block Device” and may exist in the same deployment as OVN.  To avoid any possible confusion, we renamed ovn-nbd to ovn-northd.

ovn-controller now exists

ovn-controller is the daemon that runs on every hypervisor or gateway.  The initial version of this daemon has been merged.  The current version of ovn-controller performs two important functions.

First, ovn-controller populates the Chassis table of the OVN_Southbound database.  Each row in the Chassis table represents a hypervisor or gateway running ovn-controller.  It contains information that identifies the chassis and what encapsulation types it supports.  If you run ovs-sandbox with OVN support enabled, it will run the following commands to configure ovn-controller:

ovs-vsctl set open . external-ids:system-id=56b18105-5706-46ef-80c4-ff20979ab068
ovs-vsctl set open . external-ids:ovn-remote=unix:"$sandbox"/db.sock
ovs-vsctl set open . external-ids:ovn-encap-type=vxlan
ovs-vsctl set open . external-ids:ovn-encap-ip=127.0.0.1
ovs-vsctl add-br br-int

After setup is complete, we can check the contents of the OVN_Southbound database and see the corresponding Chassis entry:

Chassis table
_uuid                                encaps                                 gateway_ports name                                  
------------------------------------ -------------------------------------- ------------- --------------------------------------
2852bf00-db63-4732-8b44-a3bc689ed1bc [e1c1f7fc-409d-4f74-923a-fc6de8409f82] {}            "56b18105-5706-46ef-80c4-ff20979ab068"

Encap table
_uuid                                ip          options type 
------------------------------------ ----------- ------- -----
e1c1f7fc-409d-4f74-923a-fc6de8409f82 "127.0.0.1" {}      vxlan
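
The relationship between the external-ids set above and the resulting rows can be sketched in a few lines of Python. This is only an illustration of the mapping visible in the output, not ovn-controller's actual code (which is written in C); the function name and the dictionary layout are invented for the example.

```python
def chassis_from_external_ids(external_ids):
    """Build Chassis and Encap records (as plain dicts) from external-ids."""
    encap = {
        "type": external_ids["ovn-encap-type"],
        "ip": external_ids["ovn-encap-ip"],
        "options": {},
    }
    chassis = {
        # The chassis name is the system-id configured with ovs-vsctl.
        "name": external_ids["system-id"],
        "encaps": [encap],
        "gateway_ports": {},
    }
    return chassis


# Values taken from the ovs-vsctl commands run by ovs-sandbox.
sandbox_ids = {
    "system-id": "56b18105-5706-46ef-80c4-ff20979ab068",
    "ovn-encap-type": "vxlan",
    "ovn-encap-ip": "127.0.0.1",
}
print(chassis_from_external_ids(sandbox_ids))
```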

The other important task performed by the current version of ovn-controller is to monitor the local switch for ports being added that match up to logical ports created in OVN.  When a port is created on the local switch with an iface-id that matches the OVN logical port’s name, ovn-controller will update the Bindings table to specify that the port exists on this chassis.  Once this is done, ovn-northd will report to the OVN_Northbound database that the port is up.

$ ovsdb-client dump OVN_Southbound
Bindings table
_uuid                                chassis                                logical_port                           mac parent_port tag
------------------------------------ -------------------------------------- -------------------------------------- --- ----------- ---
...
2dc299fa-835b-4e42-aa82-3d2da523b4d9 "81b0f716-c957-43cf-b34e-87ae193f617a" "d03aa502-0d76-4c1e-8877-43778088c55c" []  []          [] 
...

$ ovn-nbctl lport-get-up d03aa502-0d76-4c1e-8877-43778088c55c
up
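
The binding step described above can be sketched roughly as follows. This is a toy Python model with rows as plain dicts, not the real implementation. It assumes a port counts as up as soon as its Bindings row names a chassis, which is how the text above describes it; the function names are invented for the example.

```python
def bind_local_ports(bindings, local_iface_ids, chassis_name):
    """Claim logical ports whose name matches a local interface's iface-id."""
    claimed = []
    for row in bindings:
        if row["logical_port"] in local_iface_ids and row["chassis"] != chassis_name:
            row["chassis"] = chassis_name
            claimed.append(row["logical_port"])
    return claimed


def lport_is_up(bindings, logical_port):
    """Assumption: a logical port is up once some chassis has claimed it."""
    return any(
        row["logical_port"] == logical_port and row["chassis"] for row in bindings
    )


# One unbound logical port, using the IDs from the dump above.
bindings = [
    {"logical_port": "d03aa502-0d76-4c1e-8877-43778088c55c", "chassis": ""},
]
# A local interface appears with a matching iface-id.
bind_local_ports(
    bindings,
    {"d03aa502-0d76-4c1e-8877-43778088c55c"},
    "81b0f716-c957-43cf-b34e-87ae193f617a",
)
print(lport_is_up(bindings, "d03aa502-0d76-4c1e-8877-43778088c55c"))
```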

The next steps for ovn-controller are to program the local switch to create tunnels and flows as appropriate based on the contents of the OVN_Southbound database.  This is currently being worked on.

The Pipeline Table

The OVN_Southbound database has a table called Pipeline.  ovn-northd is responsible for translating the logical network elements defined in OVN_Northbound into entries in the Pipeline table of OVN_Southbound.  The first version of populating the Pipeline table has been merged. One thing that is particularly interesting here is that ovn-northd defines logical flows.  It does not have to figure out the detailed switch configuration for every chassis running ovn-controller.  ovn-controller is responsible for translating the logical flows into OpenFlow flows specific to the chassis.

The OVN_Southbound documentation has a good explanation of the contents of the Pipeline table.  If you’re familiar with OpenFlow, the format will look very familiar.

As a simple example, let’s just use ovn-nbctl to manually create a single logical switch that has 2 logical ports.

ovn-nbctl lswitch-add sw0
ovn-nbctl lport-add sw0 sw0-port1 
ovn-nbctl lport-add sw0 sw0-port2 
ovn-nbctl lport-set-macs sw0-port1 00:00:00:00:00:01
ovn-nbctl lport-set-macs sw0-port2 00:00:00:00:00:02

Now we can check out the resulting contents of the Pipeline table.  The output of ovsdb-client has been reordered to group the entries by table_id and priority. I’ve also cut off the _uuid column since it’s not important for understanding here.

Pipeline table
match                          priority table_id actions                                                                 logical_datapath
------------------------------ -------- -------- ----------------------------------------------------------------------- ------------------------------------
"eth.src[40]"                  100      0        drop                                                                    843a9a4a-8afc-41e2-bea1-5fa58874e109
vlan.present                   100      0        drop                                                                    843a9a4a-8afc-41e2-bea1-5fa58874e109
"inport == \"sw0-port1\""      50       0        resubmit                                                                843a9a4a-8afc-41e2-bea1-5fa58874e109
"inport == \"sw0-port2\""      50       0        resubmit                                                                843a9a4a-8afc-41e2-bea1-5fa58874e109
"1"                            0        0        drop                                                                    843a9a4a-8afc-41e2-bea1-5fa58874e109

"eth.dst[40]"                  100      1        "outport = \"sw0-port2\"; resubmit; outport = \"sw0-port1\"; resubmit;" 843a9a4a-8afc-41e2-bea1-5fa58874e109
"eth.dst == 00:00:00:00:00:01" 50       1        "outport = \"sw0-port1\"; resubmit;"                                    843a9a4a-8afc-41e2-bea1-5fa58874e109
"eth.dst == 00:00:00:00:00:02" 50       1        "outport = \"sw0-port2\"; resubmit;"                                    843a9a4a-8afc-41e2-bea1-5fa58874e109

"1"                            0        2        resubmit                                                                843a9a4a-8afc-41e2-bea1-5fa58874e109

"outport == \"sw0-port1\""     50       3        "output(\"sw0-port1\")"                                                 843a9a4a-8afc-41e2-bea1-5fa58874e109
"outport == \"sw0-port2\""     50       3        "output(\"sw0-port2\")"                                                 843a9a4a-8afc-41e2-bea1-5fa58874e109

In table 0, we’re dropping anything with a broadcast/multicast source MAC. We’re also dropping anything with a logical VLAN tag, as that doesn’t make sense. Next, if the packet comes from one of the ports connected to the logical switch, we will continue processing in table 1. Otherwise, we drop it.

In table 1, we will output the packet to all ports if the destination MAC is broadcast/multicast. Note that the output action to the source port is implicitly handled as a drop. Finally, we’ll set the outport variable based on the destination MAC address and continue processing in table 2.

Table 2 does nothing but continue to table 3. In the ovn-northd code, table 2 is where entries for ACLs go. ovn-nbctl does not currently support adding ACLs. This table is where Neutron will program security groups, but that’s not ready yet, either.

Table 3 handles sending the packet to the right output port based on the contents of the outport variable set back in table 1.

The logical_datapath column ties all of these rows together as implementing a single logical datapath, which in this case is an OVN logical switch.
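
The four tables can also be read as a small program. Here is a minimal Python sketch of the sw0 pipeline as described above, with one branch per group of Pipeline rows. It is a reading aid rather than anything OVN runs, and the names PORT_MACS, is_multicast, and process are invented for the example.

```python
# Logical ports on sw0 and the MACs set with lport-set-macs.
PORT_MACS = {
    "sw0-port1": "00:00:00:00:00:01",
    "sw0-port2": "00:00:00:00:00:02",
}


def is_multicast(mac):
    # eth.src[40] / eth.dst[40]: bit 40 of the 48-bit address is the
    # group bit, which is the low bit of the first octet.
    return int(mac.split(":")[0], 16) & 1 == 1


def process(inport, eth_src, eth_dst, vlan_present=False):
    """Return the list of logical ports the packet is delivered to."""
    # Table 0: drop multicast sources, VLAN-tagged packets, unknown inports.
    if is_multicast(eth_src) or vlan_present:
        return []
    if inport not in PORT_MACS:
        return []
    # Table 1: choose outport(s) from the destination MAC.
    if is_multicast(eth_dst):
        outports = list(PORT_MACS)
    else:
        outports = [p for p, mac in PORT_MACS.items() if mac == eth_dst]
    # Table 2: placeholder for ACLs, currently a pass-through.
    # Table 3: output; sending back to the ingress port acts as a drop.
    return [p for p in outports if p != inport]


print(process("sw0-port1", "00:00:00:00:00:01", "00:00:00:00:00:02"))
print(process("sw0-port1", "00:00:00:00:00:01", "ff:ff:ff:ff:ff:ff"))
```

A packet to an unknown unicast destination matches no row in table 1, so the sketch returns an empty list for it, mirroring the absence of a priority-0 row in that table.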

There is one other item supported by ovn-northd that is not reflected in this example. The OVN_Northbound database has a port_security column for logical ports. Its contents are defined as “A set of L2 (Ethernet) or L3 (IPv4 or IPv6) addresses or L2+L3 pairs from which the logical port is allowed to send packets and to which it is allowed to receive packets.” If this were set here, table 0 would also handle ingress port security and table 3 would handle egress port security.

We will look at more detailed examples in future posts as both OVN and its Neutron integration progress further.

Neutron Integration

There have also been several changes to the Neutron integration for OVN in the last couple of weeks.  Since ovn-northd and ovn-controller are becoming more functional, the devstack integration runs both of these daemons, along with ovsdb-server and ovs-vswitchd.  That means that as you create networks and ports via the Neutron API, they will be created in OVN and result in Bindings and Pipeline updates.

We now also have a devstack CI job that runs against every patch proposed to the OVN Neutron integration.  It installs and runs Neutron with OVN.  Devstack also creates some default networks.  We still have a bit more work to do in OVN before we can expand this to actually test network connectivity.

Also related to testing, Terry Wilson submitted a patch to OVS that will allow us to publish the OVS Python bindings to PyPI.  The patch has been merged and Terry will soon be publishing the code to PyPI.  This will allow us to install the library for unit test jobs.

The original Neutron ML2 driver implementation used ovn-nbctl.  It has now been converted to use the Python ovsdb library, which should be much more efficient.  neutron-server will maintain an open connection to the OVN_Northbound database for all of its operations.

I’ve also been working on the necessary changes for creating a port in Neutron that is intended to be used by a container running inside a VM.  There is a python-neutronclient change and two changes needed to networking-ovn that I’m still testing.

There are some edge cases where a resource can be created in Neutron but fail before we’ve created it in OVN.  Gal Sagie is working on some code to get them back in sync.

Gal Sagie also has a patch up for the first step toward security group support.  We have to document how we will map Neutron security groups to rules in the OVN_Northbound ACL table.

One piece of information that is communicated back up to the OVN_Northbound database by OVN is the up state of a logical port.  Terry Wilson is working on having our Neutron driver consume that so that we can emit a notification when a port that was created becomes ready for use.  This notification gets turned into a callback to Nova to tell it the VIF is ready for use so the corresponding VM can be started.

OVN and OpenStack Integration Development Update

The Open vSwitch project announced the OVN effort back in January.  After OVN was announced, I got very interested in its potential.  OVN is by no means tied to OpenStack, but the primary reason I’m interested is I see it as a promising open source backend for OpenStack Neutron.  To put it into context with existing Neutron code, it would replace the OVS agent in Neutron in the short term.  It would eventually also replace the L3 and DHCP agents once OVN gains the equivalent functionality.

Implementation has been coming along well in the last month, so I wanted to share an overview of what we have so far.  We’re aiming to have a working implementation of L2 connectivity by the OpenStack Vancouver Summit next month.

Design

The initial design documentation was merged at the end of February.  Here are the rendered versions of those docs: ovn-architecture, ovn-nb schema, ovn schema.

This initial design allows hooking up VMs or containers to OVN managed virtual networks.  There was an update to the design merged that addresses the use case of running containers inside of VMs.  It seems like most existing work just creates another layer of overlay networks for containers.  What’s interesting about this proposal is that it allows you to connect those containers directly to the OVN managed virtual networks.  In the OpenStack world, that means you could have your containers hooked up directly to virtual networks managed by Neutron.  Further, the container hosting VM and all of its containers do not have to be connected to the same network and this works without having to create an extra layer of overlay networks.

OVN Implementation

For most of my OVN development and testing, I’ve been working straight from the ovs git tree. Building it is something like:

$ git clone http://github.com/openvswitch/ovs.git
$ cd ovs

Switch to the ovn branch, as that’s where OVN development is happening for now:

$ git checkout ovn

You’ll need automake, autoconf, libtool, make, patch, and gcc or clang installed, at least. For detailed instructions on building ovs, see INSTALL.md in the ovs git tree.

$ ./boot.sh
$ ./configure
$ make

OVS includes a script called ovs-sandbox that I find very helpful for development. It sets up a dummy ovs environment that you can run the tools against, but it doesn’t actually process real traffic. You can send some fake packets through to see how they would be processed if needed. I’ve been adding OVN support to ovs-sandbox along the way.

Here’s a demonstration of ovs-sandbox with what is implemented in OVN so far.  Start by running ovs-sandbox with OVN support turned on:

$ make sandbox SANDBOXFLAGS="-o"

You’ll get output like this:

----------------------------------------------------------------------
You are running in a dummy Open vSwitch environment. You can use
ovs-vsctl, ovs-ofctl, ovs-appctl, and other tools to work with the
dummy switch.

Log files, pidfiles, and the configuration database are in the
"sandbox" subdirectory.

Exit the shell to kill the running daemons.

Now everything is running:

$ ps ax | grep ov[sn]
 ...
 ... ovsdb-server --detach --no-chdir --pidfile -vconsole:off --log-file --remote=punix:/home/rbryant/src/ovs/tutorial/sandbox/db.sock ovn.db ovnnb.db conf.db
 ... ovs-vswitchd --detach --no-chdir --pidfile -vconsole:off --log-file --enable-dummy=override -vvconn -vnetdev_dummy
 ... ovn-nbd --detach --no-chdir --pidfile -vconsole:off --log-file

Note the ovn-nbd daemon. Soon there will also be an ovn-controller daemon running. Also note that ovsdb-server is serving up 3 databases (ovn.db, ovnnb.db, and conf.db).

You can run ovn-nbctl to create resources via the OVN public interface (the OVN_Northbound database). So, for example:

$ ovn-nbctl lswitch-add sw0
$ ovn-nbctl lswitch-add sw1
$ ovn-nbctl lswitch-list
4956f6b4-a1ba-49aa-86a6-134b9cfdfdf6 (sw1)
52858b33-995f-43fa-a1cf-445f16d2ab09 (sw0)
$ ovn-nbctl lport-add sw0-port0 sw0
$ ovn-nbctl lport-add sw0-port1 sw0
$ ovn-nbctl lport-list sw0
d4d78dc5-166d-4457-8bb0-1f6ed5f1ed91 (sw0-port1)
c2114eaa-2f75-443f-b23e-6dda664a979b (sw0-port0)

One of the things that ovn-nbd does is create entries in the Bindings table of the OVN database when logical ports are added to the OVN_Northbound database. The Bindings table is used to keep track of which hypervisor a port exists on after VIFs get created and plugged into the local ovs switch. After the commands above, there should be 2 entries in the Bindings table. We can dump the OVN db and see that they are there:

$ ovsdb-client dump OVN
Bindings table
_uuid                                chassis logical_port mac parent_port tag
------------------------------------ ------- ------------ --- ----------- ---
997e0c14-2fba-499d-b077-26ddfc87e935 ""      "sw0-port0"  []  []          []
f7b61ef1-01d5-42ab-b08e-176bf6f3eb4b ""      "sw0-port1"  []  []          []

Note that the chassis column is empty, meaning that the port hasn’t been placed on a hypervisor yet.
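If you want to pick the unplaced ports out of that dump from a script, a small filter is enough. This is only a sketch: the function name is made up, and it assumes the dump stays in the whitespace-separated layout shown above and that port names contain no spaces.

```shell
#!/bin/sh
# Read `ovsdb-client dump OVN` output on stdin and print the logical_port
# of every Bindings row whose chassis column is the empty string "".
unbound_ports() {
    awk '
        /^Bindings table/ { in_table = 1; next }   # start of the table
        in_table && NF == 0 { in_table = 0 }        # a blank line ends it
        in_table && $1 ~ /^[0-9a-f]+-/ && $2 == "\"\"" {
            gsub(/"/, "", $3)                       # strip the quotes
            print $3
        }
    '
}
```

In the sandbox this would be used as: ovsdb-client dump OVN | unbound_ports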

We can also see that the state of the port is still down in the OVN_Northbound database since it hasn’t been created on a hypervisor yet.

$ ovn-nbctl lport-get-up sw0-port0
down

One of the tasks of ovn-controller running on each hypervisor is to monitor the local switch and detect when a new port on the local switch corresponds to an OVN logical port. When that occurs, ovn-controller will update the chassis column. For now, we can simulate that with a manual ovsdb transaction:

$ ovsdb-client transact '["OVN",{"op":"update","table":"Bindings","where":[["_uuid","==",["uuid","997e0c14-2fba-499d-b077-26ddfc87e935"]]],"row":{"chassis":"hostname"}}]'
[{"count":1}]
$ ovsdb-client dump OVN
Bindings table
_uuid                                chassis  logical_port mac parent_port tag
------------------------------------ -------- ------------ --- ----------- ---
f7b61ef1-01d5-42ab-b08e-176bf6f3eb4b ""       "sw0-port1"  []  []          []
997e0c14-2fba-499d-b077-26ddfc87e935 hostname "sw0-port0"  []  []          []
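Typing that JSON by hand gets old quickly, so here is a rough sketch of a helper that builds the same transaction from a row UUID and a chassis name. The function name is hypothetical, and the arguments are not JSON-escaped, so it only suits plain identifiers like the ones used here.

```shell
#!/bin/sh
# Print an OVSDB "update" transaction that sets the chassis column of one
# Bindings row. $1 is the row UUID, $2 is the chassis name.
# Neither argument is JSON-escaped, so only pass plain identifiers.
bind_port_txn() {
    printf '["OVN",{"op":"update","table":"Bindings","where":[["_uuid","==",["uuid","%s"]]],"row":{"chassis":"%s"}}]\n' "$1" "$2"
}
```

The result can be passed straight to ovsdb-client, for example: ovsdb-client transact "$(bind_port_txn 997e0c14-2fba-499d-b077-26ddfc87e935 hostname)"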

Now that the chassis column has been populated, ovn-nbd should notice and set the port state to up in the OVN_Northbound db.

$ ovn-nbctl lport-get-up sw0-port0
up
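Since ovn-nbd reacts to the Bindings change asynchronously, a script that sets the chassis and then checks the port state may need to retry for a moment. Here is a minimal polling sketch. The function name and the NBCTL override variable are my own inventions (NBCTL exists only so the loop can be tried without a running OVN), and the retry count and delay are arbitrary.

```shell
#!/bin/sh
# wait_for_lport_up PORT [TRIES] [DELAY]
# Poll `ovn-nbctl lport-get-up PORT` until it prints "up".
# Returns 0 once the port is up, 1 if it is still down after TRIES checks.
# NBCTL is a hypothetical override for the command to run.
wait_for_lport_up() {
    _port=$1
    _tries=${2:-10}
    _delay=${3:-1}
    _i=0
    while [ "$_i" -lt "$_tries" ]; do
        if [ "$(${NBCTL:-ovn-nbctl} lport-get-up "$_port")" = "up" ]; then
            return 0
        fi
        _i=$((_i + 1))
        sleep "$_delay"
    done
    return 1
}
```

For example: wait_for_lport_up sw0-port0 && echo "sw0-port0 is up"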

OpenStack Integration

As with most OpenStack projects, you can try out the Neutron support for OVN using devstack.  Instructions for using the OVN devstack plugin are in the networking-ovn git repo.

You start by cloning both devstack and networking-ovn.

$ git clone http://git.openstack.org/openstack-dev/devstack.git
$ git clone http://git.openstack.org/openstack/networking-ovn.git

If you don’t have any devstack configuration, you can use a sample local.conf from the networking-ovn repo:

$ cd devstack
$ cp ../networking-ovn/devstack/local.conf.sample local.conf

If you’re new to using devstack, it is best to use a throwaway VM for this.  You will also need to run devstack as a user with sudo access.  Once your configuration that enables OVN support is in place, run devstack:

$ ./stack.sh

In my case, I’m running this on Fedora 21.  It has also been tested on Ubuntu. Once devstack finishes running successfully, you should get output that looks like this:

This is your host ip: 192.168.122.31
Keystone is serving at http://192.168.122.31:5000/
The default users are: admin and demo
The password: password
2015-04-08 14:31:10.242 | stack.sh completed in 165 seconds.

One bit of environment initialization that devstack does is create some initial Neutron networks.  You can see them using the neutron command, which talks to the Neutron REST API.

$ . openrc
$ neutron net-list
+--------------------------------------+---------+--------------------------------------------------+
| id                                   | name    | subnets                                          |
+--------------------------------------+---------+--------------------------------------------------+
| a28b651e-5cb9-481b-9f9b-d5d57e55c6d0 | public  | df0aee67-166c-4ad4-890c-bbf5d02ca3cf             |
| 2637f01e-f41e-4d1b-865f-195253027031 | private | eac6621f-e8cc-4c94-84bf-e73dab610018 10.0.0.0/24 |
+--------------------------------------+---------+--------------------------------------------------+

Since OVN is the configured backend, we can use the ovn-nbctl utility to verify that these networks were created in OVN.

$ ovn-nbctl lswitch-list
480235d0-d1a5-43a9-821b-d32e109445fd (neutron-2637f01e-f41e-4d1b-865f-195253027031)
a60a2c16-cea7-4bdc-8082-b47745d016b3 (neutron-a28b651e-5cb9-481b-9f9b-d5d57e55c6d0)
$ ovn-nbctl lswitch-get-external-id 480235d0-d1a5-43a9-821b-d32e109445fd
neutron:network_name=private
$ ovn-nbctl lswitch-get-external-id a60a2c16-cea7-4bdc-8082-b47745d016b3
neutron:network_name=public
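As the listing shows, the logical switch names here are the Neutron network ID with a neutron- prefix. Assuming that naming holds, a one-line filter can pair each OVN switch UUID with its Neutron network ID. The function name is made up for this sketch.

```shell
#!/bin/sh
# Read `ovn-nbctl lswitch-list` output on stdin and print
# "<OVN switch UUID> <Neutron network ID>" for each switch whose name
# starts with "neutron-". Other switches are skipped.
neutron_switches() {
    sed -n 's/^\([0-9a-f-]*\) (neutron-\(.*\))$/\1 \2/p'
}
```

Usage would be: ovn-nbctl lswitch-list | neutron_switches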

We can also create ports using the Neutron API and verify that they get created in OVN. To do that, we first create a port in Neutron:

$ neutron port-create private
Created a new port:
+-----------------------+---------------------------------------------------------------------------------+
| Field                 | Value                                                                           |
+-----------------------+---------------------------------------------------------------------------------+
| admin_state_up        | True                                                                            |
| allowed_address_pairs |                                                                                 |
| binding:vnic_type     | normal                                                                          |
| device_id             |                                                                                 |
| device_owner          |                                                                                 |
| fixed_ips             | {"subnet_id": "eac6621f-e8cc-4c94-84bf-e73dab610018", "ip_address": "10.0.0.3"} |
| id                    | ff07588c-4b11-4ec8-b7c5-1be64fc0ebac                                            |
| mac_address           | fa:16:3e:23:bd:f6                                                               |
| name                  |                                                                                 |
| network_id            | 2637f01e-f41e-4d1b-865f-195253027031                                            |
| security_groups       | ab539a1c-c3d8-49f7-9ad1-3a8b451bce91                                            |
| status                | DOWN                                                                            |
| tenant_id             | 64f29642350d4c978cf03a4917a35999                                                |
+-----------------------+---------------------------------------------------------------------------------+

Then we can list the logical ports in OVN for the logical switch associated with the Neutron network named private.  The output is the OVN UUID for the port followed by the port name in parentheses.  Neutron sets the port name equal to the UUID of the Neutron port.

$ ovn-nbctl lswitch-get-external-id 480235d0-d1a5-43a9-821b-d32e109445fd
neutron:network_name=private
$ ovn-nbctl lport-list 480235d0-d1a5-43a9-821b-d32e109445fd
...
fe959cfa-fd20-4129-9669-67af1fa6bbf7 (ff07588c-4b11-4ec8-b7c5-1be64fc0ebac)
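Given that "UUID (name)" output format, finding the OVN port for a particular Neutron port is a matter of matching on the name in parentheses. Here is a small sketch of that lookup. The function name is hypothetical, and it assumes names contain no spaces.

```shell
#!/bin/sh
# Read "UUID (name)" lines on stdin (as printed by ovn-nbctl lport-list or
# lswitch-list) and print the UUID whose name is $1.
# Exits non-zero if no line matches.
uuid_for_name() {
    awk -v want="($1)" '$2 == want { print $1; found = 1 } END { exit !found }'
}
```

For example: ovn-nbctl lport-list 480235d0-d1a5-43a9-821b-d32e109445fd | uuid_for_name ff07588c-4b11-4ec8-b7c5-1be64fc0ebac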

We can also see that the port is down since it has not yet been plugged in to the local ovs switch on a hypervisor:

$ ovn-nbctl lport-get-up fe959cfa-fd20-4129-9669-67af1fa6bbf7
down

Ongoing Work

All OVN development discussion, patch submission, and patch review happen on the ovs-dev mailing list.  Development is currently happening in the ovn branch until things are further along.  Discussion about the OpenStack integration happens on the openstack-dev mailing list, while patch submission and review happen in OpenStack’s gerrit.

As mentioned earlier, the ovn-controller daemon is not yet running in this development environment.  That will change shortly, as Justin Pettit posted it for review earlier this week.

As you might have noticed, there’s a lot of infrastructure in place, but the actual flows and tunnels necessary to implement these virtual networks are not yet in place.  There’s been a lot of work in preparation for that, though.  Ben Pfaff has had a patch series up for review for expression matching needed for OVN.  It probably should have been merged by now, but the reviews have been a little slow.  (That’s my guilt talking.)  Ben has also started working on making ovn-nbd populate the Pipeline table of the OVN database.

Finally, the proposed OVN design introduces some new demands on ovsdb-server.  In particular, there will easily be hundreds of instances of ovn-controller connected to ovsdb-server.  Andy Zhou has been doing some very nice work around increasing performance in anticipation of these new demands.