OpenStack Security Groups using OVN ACLs

OpenStack Security Groups give you a way to define packet filtering policy that is implemented by the cloud infrastructure.  OVN and its OpenStack Neutron integration now includes support for security groups and this post discusses how it works.

Existing OVS Support in OpenStack

It’s worth looking at how this has been implemented with OVS in the past for OpenStack.  OpenStack’s existing OVS integration (ML2+OVS) makes use of iptables to implement security groups.  Unfortunately, to make that work, we have to connect the VM to a tap device, put that on a Linux bridge, and then connect the Linux bridge to the OVS bridge using a veth pair so that we have a place to implement the iptables rules.  It’s great that this works, but the extra layers are not ideal.

[old-security-group-impl]

To get rid of all of the extra layers between the VM and OVS, we need to be able to build stateful firewall services in OVS directly.

Enter OVS with Conntrack Integration

OVS integration with the kernel’s connection tracker has been a hotly anticipated feature for OVS, and for good reason.  At the last OpenStack Summit, held in Vancouver in May 2015, there was a presentation that covered this integration and how it will benefit security groups once available.  The presenters demonstrated a significant performance improvement over the current approach of implementing security groups using iptables.  A recording of the presentation is available online.

The talk goes into some good detail about how this works.  However, at that time the conntrack integration was not yet finished and available for use.  Since then there has been fantastic progress!  The upstream kernel changes have been accepted and the userspace changes have all merged into the OVS project.  This will all be available in the next OVS release after 2.4.

The major piece left is completing a backport of the kernel changes.  Even though the openvswitch module is included in the upstream kernel, the OVS project maintains a version of the code that is backported to older kernels.  Backports of the conntrack integration are available as of writing in this branch.

This functionality can now be used to build stateful services in OVS.  Without having to get into what this looks like in terms of detailed flows, here is an idea of what it lets you do in your packet processing pipeline.

  1. In one stage, you can match all IP traffic and send it through the connection tracker.
  2. In the next stage, you now have the connection tracker’s state associated with this packet.
    1. For packets representing a new connection, you can use custom policy to decide if you’d like to accept the connection or not.  If you do accept it, you can tell the connection tracker to remember this connection.
    2. You know when packets are associated with existing connections and can allow them through.  This also applies to associated return traffic.
    3. You know if a packet is invalid because it’s not the right type of packet for a new connection and doesn’t match any existing known connection.
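The three cases above can be sketched as a small decision function.  This is only an illustration in C of the logic described in the list, not OVS code; the struct, enum, and function names are made up for the example, and the policy callback stands in for whatever custom policy you apply to new connections.

```c
#include <stdbool.h>

/* Connection tracker state for one packet (hypothetical names). */
struct ct_state {
    bool is_new;       /* first packet of a connection not yet remembered */
    bool established;  /* belongs to a connection we already committed */
    bool related;      /* associated traffic for a known connection */
    bool invalid;      /* cannot be tied to any valid connection */
};

enum verdict { VERDICT_DROP, VERDICT_ALLOW, VERDICT_ALLOW_AND_COMMIT };

/* Custom policy: return true if a new connection should be accepted. */
typedef bool (*policy_fn)(unsigned short dst_port);

/* Second stage of the pipeline: decide what to do with a tracked packet. */
enum verdict classify(struct ct_state st, unsigned short dst_port,
                      policy_fn accept_new)
{
    if (st.invalid) {
        return VERDICT_DROP;              /* case 3: invalid packet */
    }
    if (st.established || st.related) {
        return VERDICT_ALLOW;             /* case 2: known connection */
    }
    if (st.is_new && accept_new(dst_port)) {
        return VERDICT_ALLOW_AND_COMMIT;  /* case 1: accept and remember */
    }
    return VERDICT_DROP;                  /* new, but policy said no */
}

/* Example policy: accept only new SSH connections. */
bool allow_ssh_only(unsigned short dst_port)
{
    return dst_port == 22;
}
```

With this policy, a new packet to port 22 yields VERDICT_ALLOW_AND_COMMIT, a new packet to port 80 is dropped, and any packet on an established connection is allowed without consulting the policy again.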

Now let’s take a closer look at some real usage.

OVN Stateful ACLs

An example use of OVS+conntrack is the implementation of ACLs in OVN.  ACLs provide a way to do distributed packet filtering for OVN networks. OVN ACLs are used to implement security groups for OpenStack Neutron.

I always find ovs-sandbox incredibly useful for exploring OVN features.  In fact, I’ve been writing an OVN tutorial that uses ovs-sandbox. Let’s use ovs-sandbox to look at how OVN uses OVS+conntrack to implement ACLs.

I always run ovs-sandbox straight from the ovs git tree.  If you’re starting from scratch, you’ll first need to clone the ovs git repository. Note that you may also need to install some dependencies, including: autoconf, automake, libtool, gcc, patch, and make.

$ git clone https://github.com/openvswitch/ovs.git
$ cd ovs
$ ./boot.sh
$ ./configure
$ make

Now that we have ovs compiled from git, we can run ovs-sandbox with OVN enabled from the git tree.

$ make sandbox SANDBOXFLAGS="--ovn"

Next, we need to create a simple OVN logical topology. We’ll reuse a script from the OVN tutorial that creates a single logical switch with two logical ports. It then binds the two logical ports to the local ovs bridge in our sandbox. This script outputs all of the commands it executes.

$ ovn/env1/setup.sh 
+ ovn-nbctl lswitch-add sw0
+ ovn-nbctl lport-add sw0 sw0-port1
+ ovn-nbctl lport-add sw0 sw0-port2
+ ovn-nbctl lport-set-addresses sw0-port1 00:00:00:00:00:01
+ ovn-nbctl lport-set-addresses sw0-port2 00:00:00:00:00:02
+ ovn-nbctl lport-set-port-security sw0-port1 00:00:00:00:00:01
+ ovn-nbctl lport-set-port-security sw0-port2 00:00:00:00:00:02
+ ovs-vsctl add-port br-int lport1 -- set Interface lport1 external_ids:iface-id=sw0-port1
+ ovs-vsctl add-port br-int lport2 -- set Interface lport2 external_ids:iface-id=sw0-port2

We can view the logical topology using ovn-nbctl.

$ ovn-nbctl show
    lswitch caef7a2c-71fb-4af3-9cbc-589889606a2b (sw0)
        lport sw0-port1
            addresses: 00:00:00:00:00:01
        lport sw0-port2
            addresses: 00:00:00:00:00:02

We can also look at the physical topology to see that the two logical ports are bound to our single local chassis (hypervisor).

$ ovn-sbctl show
Chassis "56b18105-5706-46ef-80c4-ff20979ab068"
    Encap geneve
        ip: "127.0.0.1"
    Port_Binding "sw0-port1"
    Port_Binding "sw0-port2"

Now let’s create some ACLs! A common use case would be creating a policy for a given port that looks something like this:

  • Allow incoming ICMP requests and associated return traffic.
  • Allow incoming SSH connections and associated return traffic.
  • Drop other incoming IP traffic.

Here’s how we’d create that policy for sw0-port1 using ACLs.

$ ovn-nbctl acl-add sw0 to-lport 1002 'outport == "sw0-port1" && ip && icmp' allow-related
$ ovn-nbctl acl-add sw0 to-lport 1002 'outport == "sw0-port1" && ip && tcp && tcp.dst == 22' allow-related
$ ovn-nbctl acl-add sw0 to-lport 1001 'outport == "sw0-port1" && ip' drop

To verify what we’ve done, we can list the ACLs configured on the logical switch sw0.

$ ovn-nbctl acl-list sw0
  to-lport  1002 (outport == "sw0-port1" && ip && icmp) allow-related
  to-lport  1002 (outport == "sw0-port1" && ip && tcp && tcp.dst == 22) allow-related
  to-lport  1001 (outport == "sw0-port1" && ip) drop

Next we can look at how OVN integrates these ACLs into its Logical Flows.

As an aside, the more I work on and with OVN, the more convinced I am that Logical Flows are an incredibly powerful abstraction used in the OVN implementation. OVN first describes the packet processing pipeline in a structure that seems similar to OpenFlow, but only talks about logical network elements. This single logical packet processing pipeline is sent down to all hypervisors. A local controller on each hypervisor converts the logical flows into OpenFlow flows that reflect the local view of the world. The end result of all of this is that we’re able to implement more and more complex features in logical flows without having to worry about the current physical topology.

Now that we have ACLs configured, there are new entries in the logical flow table in the stages switch_in_pre_acl, switch_in_acl, switch_out_pre_acl, and switch_out_acl. The full logical flow table at this point can be seen with ovn-sbctl.

$ ovn-sbctl lflow-list

Let’s take a closer look at the switch_out_pre_acl and switch_out_acl stages of the egress logical flows for sw0.

In switch_out_pre_acl, we match IP traffic and put it through the connection tracker. This populates the connection state fields so that we can apply policy as appropriate.

    table=0(switch_out_pre_acl), priority=  100, match=(ip), action=(ct_next;)
    table=0(switch_out_pre_acl), priority=    0, match=(1), action=(next;)

In switch_out_acl, we allow packets associated with existing connections. We drop packets that are deemed to be invalid (such as a non-SYN TCP packet not associated with an existing connection).

    table=1(switch_out_acl), priority=65535, match=(!ct.est && ct.rel && !ct.new && !ct.inv), action=(next;)
    table=1(switch_out_acl), priority=65535, match=(ct.est && !ct.rel && !ct.new && !ct.inv), action=(next;)
    table=1(switch_out_acl), priority=65535, match=(ct.inv), action=(drop;)

For new connections, we apply our configured ACL policy to decide whether to allow the connection or not. In this case, we’ll allow ICMP or SSH. Otherwise, we’ll drop the packet.

    table=1(switch_out_acl), priority= 2002, match=(ct.new && (outport == "sw0-port1" && ip && icmp)), action=(ct_commit; next;)
    table=1(switch_out_acl), priority= 2002, match=(ct.new && (outport == "sw0-port1" && ip && tcp && tcp.dst == 22)), action=(ct_commit; next;)
    table=1(switch_out_acl), priority= 2001, match=(outport == "sw0-port1" && ip), action=(drop;)

When using ACLs, the default policy is to allow and track IP connections. Based on our above policy, IP traffic directed at sw0-port1 will never hit this flow at priority 1.

    table=1(switch_out_acl), priority=    1, match=(ip), action=(ct_commit; next;)
    table=1(switch_out_acl), priority=    0, match=(1), action=(next;)
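To see why IP traffic for sw0-port1 never reaches the priority 1 flow, it helps to model the table lookup: among all flows that match, the one with the highest priority wins.  Here is a rough C sketch of that rule using the switch_out_acl flows for new connections listed above.  It is a simplified model and not how OVN or OVS evaluates flows internally; the packet struct and function names are invented for the example, and the priority 65535 flows for established, related, and invalid traffic are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* The packet fields the example ACLs look at (hypothetical model). */
struct pkt {
    bool to_port1;  /* outport == "sw0-port1" */
    bool ip;
    bool icmp;
    int tcp_dst;    /* TCP destination port, or 0 if not TCP */
    bool ct_new;    /* connection tracker reports a new connection */
};

struct flow {
    int priority;
    bool (*match)(const struct pkt *);
    bool allow;     /* true: pass to the next stage, false: drop */
};

static bool m_icmp(const struct pkt *p) { return p->ct_new && p->to_port1 && p->ip && p->icmp; }
static bool m_ssh(const struct pkt *p)  { return p->ct_new && p->to_port1 && p->ip && p->tcp_dst == 22; }
static bool m_drop(const struct pkt *p) { return p->to_port1 && p->ip; }
static bool m_ip(const struct pkt *p)   { return p->ip; }
static bool m_any(const struct pkt *p)  { (void)p; return true; }

static const struct flow acl_table[] = {
    { 2002, m_icmp, true  },
    { 2002, m_ssh,  true  },
    { 2001, m_drop, false },
    {    1, m_ip,   true  },
    {    0, m_any,  true  },
};

/* Return the matching flow with the highest priority, or NULL if none match. */
const struct flow *acl_lookup(const struct pkt *p)
{
    const struct flow *best = NULL;
    for (size_t i = 0; i < sizeof acl_table / sizeof acl_table[0]; i++) {
        const struct flow *f = &acl_table[i];
        if (f->match(p) && (!best || f->priority > best->priority)) {
            best = f;
        }
    }
    return best;
}
```

A new HTTP connection to sw0-port1 matches both the priority 2001 drop and the priority 1 allow, so the drop wins.  The same connection aimed at any other port only matches the priority 1 flow and is allowed and tracked.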

Currently, ovs-sandbox’s fake datapath doesn’t support conntrack integration so looking at OpenFlow at this point won’t show the flows you’d expect. Let’s jump over to a real OpenStack environment that implements security groups using OVN ACLs to dig deeper.

Security Groups using OVN ACLs

The original OVS support in OpenStack could, and most likely will, be updated to use conntrack integration to implement security groups.  In this example, we’re using the Neutron integration with OVN, which just merged support for implementing security groups using OVN ACLs. This example uses a single node devstack environment as described in this document.

Let’s start with a security group that implements a policy similar to the example we started with in ovs-sandbox. OpenStack security groups drop all traffic by default. The default security group shown here has been set up to allow all outbound IP traffic and associated return traffic. It also allows inbound ICMP requests and SSH connections.

$ neutron security-group-list
+--------------------------------------+---------+-----------------------+
| id                                   | name    | security_group_rules  |
+--------------------------------------+---------+-----------------------+
| a5e41dd4-4b15-4e68-a81d-45466bda3949 | default | egress, IPv4          |
|                                      |         | egress, IPv6          |
|                                      |         | ingress, IPv4, 22/tcp |
|                                      |         | ingress, IPv4, icmp   |
+--------------------------------------+---------+-----------------------+

The OVN Neutron driver translates this to the following OVN ACLs:

$ ovn-nbctl acl-list neutron-a920d5ef-eca8-4c4f-9c24-55e29e1c03d6
from-lport  1002 (inport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4) allow-related
from-lport  1002 (inport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip6) allow-related
from-lport  1001 (inport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip) drop
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && icmp4) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && tcp && tcp.dst >= 22 && tcp.dst <= 22) allow-related
  to-lport  1001 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip) drop

In the ovs-sandbox example, we looked at the egress logical flows. Let’s do that again to see the ACL stages which correspond to the to-lport direction ACLs.

$ ovn-sbctl lflow-list
...
  table=0(switch_out_pre_acl), priority=  100, match=(ip), action=(ct_next;)
  table=0(switch_out_pre_acl), priority=    0, match=(1), action=(next;)
...

We send all IP traffic through the connection tracker to initialize the ct state fields.

...
  table=1(switch_out_acl), priority=65534, match=(!ct.est && ct.rel && !ct.new && !ct.inv), action=(next;)
  table=1(switch_out_acl), priority=65534, match=(ct.est && !ct.rel && !ct.new && !ct.inv), action=(next;)

Traffic associated with existing connections is let through.

  table=1(switch_out_acl), priority=65534, match=(ct.inv), action=(drop;)

Invalid traffic is dropped.

  table=1(switch_out_acl), priority= 2002, match=(ct.new && (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && icmp4)), action=(ct_commit; next;)
  table=1(switch_out_acl), priority= 2002, match=(ct.new && (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && tcp && tcp.dst >= 22 && tcp.dst <= 22)), action=(ct_commit; next;)

These logical flows correspond to our ACLs. If the packet represents a new connection and that connection is IPv4 ICMP or SSH, we store info about the connection for later and allow it through.

  table=1(switch_out_acl), priority= 2001, match=(outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip), action=(drop;)

This is our flow to drop traffic directed at our VM by default if it didn’t match one of the rules above for ICMP or SSH.

  table=1(switch_out_acl), priority=    1, match=(ip), action=(ct_commit; next;)
  table=1(switch_out_acl), priority=    0, match=(1), action=(next;)
...

Otherwise, OVN defaults to allowing traffic through.

All of that is logical flows. Now let’s look at how this is implemented in OpenFlow. The OpenFlow flows associated with ACLs in the egress logical flows are in OpenFlow tables 48 and 49.

$ sudo ovs-ofctl -O OpenFlow13 dump-flows br-int | cut -f4- -d' '
...
table=48, n_packets=22, n_bytes=2000, priority=100,ip,metadata=0x1 actions=ct(table=49,zone=NXM_NX_REG5[0..15])
table=48, n_packets=0, n_bytes=0, priority=100,ipv6,metadata=0x1 actions=ct(table=49,zone=NXM_NX_REG5[0..15])
table=48, n_packets=31490, n_bytes=3460940, priority=0,metadata=0x1 actions=resubmit(,49)
...
table=49, n_packets=0, n_bytes=0, priority=65534,ct_state=-new-est+rel-inv+trk,metadata=0x1 actions=resubmit(,50)
table=49, n_packets=14, n_bytes=1294, priority=65534,ct_state=-new+est-rel-inv+trk,metadata=0x1 actions=resubmit(,50)
table=49, n_packets=0, n_bytes=0, priority=65534,ct_state=+inv+trk,metadata=0x1 actions=drop
table=49, n_packets=0, n_bytes=0, priority=2002,ct_state=+new+trk,tcp,reg7=0x4,metadata=0x1,tp_dst=22 actions=ct(commit,zone=NXM_NX_REG5[0..15]),resubmit(,50)
table=49, n_packets=1, n_bytes=98, priority=2002,ct_state=+new+trk,icmp,reg7=0x4,metadata=0x1 actions=ct(commit,zone=NXM_NX_REG5[0..15]),resubmit(,50)
table=49, n_packets=0, n_bytes=0, priority=2001,ip,reg7=0x4,metadata=0x1 actions=drop
table=49, n_packets=0, n_bytes=0, priority=2001,ipv6,reg7=0x4,metadata=0x1 actions=drop
table=49, n_packets=7, n_bytes=608, priority=1,ip,metadata=0x1 actions=ct(commit,zone=NXM_NX_REG5[0..15]),resubmit(,50)
table=49, n_packets=0, n_bytes=0, priority=1,ipv6,metadata=0x1 actions=ct(commit,zone=NXM_NX_REG5[0..15]),resubmit(,50)
table=49, n_packets=31490, n_bytes=3460940, priority=0,metadata=0x1 actions=resubmit(,50)
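The ct_state matches in this output use a compact notation: each flag is prefixed with + if it must be set and - if it must be clear, so -new+est-rel-inv+trk means "tracked, established, and nothing else".  As a reading aid, here is a small C sketch that turns such a string into two bit masks.  The bit values and the function name are made up for the example (they are not the values OVS uses), and only the five flags that appear above are recognized.

```c
#include <string.h>

/* Illustrative bit assignments, not the ones OVS uses internally. */
enum { CT_NEW = 1, CT_EST = 2, CT_REL = 4, CT_INV = 8, CT_TRK = 16 };

/*
 * Parse a ct_state match such as "-new+est-rel-inv+trk".
 * On success, *set holds the flags that must be set and *clear the flags
 * that must be clear.  Returns 0 on success, -1 on a syntax error or an
 * unknown flag name.
 */
int parse_ct_state(const char *s, unsigned *set, unsigned *clear)
{
    static const struct { const char *name; unsigned bit; } flags[] = {
        { "new", CT_NEW }, { "est", CT_EST }, { "rel", CT_REL },
        { "inv", CT_INV }, { "trk", CT_TRK },
    };

    *set = 0;
    *clear = 0;
    while (*s) {
        char sign = *s++;
        if (sign != '+' && sign != '-') {
            return -1;
        }
        /* The flag name runs up to the next sign or the end of the string. */
        size_t len = strcspn(s, "+-");
        unsigned bit = 0;
        for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
            if (strlen(flags[i].name) == len && strncmp(s, flags[i].name, len) == 0) {
                bit = flags[i].bit;
            }
        }
        if (!bit) {
            return -1;
        }
        if (sign == '+') {
            *set |= bit;
        } else {
            *clear |= bit;
        }
        s += len;
    }
    return 0;
}
```

Parsing "+new+trk" from the priority 2002 flows gives a set mask of CT_NEW | CT_TRK and an empty clear mask, which lines up with the ct.new condition in the corresponding logical flows.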

This showed a pretty simple security group. Let’s make the security group a bit more complicated, add a couple more VMs, and then see what the ACLs look like. Imagine we have some sort of web app running on these three VMs. We want to allow TCP ports 80 and 443 from the outside to these VMs. Imagine also that these apps present an internal only API for the VMs to talk to each other on port 8080. So, we want any VM using this security group to be able to access other VMs on this security group on port 8080, but no access from outside. While we’re at it, we want everything to work on both IPv4 and IPv6. Here’s what the resulting security group looks like.

$ neutron security-group-list
+--------------------------------------+---------+--------------------------------------------------------------------------------+
| id                                   | name    | security_group_rules                                                           |
+--------------------------------------+---------+--------------------------------------------------------------------------------+
| a5e41dd4-4b15-4e68-a81d-45466bda3949 | default | egress, IPv4                                                                   |
|                                      |         | egress, IPv6                                                                   |
|                                      |         | ingress, IPv4, 22/tcp                                                          |
|                                      |         | ingress, IPv4, 443/tcp                                                         |
|                                      |         | ingress, IPv4, 80/tcp                                                          |
|                                      |         | ingress, IPv4, 8080/tcp, remote_group_id: a5e41dd4-4b15-4e68-a81d-45466bda3949 |
|                                      |         | ingress, IPv4, icmp                                                            |
|                                      |         | ingress, IPv6, 22/tcp                                                          |
|                                      |         | ingress, IPv6, 443/tcp                                                         |
|                                      |         | ingress, IPv6, 80/tcp                                                          |
|                                      |         | ingress, IPv6, 8080/tcp, remote_group_id: a5e41dd4-4b15-4e68-a81d-45466bda3949 |
|                                      |         | ingress, IPv6, icmp                                                            |
+--------------------------------------+---------+--------------------------------------------------------------------------------+

Now, after booting a couple more VMs (for a total of 3), Neutron’s OVN plugin has created the following ACLs. All of these will get automatically translated into logical flows, and then translated into OpenFlow flows by the local ovn-controller on each hypervisor as appropriate.

$ ovn-nbctl acl-list neutron-a920d5ef-eca8-4c4f-9c24-55e29e1c03d6
from-lport  1002 (inport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip4) allow-related
from-lport  1002 (inport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip6) allow-related
from-lport  1002 (inport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip4) allow-related
from-lport  1002 (inport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip6) allow-related
from-lport  1002 (inport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4) allow-related
from-lport  1002 (inport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip6) allow-related
from-lport  1001 (inport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip) drop
from-lport  1001 (inport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip) drop
from-lport  1001 (inport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip) drop
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip4 && icmp4) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip4 && inport == {"6353ad55-f6e7-4bc5-9e5d-55e975b6736e","a4a81c09-4e93-41e2-be83-cfe1f8b39f77"} && tcp && tcp.dst >= 8080 && tcp.dst <= 8080) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip4 && tcp && tcp.dst >= 22 && tcp.dst <= 22) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip4 && tcp && tcp.dst >= 443 && tcp.dst <= 443) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip4 && tcp && tcp.dst >= 80 && tcp.dst <= 80) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip6 && icmp6) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip6 && inport == {"6353ad55-f6e7-4bc5-9e5d-55e975b6736e","a4a81c09-4e93-41e2-be83-cfe1f8b39f77"} && tcp && tcp.dst >= 8080 && tcp.dst <= 8080) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip6 && tcp && tcp.dst >= 22 && tcp.dst <= 22) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip6 && tcp && tcp.dst >= 443 && tcp.dst <= 443) allow-related
  to-lport  1002 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip6 && tcp && tcp.dst >= 80 && tcp.dst <= 80) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip4 && icmp4) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip4 && inport == {"62848020-ba3b-445c-a8a9-c13094648b34","a4a81c09-4e93-41e2-be83-cfe1f8b39f77"} && tcp && tcp.dst >= 8080 && tcp.dst <= 8080) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip4 && tcp && tcp.dst >= 22 && tcp.dst <= 22) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip4 && tcp && tcp.dst >= 443 && tcp.dst <= 443) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip4 && tcp && tcp.dst >= 80 && tcp.dst <= 80) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip6 && icmp6) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip6 && inport == {"62848020-ba3b-445c-a8a9-c13094648b34","a4a81c09-4e93-41e2-be83-cfe1f8b39f77"} && tcp && tcp.dst >= 8080 && tcp.dst <= 8080) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip6 && tcp && tcp.dst >= 22 && tcp.dst <= 22) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip6 && tcp && tcp.dst >= 443 && tcp.dst <= 443) allow-related
  to-lport  1002 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip6 && tcp && tcp.dst >= 80 && tcp.dst <= 80) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && icmp4) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && inport == {"62848020-ba3b-445c-a8a9-c13094648b34","6353ad55-f6e7-4bc5-9e5d-55e975b6736e"} && tcp && tcp.dst >= 8080 && tcp.dst <= 8080) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && tcp && tcp.dst >= 22 && tcp.dst <= 22) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && tcp && tcp.dst >= 443 && tcp.dst <= 443) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip4 && tcp && tcp.dst >= 80 && tcp.dst <= 80) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip6 && icmp6) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip6 && inport == {"62848020-ba3b-445c-a8a9-c13094648b34","6353ad55-f6e7-4bc5-9e5d-55e975b6736e"} && tcp && tcp.dst >= 8080 && tcp.dst <= 8080) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip6 && tcp && tcp.dst >= 22 && tcp.dst <= 22) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip6 && tcp && tcp.dst >= 443 && tcp.dst <= 443) allow-related
  to-lport  1002 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip6 && tcp && tcp.dst >= 80 && tcp.dst <= 80) allow-related
  to-lport  1001 (outport == "62848020-ba3b-445c-a8a9-c13094648b34" && ip) drop
  to-lport  1001 (outport == "6353ad55-f6e7-4bc5-9e5d-55e975b6736e" && ip) drop
  to-lport  1001 (outport == "a4a81c09-4e93-41e2-be83-cfe1f8b39f77" && ip) drop
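That listing is 14 ACLs per port (3 from-lport and 11 to-lport), or 42 in total for three ports.  The remote_group_id rules are the interesting ones: for each port, the match lists every other port in the group in an inport set.  The following C sketch shows that expansion for one port.  It is only an illustration of the pattern visible in the output above, not the Neutron driver's code; the function name and parameters are invented for the example.

```c
#include <stdio.h>

/*
 * Build the to-lport match for a remote-group TCP rule on ports[self],
 * allowing traffic only from the other ports in the group.
 * ipver is "ip4" or "ip6".  Returns 0 on success, -1 if buf is too small.
 */
int remote_group_match(char *buf, size_t len, const char *const ports[],
                       size_t n_ports, size_t self, const char *ipver,
                       int tcp_port)
{
    const char *sep = "";
    size_t off;
    int n;

    n = snprintf(buf, len, "outport == \"%s\" && %s && inport == {",
                 ports[self], ipver);
    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    off = (size_t)n;

    /* Every group member except the port itself goes into the set. */
    for (size_t i = 0; i < n_ports; i++) {
        if (i == self) {
            continue;
        }
        n = snprintf(buf + off, len - off, "%s\"%s\"", sep, ports[i]);
        if (n < 0 || (size_t)n >= len - off) {
            return -1;
        }
        off += (size_t)n;
        sep = ",";
    }

    n = snprintf(buf + off, len - off,
                 "} && tcp && tcp.dst >= %d && tcp.dst <= %d",
                 tcp_port, tcp_port);
    if (n < 0 || (size_t)n >= len - off) {
        return -1;
    }
    return 0;
}
```

Called with the three port UUIDs from the listing, self set to the first one, "ip4", and 8080, this produces the same match string as the second to-lport ACL above.  Note that each set has one entry per other group member, so the ACLs grow as the group does.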

Possible Future Work

The biggest issue we have with this is just how new it is. It requires compiling and loading a custom version of the openvswitch kernel module from a custom branch of ovs. All of that is handled automatically by our devstack plugin, but it’s not exactly what you’d want for production usage. As the kernel backport is finalized, we expect it to be backported into distro kernels as well, which will make this much more consumable. It will certainly be backported for RHEL 7 and its derivatives.

I’m looking forward to seeing what other features get implemented using OVS+conntrack, both for OVN and beyond!

Bridging Asterisk RTP streams with OVS

I’m at the AstriCon conference this week, which is a conference built around the Asterisk open source project.  I worked on the Asterisk project for about 7 years before joining Red Hat to hack on cloud infrastructure.  I also helped write a book about it.  While I’m not working on Asterisk directly anymore, I still find it a very interesting project.  The community is full of great people.  Another reason I still pay attention is that communications infrastructure in general is an incredibly important use case for cloud infrastructure. The telco world is going through a rapid transformation with SDN and NFV.

I did a keynote at AstriCon last year about open cloud infrastructure and its importance to Asterisk and communications infrastructure more broadly.  This year I did a talk more focused on networking and how some of the SDN trends apply to this project.  One of the things this conference has started doing is holding a session called “dangerous demos”.  The idea is for people to come up on stage and attempt a short (3-5 minute) live demo.  They give awards for various categories, including the most amusing case of a demo crashing and burning, as is often the case with live demos, especially using conference wifi.  Sounds fun, doesn’t it?  I thought so.

Last Friday I set off to see what kind of demo I could whip up in an afternoon.  Here’s what I came up with.

Asterisk Call Bridging

Before getting to the demo, it’s important to have some background on how Asterisk and some related technologies work.  Asterisk supports many different communications technologies.  It supports many different methods of traditional telephone network (PSTN) connectivity.  It also supports several Voice over IP (VoIP) protocols.  Any connection to the system via any of these technologies is represented as an Asterisk channel.

[A Single Call Leg, Represented by a Single Channel]

In some cases, there is only one channel.  This is when Asterisk itself is the endpoint of the call.  Some traditional examples would be something like voicemail or a system that implements an IVR, such as an automated system to make payments on an account.

It’s also common to have two channels bridged together.  Imagine two phones on a call talking to each other.

[Two Call Legs Represented by Two Channels]

Architecturally, there are some layers involved here.  There is channel technology abstraction so that two channels using different technologies can still be bridged together.

[Channel Technology and Abstract Channel Layers]

This is an incredibly powerful part of Asterisk’s architecture.  It lets you bridge new technologies like WebRTC to traditional telephony protocols.  However, bridging media streams through the abstract channel layer is not the most efficient way to do it if the two channels bridged together are actually the same technology.  So, Asterisk also has a concept of “native bridging”.  This lets channel technology implementations implement more efficient ways of bridging.

SIP is the most commonly used VoIP protocol.  SIP is actually just a signaling (control) protocol.  The actual media streams are independent streams using the RTP protocol.  In some cases, the media streams can be sent directly between endpoints, but not always.  Asterisk sometimes has to transcode the media streams between two different codecs.  Another common case is that the streams may be fully compatible, but the system is used to put all streams through a controlled point (or set of points) at the edge of a company’s network. This use case is sometimes referred to as a Session Border Controller (SBC).

An RTP stream is a good example of a painful scenario for packet processing performance.  It’s a stream of small packets.  A typical RTP stream would be 50 UDP packets per second in each direction.  Each packet would hold 20 milliseconds of audio.  These numbers can vary.  You can increase packet sizes, but it comes at the cost of adding latency to the call. 20 ms of audio using G.711 is 160 bytes of audio payload. There are several other codecs that may increase or decrease the audio payload. For example, 20 ms using G.729 would be only 20 bytes of audio payload. Every packet also includes Ethernet, IP, UDP, and RTP headers.
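The arithmetic behind those numbers is easy to check.  Here is a small C sketch, assuming constant-bitrate codecs (G.711 at 64 kbit/s, G.729 at 8 kbit/s) and the minimum header sizes: a 14-byte Ethernet header without a VLAN tag or frame check sequence, a 20-byte IPv4 header without options, an 8-byte UDP header, and the 12-byte fixed RTP header.  The function names are just for illustration.

```c
/* Minimum header sizes in bytes (see the assumptions above). */
enum { ETH_HDR = 14, IPV4_HDR = 20, UDP_HDR = 8, RTP_HDR = 12 };

/* Packets per second in one direction for a given packetization time. */
int rtp_packets_per_second(int ptime_ms)
{
    return 1000 / ptime_ms;
}

/* Audio payload bytes per packet for a constant-bitrate codec. */
int rtp_payload_bytes(int codec_bits_per_second, int ptime_ms)
{
    return codec_bits_per_second / 8 * ptime_ms / 1000;
}

/* Total frame size on the wire for one RTP packet, headers included. */
int rtp_frame_bytes(int payload_bytes)
{
    return ETH_HDR + IPV4_HDR + UDP_HDR + RTP_HDR + payload_bytes;
}
```

For 20 ms packets this gives 50 packets per second, 160 payload bytes for G.711, and 20 for G.729.  With 54 bytes of headers on every packet, a G.729 frame is 74 bytes, so the headers outweigh the audio by more than two to one.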

When two of these RTP streams are bridged in Asterisk, there is a thread handling the call that’s polling on two UDP sockets.  When a packet comes in on one socket, it’s processed if necessary and then written out to the other socket.
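As a rough picture of that loop, here is a minimal C sketch of a single iteration: wait on two sockets, read a packet from whichever is ready, and write it to the other.  This is not Asterisk’s code.  It assumes both sockets are already connected to their peers, and the function name, buffer size, and timeout parameter are made up for the example.

```c
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Wait up to timeout_ms for a datagram on either socket and forward it to
 * the other one.  Returns the number of bytes forwarded, 0 on timeout, or
 * -1 on error.
 */
ssize_t relay_once(int fd_a, int fd_b, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = fd_a, .events = POLLIN },
        { .fd = fd_b, .events = POLLIN },
    };
    char buf[2048];

    int ready = poll(fds, 2, timeout_ms);
    if (ready <= 0) {
        return ready;  /* 0: timeout, -1: poll() failed */
    }

    for (int i = 0; i < 2; i++) {
        if (fds[i].revents & POLLIN) {
            ssize_t n = recv(fds[i].fd, buf, sizeof(buf), 0);
            if (n < 0) {
                return -1;
            }
            /* Any per-packet processing would happen here. */
            return send(fds[1 - i].fd, buf, (size_t)n, 0);
        }
    }
    return -1;  /* only error events were reported */
}
```

Every packet costs a poll(), a recv(), and a send(), each of which crosses between user space and the kernel, and that is per packet, per direction, per call.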

You can find a somewhat dated chapter that I wrote several years ago about Asterisk in the book “Architecture of Open Source Applications”. I re-used some of the diagrams from that chapter for this post.

The Demo

This demo is targeted at the case of Asterisk bridging two RTP streams that are fully compatible (same codec, same payload sizes, among other things).  During my talk about “SDN and Asterisk” yesterday, I covered several topics.  One of them was how the Linux networking datapath is becoming more programmable, with Open vSwitch (OVS) as a specific example of that.

My demo consists of two VMs on my laptop (asterisk1 and asterisk2).  They both have a single vCPU and 1 GB of RAM.

asterisk1 serves as both endpoints of calls passing through asterisk2, so asterisk2 is doing bridging of compatible RTP streams.  Both ends of the call on asterisk1 are executing the Milliwatt() application, which just generates a tone. Each call looks like this:

[call-topology]

I also customized the networking configuration on asterisk2. Instead of just having eth0, I have an OVS bridge named breth0 and eth0 is attached to that bridge.

[rbryant@asterisk2 ~]$ sudo ovs-vsctl show
e00ae5a3-5f81-476e-b40c-ff0c03817dea
    Bridge "breth0"
        fail_mode: standalone
        Port "eth0"
            Interface "eth0"
        Port "breth0"
            Interface "breth0"
                type: internal
    ovs_version: "2.4.0"

[rbryant@asterisk2 ~]$ ip addr list breth0
4: breth0@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default 
    link/ether 52:54:00:31:cf:ce brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.149/24 brd 192.168.122.255 scope global dynamic breth0
       valid_lft 2258sec preferred_lft 2258sec
    inet6 fe80::5054:ff:fe31:cfce/64 scope link 
       valid_lft forever preferred_lft forever

With this setup in place, I generated 100 calls, which means both asterisk1 and asterisk2 have 200 active channels.  On asterisk1:

asterisk1*CLI> core show channels
Channel              Location             State   Application(Data)             
SIP/asterisk2-000000 555@public:2         Up      Milliwatt()                   
SIP/asterisk2-000000 555@public:2         Up      Milliwatt()                   
SIP/asterisk2-000000 555@public:2         Up      Milliwatt()                   
...
200 active channels
200 active calls
200 calls processed

and on asterisk2:

asterisk2*CLI> core show channels
Channel              Location             State   Application(Data)             
SIP/asterisk1-000000 (None)               Up      AppDial((Outgoing Line))      
SIP/asterisk1-000000 555@public:1         Up      Dial(SIP/555@asterisk1)       
SIP/asterisk1-000000 (None)               Up      AppDial((Outgoing Line))   
SIP/asterisk1-000000 555@public:1         Up      Dial(SIP/555@asterisk1)          
...
200 active channels
100 active calls
100 calls processed

Why 200 channels? It’s a nice round number. It also generates enough load on asterisk2 for the demo without making my laptop melt.

I mentioned earlier that in this case, Asterisk does the bridging of two RTP streams in a thread that’s polling on two UDP sockets, reading packets from one, doing any necessary processing, and then writing it back out to the other socket. In this scenario, Asterisk is using roughly 25% of the vCPU on asterisk2.

What if in the simple forwarding case we could push this forwarding down into the kernel?

To pull this off, I first needed to know about all of the RTP streams active on asterisk2. More precisely, I needed to know about pairs of RTP streams: when a packet arrives on one stream, I need to know which other stream it should be sent back out on. Asterisk honestly does not make it very easy to get this information. You can get it using the CHANNEL() function. I probably could have written an AMI script to get the info I needed. I’m not sure if I could have done it with ARI.  All of that sounded like too much work for my Friday afternoon hack.  The easiest way for me was to write a custom Asterisk C module that provided a CLI command to dump all of the info I wanted.  Here’s the relevant code minus all of the module and CLI command boilerplate code:

	struct ast_channel *chan;
	struct ast_channel_iterator *chan_iter;

	/* Walk every active channel; each one is unreffed by the loop increment. */
	chan_iter = ast_channel_iterator_all_new();
	for (; (chan = ast_channel_iterator_next(chan_iter)); ast_channel_unref(chan)) {
		char src[1024] = "";
		char dest[1024] = "";
		char src2[1024] = "";
		char dest2[1024] = "";
		struct ast_channel *chan2;

		/* Local and remote RTP addresses for this channel. */
		ast_func_read(chan, "CHANNEL(rtpsource)", src, sizeof(src));
		ast_func_read(chan, "CHANNEL(rtpdest)", dest, sizeof(dest));

		/* Skip channels that are not currently bridged to a peer. */
		chan2 = ast_bridged_channel(chan);
		if (!chan2) {
			continue;
		}

		/* The same two addresses for the channel on the other side of the bridge. */
		ast_func_read(chan2, "CHANNEL(rtpsource)", src2, sizeof(src2));
		ast_func_read(chan2, "CHANNEL(rtpdest)", dest2, sizeof(dest2));
		ast_cli(a->fd, "%s %s %s %s\n", src, dest, src2, dest2);
	}
	ast_channel_iterator_destroy(chan_iter);

This code is a terrible hack that you’d never use on anything but this controlled environment, but it got me the info I wanted quickly.  The output looks something like this:

asterisk2*CLI> rtpstreams 
0.0.0.0:12164 192.168.122.130:10322 0.0.0.0:18364 192.168.122.130:19818
0.0.0.0:10364 192.168.122.130:15394 0.0.0.0:10110 192.168.122.130:17640
0.0.0.0:10110 192.168.122.130:17640 0.0.0.0:10364 192.168.122.130:15394
...

Now that we have the info we need about RTP stream pairs, we want to program the OVS bridge to do the RTP forwarding for us. We do that using the OpenFlow protocol. In this case, we’ll use the ovs-ofctl command line utility to create and delete flows as needed.

I don’t intend to go into any great detail about OpenFlow or how OVS works, but I think a really high level overview of flows is needed to be able to understand what happens next. OpenFlow lets you define a multi-stage packet processing pipeline. Each stage is a table. Processing starts in table 0. Processing may continue in other tables based on what actions are executed. Each flow in a table has a priority. The flow that gets executed in a table is the one with the highest priority that matches the packet. If multiple flows at the same priority match, which one gets executed is undefined.
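The selection rule within a single table can be sketched in a few lines of Python. This is only a toy model under loud assumptions: flows are plain dicts with exact-match fields, and the names `select_flow`, `match`, and `priority` are invented for the example. Real OpenFlow matching also supports masks, wildcards, and multiple tables.

```python
def select_flow(table, packet):
    """Return the highest-priority flow whose match fields all equal the packet's.

    table is a list of {"priority": int, "match": dict, "actions": str} dicts.
    Returns None when nothing matches. If several matching flows share the
    top priority, this toy model returns the first one; in OpenFlow the
    choice is undefined.
    """
    matching = [
        flow for flow in table
        if all(packet.get(field) == value for field, value in flow["match"].items())
    ]
    if not matching:
        return None
    return max(matching, key=lambda flow: flow["priority"])


table0 = [
    {"priority": 0, "match": {}, "actions": "NORMAL"},
    {"priority": 100, "match": {"in_port": 1, "udp_dst": 10758}, "actions": "rewrite,IN_PORT"},
]
rtp_packet = {"in_port": 1, "udp_dst": 10758}
other_packet = {"in_port": 1, "udp_dst": 5060}
```

Here `rtp_packet` matches both flows and gets the priority 100 one, while `other_packet` only matches the empty (match-everything) flow at priority 0.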

What we want are flows that match an incoming RTP stream. In this demo we create flows with the following match conditions: the packet arrived on eth0, it’s a UDP packet, and the UDP destination port number is N. When a packet matches one of our flows, we execute these actions: change the source and destination MAC addresses, change the source and destination IP addresses, change the source and destination UDP port numbers, and send the packet back out where it came from (eth0).
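As a rough illustration of what such a flow does to a packet, the sketch below applies the same match and rewrite steps to a packet represented as a Python dict. The dict layout and the `forward_rtp` helper are assumptions for this example only; OVS does this work in its datapath, not in Python. The addresses and ports are the ones used elsewhere in this post.

```python
def forward_rtp(packet, flow):
    """Apply one RTP forwarding flow to a packet dict.

    Returns the rewritten packet, or None if the packet does not match.
    """
    if packet["in_port"] != flow["in_port"]:
        return None
    if packet["proto"] != "udp" or packet["udp_dst"] != flow["udp_dst"]:
        return None
    out = dict(packet)
    out.update(flow["set"])                # rewrite MAC, IP, and UDP port fields
    out["out_port"] = packet["in_port"]    # send it back out where it came from
    return out


flow = {
    "in_port": 1,
    "udp_dst": 10758,
    "set": {
        "eth_src": "52:54:00:31:cf:ce", "eth_dst": "52:54:00:88:75:61",
        "ip_src": "192.168.122.148", "ip_dst": "192.168.122.130",
        "udp_src": 14508, "udp_dst": 10060,
    },
}
incoming = {
    "in_port": 1, "proto": "udp",
    "eth_src": "52:54:00:88:75:61", "eth_dst": "52:54:00:31:cf:ce",
    "ip_src": "192.168.122.130", "ip_dst": "192.168.122.148",
    "udp_src": 17000, "udp_dst": 10758,
}
outgoing = forward_rtp(incoming, flow)
```

The payload is never inspected: only the headers change, which is why this kind of forwarding is so cheap.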

An example command to install a flow like this would be:

sudo ovs-ofctl -O OpenFlow13 add-flow breth0 priority=100,in_port=1,udp,udp_dst=10758,actions=mod_dl_src:52:54:00:31:cf:ce,mod_dl_dst:52:54:00:88:75:61,mod_nw_src:192.168.122.148,mod_nw_dst:192.168.122.130,mod_tp_src:14508,mod_tp_dst:10060,in_port

Of course, typing up 200 of those would be pretty tiring, so I just scripted it. Here is a simple Python script to generate all of the flows we need:

#!/usr/bin/env python

import subprocess

asterisk1_mac = '52:54:00:88:75:61'
asterisk2_mac = '52:54:00:31:cf:ce'
asterisk1_ip = '192.168.122.130'
asterisk2_ip = '192.168.122.148'


def port(addr):
    # "192.168.122.130:10322" -> "10322"
    return addr.split(':')[1]


def add_flow(match_port, src_port, dst_port):
    # Match RTP arriving on match_port, rewrite the headers, and send the
    # packet back out the port it arrived on.
    subprocess.call([
        'sudo', 'ovs-ofctl', '-O', 'OpenFlow13', 'add-flow', 'breth0',
        'priority=100,in_port=1,udp,'
        'udp_dst=%s,actions=mod_dl_src:%s,mod_dl_dst:%s,'
        'mod_nw_src:%s,mod_nw_dst:%s,'
        'mod_tp_src:%s,mod_tp_dst:%s,in_port'
        % (match_port,
           asterisk2_mac, asterisk1_mac,
           asterisk2_ip, asterisk1_ip,
           src_port, dst_port)])


# universal_newlines=True makes check_output return text rather than bytes.
output = subprocess.check_output(['sudo', 'asterisk', '-rx', 'rtpstreams'],
                                 universal_newlines=True)
pairs = []
for l in output.splitlines():
    parts = l.split()
    # Skip blank lines and non-stream lines that start with 'Setting'.
    if not parts or parts[0] == 'Setting':
        continue
    try:
        pair = ((port(parts[0]), port(parts[1])),
                (port(parts[2]), port(parts[3])))
    except IndexError:
        print("Failed to parse parts: %s" % parts)
        continue
    # Each bridge is listed once per channel; keep only one direction.
    reverse_pair = (pair[1], pair[0])
    if reverse_pair not in pairs:
        pairs.append(pair)

for p in pairs:
    # One flow per direction: packets arriving on one stream's local port
    # leave from the other stream's local port, headed to its remote port.
    add_flow(p[0][0], p[1][0], p[1][1])
    add_flow(p[1][0], p[0][0], p[0][1])

After running the above script, we can view the flows on breth0 using the following command:

[rbryant@asterisk2 ~]$ sudo ovs-ofctl -O OpenFlow13 dump-flows breth0 | grep table | cut -f4- -d' '
table=0, n_packets=591, n_bytes=126474, priority=100,udp,in_port=1,tp_dst=12164 actions=set_field:52:54:00:31:cf:ce->eth_src,set_field:52:54:00:88:75:61->eth_dst,set_field:192.168.122.148->ip_src,set_field:192.168.122.130->ip_dst,set_field:18364->udp_src,set_field:19818->udp_dst,IN_PORT
table=0, n_packets=588, n_bytes=125832, priority=100,udp,in_port=1,tp_dst=18364 actions=set_field:52:54:00:31:cf:ce->eth_src,set_field:52:54:00:88:75:61->eth_dst,set_field:192.168.122.148->ip_src,set_field:192.168.122.130->ip_dst,set_field:12164->udp_src,set_field:10322->udp_dst,IN_PORT
table=0, n_packets=588, n_bytes=125832, priority=100,udp,in_port=1,tp_dst=10364 actions=set_field:52:54:00:31:cf:ce->eth_src,set_field:52:54:00:88:75:61->eth_dst,set_field:192.168.122.148->ip_src,set_field:192.168.122.130->ip_dst,set_field:10110->udp_src,set_field:17640->udp_dst,IN_PORT
...

We can see in the n_packets field of each flow that packets are matching all of our flows for forwarding RTP streams.
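Checking 200 counters by eye gets old quickly, so here is a small hedged sketch that pulls the packet count per matched port out of `dump-flows` output. It assumes the text layout shown above, with `n_packets=` followed later by the match port printed as `tp_dst=`; other OVS versions may print these fields differently, and `packets_per_port` is a name made up for this example.

```python
import re

# n_packets comes first on the line; the first tp_dst after it is the match port.
FLOW_RE = re.compile(r"n_packets=(\d+),.*?tp_dst=(\d+)")


def packets_per_port(dump_output):
    """Map each matched UDP destination port to its n_packets counter."""
    counts = {}
    for line in dump_output.splitlines():
        found = FLOW_RE.search(line)
        if found:
            counts[int(found.group(2))] = int(found.group(1))
    return counts


sample = (
    "table=0, n_packets=591, n_bytes=126474, priority=100,udp,in_port=1,"
    "tp_dst=12164 actions=set_field:18364->udp_src,set_field:19818->udp_dst,IN_PORT\n"
    "table=0, n_packets=588, n_bytes=125832, priority=100,udp,in_port=1,"
    "tp_dst=18364 actions=set_field:12164->udp_src,set_field:10322->udp_dst,IN_PORT\n"
)
counts = packets_per_port(sample)
idle_ports = [p for p, n in counts.items() if n == 0]
```

Any port that ends up in `idle_ports` has a flow installed but no traffic hitting it, which would suggest that stream is still going through Asterisk or is not active.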

Here’s what’s really cool about this. After these flows are configured, Asterisk takes up less than 1% of the vCPU and the vCPU is 96-97% idle.

If we want to clear all of these flows and let RTP go back through Asterisk in userspace, we can run this script:

#!/bin/bash

for n in $(sudo ovs-ofctl -O OpenFlow13 dump-flows breth0 | grep "priority=100" | cut -f7 -d' ') ; do
    sudo ovs-ofctl -O OpenFlow13 del-flows --strict breth0 "$n"
done

At this point, the CPU usage jumps back up to where it was before.

Future Work

This was just the result of an afternoon hack.  My primary goal was to spur some interest in exploring how the cool things happening in the SDN space could provide new ways of doing things.

If someone wanted to explore doing this in Asterisk more seriously, you could write some code in Asterisk that could speak OpenFlow to the local OVS bridge to create and delete flows as needed.  You could also imagine the possibility of speaking OpenFlow to a top-of-rack switch to push the forwarding out of the host completely, yet still through a controlled point in your network.

Another major caveat in this demo is that OVS and OpenFlow don’t know what RTP is. There’s no way (that I know of) to do any sort of validation on the packets before forwarding them along.  If one end started sending garbage, this setup would happily forward it along.  It’s up to you how much that matters.  RTP devices are supposed to be built for the possibility of media streaming directly between endpoints, and in that case, there’s nothing in the middle doing any checking of things.

If you were at AstriCon, thank you for coming to my talk and/or demo.  To everyone, I hope you found this interesting and that it inspires you to go off and learn more about this cool technology!