Availability Zones and Host Aggregates in OpenStack Compute (Nova)

UPDATE 2014-06-18: There was a talk at the last OpenStack Summit in Atlanta on this topic, Divide and Conquer: Resource Segregation in the OpenStack Cloud.

Confusion around Host Aggregates and Availability Zones in Nova seems to be very common. In this post I’ll attempt to show how each is used. All information in this post is based on the way things work in the Grizzly version of Nova.

First, go ahead and forget everything you know about things called Availability Zones in other systems. They are not the same thing, and trying to map Nova’s concept of Availability Zones to what something else calls Availability Zones will only cause confusion.

The high level view is this: A host aggregate is a grouping of hosts with associated metadata.  A host can be in more than one host aggregate.  The concept of host aggregates is only exposed to cloud administrators.

A host aggregate may be exposed to users in the form of an availability zone. When you create a host aggregate, you have the option of providing an availability zone name. If specified, the host aggregate you have created is now available as an availability zone that can be requested.

Here is a tour of some commands.

Create a host aggregate:

$ nova aggregate-create test-aggregate1
+----+-----------------+-------------------+-------+----------+
| Id | Name            | Availability Zone | Hosts | Metadata |
+----+-----------------+-------------------+-------+----------+
| 1  | test-aggregate1 | None              |       |          |
+----+-----------------+-------------------+-------+----------+

Create a host aggregate that is exposed to users as an availability zone. (This is not creating a host aggregate within an availability zone! It is creating a host aggregate that is the availability zone!)

$ nova aggregate-create test-aggregate2 test-az
+----+-----------------+-------------------+-------+----------+
| Id | Name            | Availability Zone | Hosts | Metadata |
+----+-----------------+-------------------+-------+----------+
| 2  | test-aggregate2 | test-az           |       |          |
+----+-----------------+-------------------+-------+----------+

Add a host to a host aggregate, test-aggregate2. Since this host aggregate defines the availability zone test-az, adding a host to this aggregate makes it a part of the test-az availability zone.

$ nova aggregate-add-host 2 devstack
Aggregate 2 has been successfully updated.
+----+-----------------+-------------------+---------------+------------------------------------+
| Id | Name            | Availability Zone | Hosts         | Metadata                           |
+----+-----------------+-------------------+---------------+------------------------------------+
| 2  | test-aggregate2 | test-az           | [u'devstack'] | {u'availability_zone': u'test-az'} |
+----+-----------------+-------------------+---------------+------------------------------------+

Note that the novaclient output shows the availability zone twice. The data model on the backend only stores the availability zone in the metadata. There is not a separate column for it. The API returns the availability zone separately from the general list of metadata, though, since it’s a special piece of metadata.
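
To make that concrete, here is roughly what the underlying REST API returns for this aggregate (response abbreviated and hand-formatted for illustration):

GET /v2/<tenant_id>/os-aggregates/2

{
    "aggregate": {
        "id": 2,
        "name": "test-aggregate2",
        "availability_zone": "test-az",
        "hosts": ["devstack"],
        "metadata": {"availability_zone": "test-az"}
    }
}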

Now that the test-az availability zone has been defined and contains one host, a user can boot an instance and request this availability zone.

$ nova boot --flavor 84 --image 64d985ba-2cfa-434d-b789-06eac141c260 \
> --availability-zone test-az testinstance
$ nova show testinstance
+-------------------------------------+----------------------------------------------------------------+
| Property                            | Value                                                          |
+-------------------------------------+----------------------------------------------------------------+
| status                              | BUILD                                                          |
| updated                             | 2013-05-21T19:46:06Z                                           |
| OS-EXT-STS:task_state               | spawning                                                       |
| OS-EXT-SRV-ATTR:host                | devstack                                                       |
| key_name                            | None                                                           |
| image                               | cirros-0.3.1-x86_64-uec (64d985ba-2cfa-434d-b789-06eac141c260) |
| private network                     | 10.0.0.2                                                       |
| hostId                              | f038bdf5ff35e90f0a47e08954938b16f731261da344e87ca7172d3b       |
| OS-EXT-STS:vm_state                 | building                                                       |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000002                                              |
| OS-EXT-SRV-ATTR:hypervisor_hostname | devstack                                                       |
| flavor                              | m1.micro (84)                                                  |
| id                                  | 107d332a-a351-451e-9cd8-aa251ce56006                           |
| security_groups                     | [{u'name': u'default'}]                                        |
| user_id                             | d0089a5a8f5440b587606bc9c5b2448d                               |
| name                                | testinstance                                                   |
| created                             | 2013-05-21T19:45:48Z                                           |
| tenant_id                           | 6c9cfd6c838d4c29b58049625efad798                               |
| OS-DCF:diskConfig                   | MANUAL                                                         |
| metadata                            | {}                                                             |
| accessIPv4                          |                                                                |
| accessIPv6                          |                                                                |
| progress                            | 0                                                              |
| OS-EXT-STS:power_state              | 0                                                              |
| OS-EXT-AZ:availability_zone         | test-az                                                        |
| config_drive                        |                                                                |
+-------------------------------------+----------------------------------------------------------------+
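
The OS-EXT-AZ:availability_zone field confirms that the instance landed in the requested zone. Users can also list the availability zones visible to them (output trimmed and illustrative; a real deployment will also show the default zones and, for administrators, the hosts in each zone):

$ nova availability-zone-list
+---------+-----------+
| Name    | Status    |
+---------+-----------+
| nova    | available |
| test-az | available |
+---------+-----------+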

All of the examples so far show how host aggregates provide an API-driven mechanism for cloud administrators to define availability zones. The other use case host aggregates serve is tagging a group of hosts with a type of capability. When creating custom flavors, you can set a requirement for a capability. When a request is made to boot an instance with that flavor, the scheduler will only consider hosts in host aggregates tagged with that capability in their metadata.

We can add some metadata to the original host aggregate we created that was *not* also an availability zone, test-aggregate1.

$ nova aggregate-set-metadata 1 coolhardware=true
Aggregate 1 has been successfully updated.
+----+-----------------+-------------------+-------+----------------------------+
| Id | Name            | Availability Zone | Hosts | Metadata                   |
+----+-----------------+-------------------+-------+----------------------------+
| 1  | test-aggregate1 | None              | []    | {u'coolhardware': u'true'} |
+----+-----------------+-------------------+-------+----------------------------+

A flavor can include a set of key/value pairs called extra_specs. Here’s an example of creating a flavor that will only run on hosts in an aggregate with the coolhardware=true metadata.

$ nova flavor-create --is-public true m1.coolhardware 100 2048 20 2
+-----+-----------------+-----------+------+-----------+------+-------+-------------+-----------+
| ID  | Name            | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+-----+-----------------+-----------+------+-----------+------+-------+-------------+-----------+
| 100 | m1.coolhardware | 2048      | 20   | 0         |      | 2     | 1.0         | True      |
+-----+-----------------+-----------+------+-----------+------+-------+-------------+-----------+
$ nova flavor-key 100 set coolhardware=true
$ nova flavor-show 100
+----------------------------+----------------------------+
| Property                   | Value                      |
+----------------------------+----------------------------+
| name                       | m1.coolhardware            |
| ram                        | 2048                       |
| OS-FLV-DISABLED:disabled   | False                      |
| vcpus                      | 2                          |
| extra_specs                | {u'coolhardware': u'true'} |
| swap                       |                            |
| os-flavor-access:is_public | True                       |
| rxtx_factor                | 1.0                        |
| OS-FLV-EXT-DATA:ephemeral  | 0                          |
| disk                       | 20                         |
| id                         | 100                        |
+----------------------------+----------------------------+
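
One caveat worth noting: the extra_specs matching described above is performed by the AggregateInstanceExtraSpecsFilter, which is not in the scheduler’s default filter list in Grizzly. Here is a minimal nova.conf sketch for the host running nova-scheduler (the filter list is illustrative; keep whatever other filters your deployment needs):

[DEFAULT]
scheduler_default_filters=AggregateInstanceExtraSpecsFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter

With that filter enabled, a request to boot an instance with --flavor 100 will only consider hosts that belong to an aggregate whose metadata includes coolhardware=true.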

Hopefully this provides some useful information on what host aggregates and availability zones are, and how they are used.

OpenStack Compute (Nova) Roadmap for Havana

The Havana design summit was held mid-April.  Since then we have been documenting the Havana roadmap and going full speed ahead on development of these features.  The list of features that developers have committed to completing for the Havana release is tracked using blueprints on Launchpad. At the time of writing, we have 74 blueprints listed that cover a wide range of development efforts.  Here are some highlights in no particular order:

Database Handling

Vish Ishaya made a change at the very beginning of the development cycle that will allow us to backport database migrations to the Grizzly release, should a backported bug fix ever require a schema migration.

Dan Smith and Chris Behrens are working on a unified object model. One of the things that has been in the way of rolling upgrades of a Nova deployment is that the code and the database schema are very tightly coupled. The primary goal of this effort is to decouple the two. It brings some other improvements as well, including better object serialization handling for RPC and object versioning.
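
To illustrate the general idea, here is a toy sketch in Python (emphatically not Nova’s actual object code): an object carries its own version and knows how to serialize itself into a primitive form for the RPC bus, so services running different code versions have a defined format to negotiate over.

# Toy sketch of a versioned, RPC-serializable object. All names here
# are illustrative, not Nova's real implementation.
class Instance(object):
    VERSION = '1.0'

    def __init__(self, uuid, host):
        self.uuid = uuid
        self.host = host

    def to_primitive(self):
        # What travels over the RPC bus: the data plus the schema
        # version it was produced with.
        return {'object.name': 'Instance',
                'object.version': self.VERSION,
                'object.data': {'uuid': self.uuid, 'host': self.host}}

    @classmethod
    def from_primitive(cls, primitive):
        # A receiver can inspect object.version and convert the
        # payload before using it.
        data = primitive['object.data']
        return cls(data['uuid'], data['host'])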

Boris Pavlovic continues to do a lot of cleanup of database support in Nova.  He’s adding tests (and more tests), adding unique constraints, improving session handling, and improving archiving.

Chris Behrens has been working on a native MySQL database driver that performs much better than the SQLAlchemy driver for use in large scale deployments.

Mike Wilson is working on supporting read-only database slaves. This will allow some queries to be distributed to other database servers, which helps with scaling in large deployments.

Bare Metal

The Grizzly release of Nova included the bare metal provisioning driver. Interest in this functionality has been rapidly increasing. Devananda van der Veen proposed that the bare metal provisioning code be split out into a new project called Ironic. The new project was approved for incubation by the OpenStack Technical Committee last week. Once the split is complete, there will be a driver in Nova that talks to the Ironic API. The Ironic API will expose some additional functionality that doesn’t make sense to present in the Compute API in Nova.

Prior to the focus shift to Ironic, some new features were added to the bare metal driver. USC-ISI added support for Tilera and Devananda added a feature that allows you to request a specific bare metal node when provisioning a server.

Version 3 (v3) of the Compute API

The Havana release will include a new revision of the compute REST API in Nova. This effort is being led by Christopher Yeoh, with help from others. The v3 API will include a new framework for implementing extensions, extension versioning, and a whole bunch of cleanup.

Networking

The OpenStack community has been maintaining two network stacks for some time. Nova includes the nova-network service. Meanwhile, the OpenStack Networking project has been developed from scratch to support much more than nova-network does. Nova currently supports both. OpenStack Networking is expected to reach and surpass feature parity with nova-network in the Havana cycle. As a result, it’s time to deprecate nova-network. Vish Ishaya (from the Nova side) and Gary Kotton (from the OpenStack Networking side) have agreed to take on the challenging task of figuring out how to migrate existing deployments using nova-network to an updated environment that includes OpenStack Networking.

Scheduling

The Havana roadmap includes a mixed bag of scheduler features.

Andrew Laski is going to make the changes required for the scheduler to become exclusively a resource that gets queried. Currently, when starting an instance, the request is handed off to the scheduler, which then hands it off to the compute node that is selected. This change will eliminate that proxying through nova-scheduler, so every operation that uses the scheduler will interact with it the same way, as opposed to some operations querying and others proxying.

Phil Day will be adding an API extension that allows you to discover which scheduler hints are supported.  Phil is also looking at adding a way to allocate an entire host to a single tenant.
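
For context, scheduler hints are already passed at boot time with the --hint option; what is new is a way to discover which hint names the deployment’s filters actually understand. For example (the same_host hint is consumed by the SameHostFilter, if enabled):

$ nova boot --flavor 84 --image 64d985ba-2cfa-434d-b789-06eac141c260 \
>   --hint same_host=107d332a-a351-451e-9cd8-aa251ce56006 testinstance2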

Inbar Shapira is looking at allowing multiple scheduling policies to be in effect at the same time. This will allow you to have different sets of scheduler filters activated depending on some criteria (perhaps the requested availability zone).

Rerngvit Yanggratoke is implementing support for weighting scheduling decisions based on the CPU utilization of existing instances on a host.

Migrations

Nova includes support for different types of migrations. We have cold migrations (migrate) and live migrations (live-migrate). We also have resize and evacuate, which are closely related functions. The code paths for all of these features have evolved separately. It turns out that we can rework all of these things to share a lot of code. While we’re at it, we are restructuring the way these operations work to be primarily driven by the nova-conductor service. This will allow the tasks to be tracked in a single place, as opposed to the flow of control being passed around between compute nodes. Having compute nodes tell each other what to do is also a very bad thing from a security perspective. These efforts are well underway. Tiago Rodrigues de Mello is working on moving cold migrations to nova-conductor and John Garbutt is working on moving live migrations. All of this is tracked under the parent blueprint for unified migrations.
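
For reference, here is how those operations are exposed through the nova client today (server and host names illustrative):

$ nova migrate testinstance                    # cold migration
$ nova live-migration testinstance otherhost   # live migration
$ nova resize testinstance 100                 # resize to flavor 100
$ nova evacuate testinstance otherhost         # rebuild on another host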

And More!

This post doesn’t include every feature on the roadmap; the full list of blueprints is on Launchpad. I fully expect that more will be added to this list as Havana progresses. We don’t always know what features are being worked on in advance. If you have another feature you would like to propose, let’s talk about it on the openstack-dev list!