Archive
OpenStack Compute (Nova) Roadmap for Havana
The Havana design summit was held mid-April. Since then we have been documenting the Havana roadmap and going full speed ahead on development of these features. The list of features that developers have committed to completing for the Havana release is tracked using blueprints on Launchpad. At the time of writing, we have 74 blueprints listed that cover a wide range of development efforts. Here are some highlights in no particular order:
Database Handling
Vish Ishaya made a change at the very beginning of the development cycle that will allow us to backport database migrations to the Grizzly release if needed. This is needed in case we need to backport a bug fix that requires a migration.
Dan Smith and Chris Behrens are working on a unified object model. One of the things that has been in the way of rolling upgrades of a Nova deployment is that the code and the database schema are very tightly coupled. The primary goal of this effort is to decouple these things. This effort is bringing in some other improvements, as well, including better object serialization handling for rpc, as well as object versioning.
Boris Pavlovic continues to do a lot of cleanup of database support in Nova. He’s adding tests (and more tests), adding unique constraints, improving session handling, and improving archiving.
Chris Behrens has been working on a native MySQL database driver that performs much better than the SQLAlchemy driver for use in large scale deployments.
Mike Wilson is working on supporting read-only database slaves. This will allow distributing some queries to other database servers to help scaling in large scale deployments.
Bare Metal
The Grizzly release of Nova included the bare metal provisioning driver. Interest in this functionality has been rapidly increasing. Devananda van der Veen proposed that the bare metal provisioning code be split out into a new project called Ironic. The new project was approved for incubation by the OpenStack Technical Committee last week. Once this has been completed, there will be a driver in Nova that talks to the Ironic API. The Ironic API will present some additional functionality that doesn’t make sense to use to present in the Compute API in Nova.
Prior to the focus shift to Ironic, some new features were added to the bare metal driver. USC-ISI added support for Tilera and Devananda added a feature that allows you to request a specific bare metal node when provisioning a server.
Version 3 (v3) of the Compute API
The Havana release will include a new revision of the compute REST API in Nova. This effort is being led by Christopher Yeoh, with help from others. The v3 API will include a new framework for implementing extensions, extension versioning, and a whole bunch of cleanup: (1) (2) (3) (4).
Networking
The OpenStack community has been maintaining two network stacks for some time. Nova includes the nova-network service. Meanwhile, the OpenStack Networking project has been developed from scratch to support much more than nova-network does. Nova currently supports both. OpenStack Networking is expected to reach and surpass feature parity with nova-network in the Havana cycle. As a result, it’s time to deprecate nova-network. Vish Ishaya (from the Nova side) and Gary Kotton (from the OpenStack Networking side) have agreed to take on the challenging task of figuring out how to migrate existing deployments using nova-network to an updated environment that includes OpenStack Networking.
Scheduling
The Havana roadmap includes a mixed bag of scheduler features.
Andrew Laski is going to make the changes required so that the scheduler becomes exclusively a resource that gets queried. Currently, when starting an instance, the request is handed off to the scheduler, which then hands it off to the compute node that is selected. This change will make it so proxying through nova-scheduler is no longer done. This will mean that every operation that uses the scheduler will interact with it the same way, as opposed to some operations querying and others proxying.
Phil Day will be adding an API extension that allows you to discover which scheduler hints are supported. Phil is also looking at adding a way to allocate an entire host to a single tenant.
Inbar Shapira is looking at allowing multiple scheduling policies to be in effect at the same time. This will allow you to have different sets of scheduler filters activated depending on some type of criteria (perhaps the requested availability zone).
Rerngvit Yanggratoke is implementing support for weighting scheduling decisions based on the CPU utilization of existing instances on a host.
Migrations
Nova includes support for different types of migrations. We have cold migrations (migrate) and live migrations (live-migrate). We also have resize and evactuate, which are very related functions. The code paths for all of these features have evolved separately. It turns out that we can rework all of these things to share a lot of code. While we’re at it, we are restructuring the way these operations work to be primarily driven by the nova-conductor service. This will allow the tasks to be tracked in a single place, as opposed to the flow of control being passed around between compute nodes. Having compute nodes tell each other what to do is also a very bad thing from a security perspective. These efforts are well underway. Tiago Rodrigues de Mello is working on moving cold migrations to nova-conductor and John Garbutt is working on moving live migrations. All of this is tracked under the parent blueprint for unified migrations.
And More!
This post doesn’t include every feature on the roadmap. You can find that here. I fully expect that more will be added to this list as Havana progresses. We don’t always know what features are being worked on in advance. If you have another feature you would like to propose, let’s talk about it on the openstack-dev list!
Deployment Considerations for nova-conductor Service in OpenStack Grizzly
The Grizzly release of OpenStack Nova includes a new service, nova-conductor. Some previous posts about this service can be found here and here. This post is intended to provide some additional insight into how this service should be deployed and how the service should be scaled as load increases.
Smaller OpenStack Compute deployments typically consist of a single controller node and one or more compute (hypervisor) nodes. The nova-conductor service fits in the category of controller services. In this style of deployment you would run nova-conductor on the controller node and not on the compute nodes. Note that most of what nova-conductor does in the Grizzly release is doing database operations on behalf of compute nodes. This means that the controller node will have more work to do than in previous releases. Load should be monitored and the controller services should be scaled out if necessary.
Here is a model of what this size of deployment might look like:
As Compute deployments get larger, the controller services are scaled horizontally. For example, there would be multiple instances of the nova-api service on multiple nodes sitting behind a load balancer. You may also be running multiple instances of the nova-scheduler service across multiple nodes. Load balancing is done automatically for the scheduler by the AMQP message broker, RabbitMQ or Qpid. The nova-conductor service should be scaled out in this same way. It can be run multiple times across multiple nodes and the load will be balanced automatically by the message broker.
Here is a second deployment model. This gives an idea of how a deployment grows beyond a single controller node.
There are a couple of ways to monitor the performance of nova-conductor to see if it needs to be scaled out. The first is by monitoring CPU load. The second is by monitoring message queue size. If the queues are getting backed up, it is likely time to scale out services. Both messaging systems provide at least one way to look at the state of message queues. For Qpid, try the qpid-stat command. For RabbitMQ, see the rabbitmqctl list_queues command.
A new Nova service: nova-conductor
The Grizzly release of OpenStack Nova will have a new service, nova-conductor. The service was discussed on the openstack-dev list and it was merged today. There is currently a configuration option that can be turned on to make it optional, but it is possible that by the time Grizzly is released, this service will be required.
One of the efforts that started during Folsom development and is scheduled to be completed in Grizzly is no-db-compute. In short, this effort is to remove direct database access from the nova-compute service. There are two main reasons we are doing this. Compute nodes are the least trusted part of a nova deployment, so removing direct database access is a step toward reducing the potential impact of a compromised compute node. The other benefit of no-db-compute is for upgrades. Direct database access complicates the ability to do live rolling upgrades. We’re working toward eventually making that possible, and this is a part of that.
All of the nova services use a messaging system (usually AMQP based) to communicate with each other. Many of the database accesses in nova-compute can be (and have been) removed by just sending more data in the initial message sent to nova-compute. However, that doesn’t apply to everything. That’s where the new service, nova-conductor, comes in.
The nova-conductor service is key to completing no-db-compute. Conceptually, it implements a new layer on top of nova-compute. It should *not* be deployed on compute nodes, or else the security benefits of removing database access from nova-compute will be negated. Just like other nova services such as nova-api or nova-scheduler, it can be scaled horizontally. You can run multiple instances of nova-conductor on different machines as needed for scaling purposes.
The methods exposed by nova-conductor will initially be relatively simple methods used by nova-compute to offload its database operations. Places where nova-compute previously did database access will now be talking to nova-conductor. However, we have plans in the medium to long term to move more and more of what is currently in nova-compute up to the nova-conductor layer. The compute service will start to look like a less intelligent slave service to nova-conductor. The conductor service will implement long running complex operations, ensuring forward progress and graceful error handling. This will be especially beneficial for operations that cross multiple compute nodes, such as migrations or resizes.
If you have any comments, questions, or suggestions for nova-conductor or the no-db-compute effort in general, please feel free to bring it up on the openstack-dev list.
OpenStack Design Summit and an Eye on Folsom
I just spent a week in San Francisco at the OpenStack design summit and conference. It was quite an amazing week and I’m really looking forward to the Folsom development cycle. You can find a notes from various sessions held at the design summit on the OpenStack wiki.
Essex was the first release that I contributed to. One thing I did was add Qpid support to both Nova and Glance as an alternative to using RabbitMQ. Beyond that, I primarily worked on vulnerability management and other bug fixing. For Folsom, I’m planning on working on some more improvements involving the inter-service messaging layer, also referred to as the rpc API, in Nova.
1) Moving the rpc API to openstack-common
The rpc API in Nova is used for private communication between nova services. As an example, when a tenant requests that a new virtual machine instance be created (either via the EC2 API or the OpenStack compute REST API), the nova-api service sends a message to the nova-scheduler service via the rpc API. The nova-scheduler service decides where the instance is going to live and then sends a message to that compute node’s nova-compute service via the rpc API.
The other usage of the rpc API in Nova has been for notifications. Notifications are asynchronous messages about events that happen within the system. They can be used for monitoring and billing, among other things. Strictly speaking, notifications aren’t directly tied to rpc. A Notifier is an abstraction, of which using rpc is one of the implementations. Glance also has notifications, including a set of Notifier implementations. The code was the same at one point but has diverged quite a bit since.
We would like to move the notifiers into openstack-common. Moving rpc into openstack-common is a prerequisite for that, so I’m going to knock that part out. I’ve already written a few patches in that direction. Once the rpc API is in openstack-common, other projects will be able to make use of it. There was discussion of Quantum using rpc at the design summit, so this will be needed for that, too. Another benefit is that the Heat project is using a copy of Nova’s rpc API right now, but will be able to migrate over to using the version from openstack-common.
2) Versioning the rpc API interfaces
The existing rpc API is pretty lightweight and seems to work quite well. One limitation is that there is nothing provided to help with different versions of services talking to each other. It may work … or it may not. If it doesn’t, the failure you get could be something obvious, or it could be something really bad and bizarre where an operation fails half-way through, leaving things in a bad state. I’d like to clean this up.
The end goal with this effort will be to make sure that as you upgrade from Essex to Folsom, any messages originating from an Essex service can and will be correctly processed by a Folsom service. If that fails, then the failure should be immediate and obvious that a message was rejected due to a version issue.
3) Removing database access from nova-compute
This is by far the biggest effort of the 3 described here, and I won’t be tackling this one alone. I want to help drive it, though. This discussion came up in the design summit session about enhancements to Nova security. By removing direct database access from the nova-compute service, we can help reduce the potential impact if a compute node were to be compromised. There are two main parts to this effort.
The first part is to make more efficient use of messaging by sending full objects through the rpc API instead of IDs. For example, there are many cases where the nova-api service gets an instance object from the database, does its thing, and then just sends the instance ID in rpc message. On the other side it has to go pull that same object out of the database. We have to go through and change all cases like this to include the full object in the message. In addition to the security benefit, it should be more efficient, as well. This doesn’t sound too complicated, and it isn’t really, but it’s quite a bit of work as there is a lot of code that needs to be changed. There will be some snags to deal with along the way, such as dealing with making sure all of the objects can be serialized properly.
Including full objects in messages is only part of the battle here. The nova-compute service also does database updates. We will have to come up with a new approach for this. It will most likely end up being a new service of some sort, that handles state changes coming from compute nodes and makes the necessary database updates according to those state changes. I haven’t fully thought through the solution to this part yet. For example, ensuring that this service maintains proper order of messages without turning into a bottleneck in the overall system will be important.
Onward!
I’m sure I’ll work on other things in Folsom, as well, but those are three that I have on my mind right now. OpenStack is a great project and I’m excited to be a part of it!

