High availability (HA) isn’t just about keeping the lights on all the time; it’s also about quickly turning them back on when they unexpectedly go out. With software, this means capabilities for fault tolerance as well as backup and recovery. Docker Datacenter (DDC) provides this for both container-based applications and the application infrastructure components (such as cluster management, orchestration, and account settings). In this post we will look at how high availability is achieved in the latest release of Docker Datacenter.
As a refresher, Docker Datacenter is comprised of the following software:
- Universal Control Plane (UCP) with Swarm for cluster orchestration and management
- Docker Trusted Registry (DTR) for secure image collaboration and distribution
- Docker Engine with commercial support to run your containerized apps
Setting up HA on your DDC Deployment
Architecturally, let’s start with how HA is achieved for your DDC infrastructure. It all begins with setting up Universal Control Plane. In UCP, HA is achieved by running multiple UCP controllers, each on its own host. UCP uses a distributed key-value store to ensure that each controller is updated with the latest information on the cluster.
As of UCP 1.1, each controller is identical to the original, ensuring that if any controller goes down, the cluster keeps running and its configuration and state are preserved. Previously, replicas of the UCP controller did not replicate the Certificate Authorities (CAs); the cluster was still preserved if the primary controller failed, but certain actions, such as adding nodes or generating user bundles, were limited until the primary controller was brought back up. These limitations no longer apply, as the CAs are now replicated across the controllers.
In general, a cluster with N controllers can tolerate (N-1)/2 failures. In the diagram below, three UCP controllers allow one of them to go down while still preserving the cluster. Losing an additional controller causes the key-value store to lose quorum, breaking the cluster. Adding more controllers (up to seven) increases fault tolerance, but requires more hosts and can slow down the cluster. After installing the controllers, you can add additional UCP nodes for user applications as needed for your deployment.
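The (N-1)/2 quorum rule can be sketched in a few lines. The function name below is purely illustrative and not part of any Docker tooling:

```python
def tolerable_failures(n_controllers: int) -> int:
    """Controller failures an n-controller cluster can absorb
    while the key-value store still holds a majority (quorum)."""
    return (n_controllers - 1) // 2

# A majority of controllers must survive, so odd sizes are the
# sweet spot: a 4th controller tolerates no more failures than 3 do.
for n in (1, 3, 5, 7):
    print(f"{n} controllers -> tolerates {tolerable_failures(n)} failure(s)")
```

This is why deployments of three, five, or seven controllers are typical: even counts spend an extra host without buying extra fault tolerance.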
Now that UCP is installed, we can start installing Docker Trusted Registry. DTR 2.0 has a fully redesigned cluster architecture that makes use of replicas for high availability. These replicas contain a distributed key value store and replicated database, and talk to each other over an overlay network. Similarly to UCP, installing N DTR replicas allows the system to tolerate (N-1)/2 failures.
A couple of tips for setup:
- DTR 2.0 uses UCP for orchestration, authentication, and monitoring, and thus UCP must be installed first in order to bring up DTR.
- It is strongly recommended to put the UCP controllers and DTR replicas on separate nodes. This is to ensure that a failure in one solution does not affect the other.
- It is also generally recommended in large-scale production deployments to keep UCP nodes on separate hosts from the controllers/replicas, in order to ensure that application issues do not affect the integrity of the UCP cluster. This may not be necessary in test environments or small-scale deployments.
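Putting the tips above together, an HA install of DTR 2.0 against an existing UCP cluster looks roughly like the following sketch. The hostnames and credentials are placeholders, and exact flags can vary by version, so confirm against `docker run docker/dtr install --help` for your release:

```shell
# Install the first DTR replica, pointing it at the existing UCP cluster
docker run -it --rm docker/dtr install \
  --ucp-url https://ucp.example.com \
  --ucp-node dtr-node-1 \
  --ucp-username admin --ucp-password <password>

# Join additional replicas for HA (e.g. 3 replicas tolerate 1 failure)
docker run -it --rm docker/dtr join \
  --ucp-url https://ucp.example.com \
  --ucp-node dtr-node-2 \
  --existing-replica-id <replica-id-from-install>
```

Note that `dtr-node-1` and `dtr-node-2` are UCP-managed nodes kept separate from the UCP controllers, per the tips above.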
Backup and Restore
Now that you have your architecture set up, you need the ability to save the state of your cluster (data such as user accounts and configuration settings) in order to recover from a failure. This is where UCP’s new backup/restore CLI commands come in. Using the UCP CLI tool you can take backups of a UCP controller, which saves the state of the cluster in a .tar file. Let’s say you have a UCP HA deployment consisting of Controllers A, B, and C. In case of a failure, here’s how you would recover:
- Make sure you have previously taken backups of any one of the controllers. In this case, let’s say we have been taking regular backups of Controller A.
- Use the UCP CLI tool “stop” command to stop all UCP system containers on the controllers you have not backed up–in this case, on Controllers B and C.
- Run the UCP CLI tool “restore” command on Controller A.
- Run the UCP CLI tool “uninstall” command on Controllers B and C, then rerun the “join --replica” command on these controllers.
- You should now have a restored cluster!
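In command form, and assuming the UCP 1.1 tooling, the recovery steps above look roughly like this sketch. Flags and invocation details can differ by version, so confirm with `docker run docker/ucp <command> --help`:

```shell
# Beforehand, on Controller A: take regular backups
docker run --rm -i -v /var/run/docker.sock:/var/run/docker.sock \
  docker/ucp backup --interactive > /tmp/ucp-backup.tar

# After a failure, on Controllers B and C: stop UCP system containers
docker run --rm -i -v /var/run/docker.sock:/var/run/docker.sock \
  docker/ucp stop

# On Controller A: restore cluster state from the saved backup
docker run --rm -i -v /var/run/docker.sock:/var/run/docker.sock \
  docker/ucp restore --interactive < /tmp/ucp-backup.tar

# On Controllers B and C: uninstall, then rejoin as replicas
docker run --rm -i -v /var/run/docker.sock:/var/run/docker.sock \
  docker/ucp uninstall --interactive
docker run --rm -i -v /var/run/docker.sock:/var/run/docker.sock \
  docker/ucp join --replica --interactive
```

The backup .tar should be stored off the controller host; a backup that lives only on the machine that fails is no backup at all.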
This functionality is best used for recovering from catastrophic host failures using previously taken backups. However, in some scenarios it may be possible to take a backup after a controller failure, particularly when the cluster is broken due to loss of quorum among the controllers.
High Availability for Applications
So far we’ve been talking about HA for the UCP and DTR architecture. But what about the actual app containers? What do you do if a UCP node running some of your app containers fails? This is where container rescheduling comes in. This feature was experimental in previous versions of Swarm but is now generally available in Swarm v1.2 (which is used by UCP 1.1). With container rescheduling, you can set a label or environment variable on a container that tells Swarm to reschedule it (i.e. restart it on a different node) if the node it is currently running on ever goes down.
For more on how to do this, read the Swarm container rescheduling documentation.
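As a sketch, either of the following commands (run against the Swarm manager) marks a container for rescheduling on node failure; `redis` is just an example image here, and the syntax follows the Swarm 1.2 rescheduling docs:

```shell
# Option 1: set the rescheduling policy via an environment variable
docker run -d -e "reschedule:on-node-failure" redis

# Option 2: the equivalent container label
docker run -d -l 'com.docker.swarm.reschedule-policies=["on-node-failure"]' redis
```

If the node hosting the container goes down, Swarm restarts an equivalent container on another healthy node in the cluster.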
We hope this post has been useful for understanding how Docker Datacenter provides high availability for both your application infrastructure and your containers. Give it a spin via the links below, or feel free to ask questions on the forums.
Additional Resources on Docker Datacenter
- New to Docker Datacenter? Start a free 30 day trial today.
- Register for a webinar: What’s New in Docker Datacenter on May 10th
- Learn more about Docker Datacenter and Containers as a Service (CaaS)
- Read the release notes for Universal Control Plane 1.1 and Trusted Registry 2.0
- Discuss on the UCP forums and DTR forums