[This post was written by Jeremy Yallop and David Sheets.]
Recent Docker releases (17.04 CE Edge onwards) bring significant performance improvements to bind-mounted directories on macOS. (Docker users on the stable channel will see the improvements in the forthcoming 17.06 release.) Commands for bind-mounting directories have new options to selectively enable caching.
Containers that perform large numbers of read operations in mounted directories are the main beneficiaries. Among tools and applications in common use among Docker for Mac users, go list is 2.5× faster, symfony is 2.7× faster, and rake is 3.5× faster, as measured in the following benchmarks:
- go list (2.5× speedup): go list ./...
- symfony (2.7× speedup): curl of the main page of the Symfony demo app
- rake (3.5× speedup): rake -T in @hirowatari’s benchmark
For more details about how and when to enable caching, and what’s going on under the hood, read on.
Basics of bind-mounting
A defining characteristic of containers is isolation: by default, many parts of the execution environment of a container are isolated both from other containers and from the host system. In the filesystem, isolation shows up as layering: the filesystem of a running container consists of a series of incremental layers, topped by a container-specific read/write layer that keeps changes made within the container concealed from the outside world.
Isolation as a default encourages careful thinking about the best way to bypass isolation in order to share data with a container. For data-in-motion, Docker offers a variety of ways to connect containers via the network. For data-at-rest, Docker Volumes offer a flexible mechanism to share data between containers, and with the host.
The simplest and most common way to use volumes is to bind-mount a host directory when starting a container; that is, to make the directory available at a specified point in the container’s filesystem. For example, the following command runs the alpine image, exposing the host directory /Users/yallop/project within the container as /project:
docker run -v /Users/yallop/project:/project alpine command
In this example, modifications to files under /project in the container appear as modifications to the corresponding files under /Users/yallop/project on the host. Similarly, modifications to files under /Users/yallop/project on the host appear as modifications to files under /project in the container.
There are many use cases for bind mounting. For example, you might
- develop software using an editor on your host, running development tools in a container
- run a periodic job in a container, storing the output in a host directory
- cache large data assets on the host for processing in a container
Bind mounts on Linux
Newcomers to Docker are sometimes surprised to discover that the performance overhead of containers is often close to negligible and, in many cases, significantly lower than that of other forms of virtualization.
On Linux, bind-mounting a directory, like many Docker features, simply selectively exposes host resources directly to a container. Consequently, access to bind mounts carries little-to-no overhead compared to filesystem access in a regular process.
Bind mounts on Docker for Mac
The Linux kernel makes container-style isolation efficient, but running containers on Docker editions for non-Linux operating systems such as macOS involves several extra moving parts, each of which carries additional overhead.
Docker containers run on top of a Linux kernel, and so the Docker for Mac container runtime system runs a minimal Linux instance using the HyperKit framework. Containers running on top of the Linux system cannot directly access macOS filesystem or networking resources, and so Docker for Mac includes libraries that expose those resources in a way that the Docker engine can consume.
Access to filesystem resources is provided by a separate non-privileged macOS process (osxfs) that communicates with a daemon (“transfused”) running on the virtualized Linux. A Linux system call such as read that accesses bind-mounted files in a container must be:
- turned into a FUSE message in the Linux VFS
- proxied over a virtio socket by transfused
- forwarded onto a UNIX domain socket by HyperKit
- deserialized, dispatched and executed as a macOS system call by osxfs
The entire process then takes place in reverse to return the result of the macOS system call to the container.
Each step in the process is fairly efficient, making the total round trip time around 100 microseconds. However, some software, written under the usually-correct assumption that system calls are instantaneous, can perform tens of thousands of system calls for each user-facing operation. Even a comparatively low overhead can become irksome when scaled up by four orders of magnitude. Consequently, although syscall latency has been reduced several times since the initial release of Docker for Mac, and although a few opportunities for further reducing latency remain, optimizing latency alone will not completely address bind mount performance for all applications.
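To make the scale concrete, here is a back-of-the-envelope sketch using the round-trip figure quoted above. The syscall counts are illustrative assumptions rather than measurements of any particular tool, and the function name is made up for this example.

```shell
# Rough estimate of time spent purely on file sharing round trips.
# 100 microseconds per round trip is the figure quoted in the text;
# the syscall counts in the loop below are illustrative assumptions.
round_trip_us=100

overhead_ms() {
  # $1: number of system calls that touch bind-mounted files
  echo $(( $1 * round_trip_us / 1000 ))
}

for syscalls in 10000 50000 100000; do
  echo "$syscalls syscalls: ~$(overhead_ms "$syscalls") ms of round-trip overhead"
done
```

Under these assumptions, 10,000 system calls already account for about one second of waiting, which is why reducing per-call latency alone cannot close the gap.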
File sharing design constraints under Docker for Mac
The design described above arises from a number of constraints, which in turn arise from the high-level design goals of Docker for Mac: it should closely match the Linux execution environment, require minimal configuration, and involve as little privileged system access as possible.
Three constraints in particular underlie the design of Docker for Mac file sharing.
The first constraint is consistency: a running container should always have the same view of a bind-mounted directory as the host system. On Linux, consistency comes for free, since bind-mounting directly exposes a directory to a container. On macOS, maintaining consistency is not free: changes must be synchronously propagated between container and host.
The second constraint is event propagation: several common workflows rely on containers receiving inotify events when files change on the host, or on the host receiving events when the container makes changes. Again, event propagation is automatic and free on Linux, but Docker for Mac must perform additional work to ensure that events are propagated promptly and reliably.
These constraints rule out a number of alternative solutions. Using rsync to copy files into a container provides fast access, but does not support consistency. Mounting directories into containers using NFS works well for some use cases, but does not support event propagation. Reverse-mounting container directories onto the host might provide good performance for some workloads, but would require a very different interface.
The design constraints above describe useful defaults. In particular, a system that was not consistent by default would behave in ways that were unpredictable and surprising, especially for casual users, for users used to the Linux implementation, and for software invoking docker on the host.
However, not all applications need the guarantees which arise for free from the Linux implementation. In particular, although the Linux implementation guarantees that the container and host have consistent views at all times, temporary inconsistency between container and host is sometimes acceptable. Allowing temporary inconsistency makes it possible to cache filesystem state, avoiding unnecessary communication between the container and macOS, and increasing performance.
Different applications require different levels of consistency. Full consistency is sometimes essential, and remains the default. However, to support cases where temporary inconsistency is an acceptable price to pay for improved performance, Docker 17.04 CE Edge includes new flags for bind mounts:
- consistent: Full consistency. The container runtime and the host maintain an identical view of the mount at all times. This is the default, as described above.
- cached: The host’s view of the mount is authoritative. There may be delays before updates made on the host are visible within a container.
For example, to enable cached mode for the bind-mounted directory above, you might write
docker run -v /Users/yallop/project:/project:cached alpine command
Caching is enabled on a per-mount basis, so you can mount each directory in a different mode:
docker run -v /Users/yallop/project:/project:cached \
  -v /host/another-path:/mount/another-point:consistent \
  alpine command
The osxfs documentation has more details about the guarantees provided by consistent and cached. On Linux, where full consistency comes for free, cached behaves identically to consistent.
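For scripts that start containers with several mounts, it can help to build each mount argument in one place and reject misspelled modes early. The following is a minimal sketch: mount_arg is a hypothetical helper, not part of Docker, and it only knows the two modes described above.

```shell
# Hypothetical helper (not part of Docker): prints a bind-mount argument
# of the form host:container:mode, defaulting to "consistent" and
# rejecting any mode other than the two described in the text.
mount_arg() {
  host_dir=$1
  container_dir=$2
  mode=${3:-consistent}
  case $mode in
    consistent|cached)
      printf '%s:%s:%s\n' "$host_dir" "$container_dir" "$mode"
      ;;
    *)
      echo "unknown mount mode: $mode" >&2
      return 1
      ;;
  esac
}
```

The result can be passed straight to the -v option, for example docker run -v "$(mount_arg /Users/yallop/project /project cached)" alpine command.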
We have seen significant improvements in the performance of several common applications when directories are mounted in the new cached mode.
For the moment, read-heavy workloads will benefit most from caching. Improvements in the performance of write-heavy workloads, including a popular dd-based benchmark, are under development.
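One rough way to check whether a particular project benefits is to time the same read-heavy command against a bind mount in each mode. The sketch below assumes the alpine image and uses find as a stand-in for a read-heavy tool; run_read_benchmark and the DOCKER override (set DOCKER=echo for a dry run) are conventions of this example, not Docker features.

```shell
# Sketch: time a read-heavy command against a bind mount in a given mode.
# DOCKER can be overridden (for example DOCKER=echo) to try the function
# without starting a container; that override is this sketch's own convention.
run_read_benchmark() {
  host_dir=$1
  mode=$2
  start=$(date +%s)
  # Walk every file under the mount; output is discarded, only time matters.
  "${DOCKER:-docker}" run --rm -v "$host_dir:/project:$mode" \
    alpine find /project -type f > /dev/null
  end=$(date +%s)
  echo "$mode: $(( end - start ))s"
}
```

Calling run_read_benchmark "$PWD" consistent followed by run_read_benchmark "$PWD" cached gives a coarse, one-second-resolution comparison; repeating each run a few times helps smooth out noise such as image startup time.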
Test cases involving real world applications are a big help in guiding Docker for Mac development. So, if you have field reports or other comments about file sharing performance, we’d love to hear from you.