Mitigating CVE-2026-31431 (“Copy Fail”) in Docker Engine

投稿日: 5月 27日, 2026年

CVE-2026-31431 is a Linux kernel vulnerability that was recently disclosed. This CVE does not compromise Docker infrastructure.

That said, Docker Engine’s default profiles prior to v29.4.3 allowed containers to create AF_ALG sockets, which is the syscall surface the exploit uses. You are not exposed if you are running Docker Engine v29.4.3 or later, OR a patched host kernel. If either of those is missing, you have exposure on that host, and you should read the rest of this post.

As of writing, the kernel patch is available on Debian (CVE-2026-31431) and RHEL 9 (RHSB-2026-002) but not yet on Ubuntu. For users on distros that haven’t shipped a kernel fix, upgrading Docker Engine is the mitigation you can apply today.

Why you should read about Copy-Fail

This CVE drew a lot of attention because the exploit became public before many Linux distributions had kernel patches available. As a result, most distros were still vulnerable and had no ready fix at the time of disclosure. It was especially notable because the bug affected Linux kernels going back to around 2017, making the potential impact unusually broad.

On the Docker Engine team, I started investigating what we could do from our end to protect users on vulnerable hosts. It turned out the mitigation was more involved than it first looked, and the first attempt broke 32-bit binaries. This post is what we shipped, what broke, what we learned, and where things stand now.

What Copy Fail is

On April 29, researchers disclosed CVE-2026-31431, dubbed “Copy Fail,” a privilege escalation vulnerability in the Linux kernel’s AF_ALG crypto subsystem.

The flaw is in the algif_aead module. It allows any unprivileged user with access to an AF_ALG socket to perform controlled writes to the page cache. Since the page cache backs file reads across the entire system, an attacker can temporarily modify the contents of any readable file as seen by every process on the host. Corrupting a setuid binary is the most direct path to local root, but the primitive itself is more general.

The exploit is trivial and works on every unpatched Linux kernel shipped since 2017.

The correct fix is a kernel update. The mitigations described below reduce exposure for containers running on unpatched kernels, but they do not fix the underlying vulnerability. If your kernel vendor has released a patch, apply it.

What does this mean for containers?

Inside a container running with default security profiles, an attacker with code execution can use Copy Fail to corrupt pages in the page cache. One possible outcome is escalating to root inside the container by corrupting setuid binaries.

But the page cache is shared across the host, so the impact is not confined to the attacker’s container. Modified pages are visible to the host and to every other container that maps the same file, including shared image layers. Other workloads on the same node can be affected.

The attack does not require any special capabilities or privileges beyond what a default container provides. The only requirement is the ability to create an AF_ALG socket, which was previously allowed by Docker’s default security profiles.

First attempt: seccomp (v29.4.2)

We updated Docker Engine’s default seccomp profile to block AF_ALG sockets. The seccomp filter inspects the first argument to socket(2) and denies address families AF_ALG and AF_VSOCK (which was already blocked).

Blocking socket(2) is not enough on its own. There is another way to create sockets on x86_64 Linux: socketcall(2), an older multiplexed syscall that wraps socket, bind, connect, and other socket operations behind a single syscall number.

There is another way to create sockets on Linux: socketcall(2), an older multiplexed syscall that wraps socket, bind, connect, and other socket operations behind a single syscall number.

The problem for seccomp is that socketcall packs the real arguments (including the address family) into a userspace array and passes a pointer, which BPF cannot dereference and inspect. There is no way to selectively block AF_ALG through socketcall with seccomp.

Linux 4.3 already added direct socket syscalls for i386 and s390, so we assumed most modern binaries would already use the new socket syscall and that socketcall would only matter for old binaries. So we blocked it entirely and shipped Docker Engine v29.4.2 (release notes).

What broke

The socketcall deny turned out to be too broad.

Older versions of glibc on i386 route all socket operations through socketcall, the Go runtime uses it unconditionally for GOARCH=386 (independent of glibc), and many legacy and gaming workloads (SteamCMD, Wine) depend on it.

Blocking socketcall broke networking for a lot of 32-bit binaries running inside a container (moby/moby#52506).

And this is not just an i386 problem. On amd64, any process can switch into ia32 compatibility mode with int $0x80 and invoke socketcall directly, bypassing the socket(2) arg filter entirely. You do not need a 32-bit container or a 32-bit binary to reach that path.

Affected containers could work around this by using a custom seccomp profile that re-enables socketcall while keeping AF_ALG blocked for the direct socket(2) path.

But that just pokes a hole in the hardening for those containers, since an attacker inside them could still reach AF_ALG through socketcall.

Second attempt: LSM-based enforcement (v29.4.3)

The fundamental problem is that seccomp operates at the syscall boundary, and socketcall multiplexes many operations behind a single syscall number with pointer arguments. You cannot selectively block AF_ALG through socketcall with seccomp alone.

AppArmor and SELinux operate on a different level. Linux Security Modules hook directly into the kernel’s security_socket_create() callback, which fires when the kernel actually creates the socket object, regardless of which syscall entry point was used. An LSM can deny AF_ALG specifically while leaving all other socketcall usage intact.

In v29.4.3 (release notes), we:

  1. Reverted the socketcall seccomp deny to restore 32-bit compatibility.
  2. Added deny network alg, to the default AppArmor profile (moby/profiles#22).
    On systems with AppArmor enabled (e.g. Ubuntu, Debian), this blocks AF_ALG through both socket(2) and socketcall(2).
  3. Integrated a SELinux CIL policy module for systems running SELinux (Fedora, RHEL, CentOS).
    The module denies alg_socket creation for all container_domain types and can be loaded via semodule.
    SELinux enforcement requires the daemon to be running with --selinux-enabled.
  4. Kept the seccomp socket(AF_ALG) arg filter as defense-in-depth for the direct socket(2) syscall path.

What you should do

  1. Patch your kernel.
    This is the real fix.
    Check with your distribution for a kernel update that addresses CVE-2026-31431.
  2. Upgrade Docker Engine to v29.4.3 or later. You get the updated seccomp + AppArmor + SELinux defaults. A systemctl restart docker (or equivalent) is enough; no host reboot required.
  3. If you cannot upgrade the kernel or the engine immediately:
  • Blacklist the kernel modules: add blacklist af_alg and blacklist algif_aead to /etc/modprobe.d/.
    This only works if the modules are built as loadable modules (CONFIG_CRYPTO_USER_API=m), not compiled into the kernel.
  • Apply a custom seccomp profile that denies AF_ALG using --security-opt seccomp=/path/to/profile.json or the seccomp-profile option in daemon.json.

Closing thoughts

Security comes in layers, and sometimes no single layer is enough. Seccomp blocks socket(AF_ALG) on every system but is blind to socketcall. AppArmor and SELinux block both paths, but they depend on host configuration. Together, they cover what neither can alone.

On systems without an LSM, the socketcall path remains unblocked from Docker’s side. Ultimately, the kernel bug is what needs to be fixed.

Kernel vulnerabilities will keep coming. When they do, the container runtime is often the fastest place to deploy a mitigation, because updating the engine is one change that protects every container on the host. The Copy Fail timeline made that especially clear: the embargo broke before distros had fixes ready, and for several days the engine was the only place users could mitigate anything without waiting for a kernel rebuild.

Keeping Docker Engine up to date is not just about new features. It is one of the most effective ways to shrink the window between a kernel CVE going public and your workloads being protected against it.

関連記事