ZDI-20-1440: An Incorrect Calculation Bug in the Linux Kernel eBPF Verifier

January 19, 2021 | Lucas Leong

In April 2020, the ZDI received a Linux kernel submission that turned out to be an incorrect calculation bug in the extended Berkeley Packet Filter (eBPF) verifier. If you’re not familiar with it, eBPF is a Linux subsystem that is designed to safely execute untrusted, user-defined extensions inside the kernel for purposes such as packet filtering. It relies on static analysis to protect the kernel against problematic extensions. The submission we received from Ryota Shiga (@Ga_ryo_) of Flatt Security bypasses the eBPF verification and can lead to out-of-bounds (OOB) access in the Linux kernel. The eBPF verifier is a well-known source of Linux kernel local privilege escalation vulnerabilities and has been seen in many cases in the past, including being used at Pwn2Own 2020.

This vulnerability affects the current Linux kernel long term version from 4.9 to 4.13. One particular distribution, Debian 9, is currently using an affected kernel version. The ZDI is disclosing this bug publicly as ZDI-20-1440 without a patch in accordance with our 120-day disclosure policy.

The Vulnerability

If you are not familiar with the eBPF verifier, we highly recommend the write-up by Manfred Paul (@_manfp). There are two passes of verifications before executing any BPF programs. The first pass (check_cfg()) ensures the code is loop-free. The second pass (do_check()) attempts to determine if there are any invalid instructions or possible memory violations. Emulation is used to check for possible memory violations. The incorrect calculation described here comes from opcode BPF_RSH during the second pass. The following excerpts are based on 4.9.249.

The BPF_RSH (unsigned right shift) instruction belongs to the BPF_ALU64 class of instructions. When emulating BPF_RSH, do_check calls check_alu_op at (1), which then calls adjust_reg_min_max_vals at (2). At (3) and (4), it tries to update the minimum and maximum value of dst_reg based upon how the shift operation will modify dst_reg. Note that the local variables min_value and max_value contain the known bounds of the operand that specifies the shift distance. There are corresponding fields named min_value and max_value that hold the known bounds of dst_reg.

However, the calculations at (3) and (4) are wrong. For example, to calculate max(a >> b) (the maximum possible value of a when right-shifted by b bits), the correct formula is max(a) >> min(b). (To understand why, consider that a right shift is equivalent to division by a power of two. The largest possible result is produced by choosing the largest possible numerator and the smallest possible denominator.) Instead, the code at (4) calculates max(a) >> max(b). A corresponding mistake is present at (3).

The consequences of bounds miscalculation during eBPF verification are catastrophic. If the attacker later uses dst_reg as the address for a load or store, the verification in (5) below will be bypassed.

Once the eBPF program passes verification, it will execute in the kernel, and the attacker can achieve an out-of-bounds memory access, as seen in (6) below.

The Trigger

Before triggering the bug, we have to first create two bpf maps with bpf_create_map(). A bpf map is a memory region designated to be accessible from within eBPF code. One map is for triggering the bug, while the other is the target for OOB access. The following opcodes perform preliminary work:

The BPF_FUNC_map_lookup_elem function returns a pointer to a location in a bpf map. After execution of the code shown above, BPF_REG_8 and BPF_REG_9 are set to the values from map1[1] and map1[2] respectively. They will be used as operands for BPF_RSH. The final BPF_GET_MAP shown above loads BPF_REG_0 with a pointer to map2[0].

The next step is to get the verifier to recognize that the operands to BPF_RSH will be bounded within a certain range. Here are the opcodes to limit the range of the registers by using branches.

The verifier will correctly deduce that execution cannot fall through past these instructions unless 0 <= REG_8 <= 0x1000 and 0 <= REG_9 <= 1024. (Note that JA means “jump always”, not “jump if above” as in x86.)

It's time to trigger the bug.

After the BPF_RSH instruction, BPF_REG_8 can still have a value as high as 0x1000. But due to the incorrect computation discussed above, the verifier concludes that the maximum possible value of BPF_REG_8 is now 0. On the basis of this, the verifier incorrectly concludes that the memory operation at (B) is guaranteed to be safe.

BPF_STX_MEM at (B) will perform an OOB write on map2 with an arbitrary offset specified by BPF_REG_8.

However, there is one additional precondition. Recall from above that when encountering an instruction that operates on memory, the verifier performs checks in a function named check_mem_access(). When the address of the memory operation is controlled by a register, check_mem_access() additionally ensures that the verifier has already marked the register contents as PTR_TO_MAP_VALUE or PTR_TO_MAP_VALUE_ADJ. The verifier will only set this mark if the allow_ptr_leaks flag is enabled in the environment, and to enable this flag, the caller must have the CAP_SYS_ADMIN capability.

This means CAP_SYS_ADMIN is required to trigger the bug, even if the eBPF program is attached to a socket owned by the attacker.

Conclusion

Although the precondition reduces the impact and risk, it would still be better to apply this mitigation, or even better, upgrade the kernel to an unaffected version. Our team will try to follow up on the patch when it is released. Thanks again to Ryota Shiga of Flatt Security for submitting this bug. He’s submitted a few other reports to the program, and each has been great. We hope to see more from him in the future.

You can find me on Twitter @_wmliang_, and follow the team for the latest in exploit techniques and security patches.